<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ryosuke Tsuji</title>
    <description>The latest articles on Forem by Ryosuke Tsuji (@ryantsuji).</description>
    <link>https://forem.com/ryantsuji</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843591%2F8b126f91-f561-4e6b-8492-814b18d680ec.jpg</url>
      <title>Forem: Ryosuke Tsuji</title>
      <link>https://forem.com/ryantsuji</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ryantsuji"/>
    <language>en</language>
    <item>
      <title>Human-on-the-Loop: AI Reviewing AI PRs at cortex (769 PRs/month, while raising the quality bar)</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Tue, 26 May 2026 14:35:43 +0000</pubDate>
      <link>https://forem.com/ryantsuji/human-on-the-loop-ai-reviewing-ai-prs-at-cortex-769-prsmonth-while-raising-the-quality-bar-4lh5</link>
      <guid>https://forem.com/ryantsuji/human-on-the-loop-ai-reviewing-ai-prs-at-cortex-769-prsmonth-while-raising-the-quality-bar-4lh5</guid>
      <description>&lt;p&gt;Hi, I'm &lt;a href="https://x.com/ryantsuji" rel="noopener noreferrer"&gt;Ryan&lt;/a&gt;, CTO at airCloset.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;: "cortex" in this article is the internal codename for an AI platform built in-house at airCloset. It is unrelated to existing commercial services like Snowflake Cortex or Palo Alto Networks Cortex.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In &lt;a href="https://dev.to/ryantsuji/building-a-real-ai-harness-auto-reviewed-prs-self-healing-ops-and-non-engineer-contributors-3lfa"&gt;Part 1 (intro)&lt;/a&gt; I covered the high level -- &lt;strong&gt;AI driving both PR reviews and incident response on top of cortex&lt;/strong&gt;. In &lt;a href="https://dev.to/ryantsuji/the-heart-of-the-ai-harness-a-knowledge-graph-of-the-ai-by-the-ai-for-the-ai-series-part-2-4a59-temp-slug-9510240"&gt;Part 2 (Product Graph)&lt;/a&gt; I went deep on &lt;strong&gt;cpg&lt;/strong&gt;, the unified knowledge graph that fuses code, docs, DB schemas and infra into a single business-aware index.&lt;/p&gt;

&lt;p&gt;This post is about &lt;strong&gt;the automated PR review pipeline&lt;/strong&gt; -- AI reviews the PR, a separate AI applies the fixes, and the system merges automatically once policy gates pass. The usual critiques of AI-assisted development ("&lt;strong&gt;the reviewer becomes the bottleneck&lt;/strong&gt;" and "&lt;strong&gt;AI code drops the quality bar&lt;/strong&gt;") don't really apply here. The rest of this post unpacks why.&lt;/p&gt;

&lt;h2&gt;
  
  
  Series
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Theme&lt;/th&gt;
&lt;th&gt;Key scene&lt;/th&gt;
&lt;th&gt;Article&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Series intro: cortex harness&lt;/td&gt;
&lt;td&gt;PRs merging unattended / incidents fixed before anyone notices&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/ryantsuji/building-a-real-ai-harness-auto-reviewed-prs-self-healing-ops-and-non-engineer-contributors-3lfa"&gt;ai-harness-intro&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Product Graph (cpg)&lt;/td&gt;
&lt;td&gt;Code / docs / DB / infra unified into one graph&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/ryantsuji/the-heart-of-the-ai-harness-a-knowledge-graph-of-the-ai-by-the-ai-for-the-ai-series-part-2-4a59-temp-slug-9510240"&gt;cortex-product-graph&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Auto PR review&lt;/td&gt;
&lt;td&gt;webhook -&amp;gt; AI review -&amp;gt; auto-fix -&amp;gt; squash merge&lt;/td&gt;
&lt;td&gt;This article ← you are here&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Alert-Fix + observability + auto-added guardrails&lt;/td&gt;
&lt;td&gt;Alert -&amp;gt; AI investigates -&amp;gt; fix PR + new lint/type gate -&amp;gt; auto redeploy + recurrence blocked&lt;/td&gt;
&lt;td&gt;Coming soon&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Scaling the harness from cortex to toC services&lt;/td&gt;
&lt;td&gt;Non-engineer contributions in practice + scaling cortex's harness to the whole product org&lt;/td&gt;
&lt;td&gt;Coming soon&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Start with last month's numbers
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;769 PRs merged.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Median time to merge: 31 minutes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human review involvement per PR: near-zero.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's a typical 30 days on cortex (Apr 21 -- May 21).&lt;/p&gt;

&lt;p&gt;Every one of those 769 PRs had an AI reviewer as the first reviewer, with &lt;strong&gt;an average of 10.8 review-fix loop iterations per PR (max 56)&lt;/strong&gt;. 1 in 5 merged within 10 minutes, roughly half within 30 minutes. What humans do now is look at review outcomes and &lt;strong&gt;tune the review prompt and the guidelines themselves&lt;/strong&gt; -- this is &lt;strong&gt;human-on-the-loop, not human-in-the-loop&lt;/strong&gt;. &lt;strong&gt;Humans operate on the policy layer, not the execution layer.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Past 30 days&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PRs merged&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;769&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI reviewer coverage&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg review iterations / PR&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10.8&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max review iterations&lt;/td&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-PR human review&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Median time-to-merge&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;31 min&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Merged within 10 min&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Merged within 30 min&lt;/td&gt;
&lt;td&gt;49%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is a typical month on cortex now.&lt;/p&gt;

&lt;p&gt;The common refrain -- "&lt;strong&gt;AI speeds up writing but reviews still bottleneck&lt;/strong&gt;" and "&lt;strong&gt;AI-written code lowers quality&lt;/strong&gt;" -- is something cortex absorbs through &lt;strong&gt;a pipeline where neither failure mode can take hold&lt;/strong&gt;. Let me break it down.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the review bottleneck stops forming
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The conventional wisdom: the reviewer becomes the bottleneck
&lt;/h3&gt;

&lt;p&gt;As AI writes faster, the load on whoever reviews the output grows proportionally. Anthropic's internal blog (&lt;a href="https://www.anthropic.com/news/how-anthropic-teams-use-claude-code" rel="noopener noreferrer"&gt;How Anthropic teams use Claude Code&lt;/a&gt;) reports the same pattern -- &lt;strong&gt;the bottleneck has shifted from writing to reviewing&lt;/strong&gt;, and senior engineers' work has moved from writing code toward integrating and reviewing AI output.&lt;/p&gt;

&lt;p&gt;cortex hit exactly this. The moment we ran Claude Code at full throttle, &lt;strong&gt;writing speed jumped by an order of magnitude or more&lt;/strong&gt;. Meanwhile the human time available to read and approve PRs only grew linearly. If the reviewer (=me) took a day off, the whole org stalled -- a classic single point of failure.&lt;/p&gt;

&lt;h3&gt;
  
  
  cortex's answer: move the reviewer role to AI as well
&lt;/h3&gt;

&lt;p&gt;Part 1 and Part 2 kept asking the same recurring question: "&lt;strong&gt;how far do you push the harness?&lt;/strong&gt;" cortex went all-in: &lt;strong&gt;the AI writes the code, the AI reviews the code&lt;/strong&gt;. What humans keep their hands on is "&lt;strong&gt;tuning the prompts and guidelines themselves&lt;/strong&gt;" -- not making decisions inside each individual PR, but watching the system from above and adjusting.&lt;/p&gt;

&lt;p&gt;Three conditions had to hold for this to work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The AI reviewer has enough context&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A generic AI reviewer &lt;strong&gt;only sees the PR diff&lt;/strong&gt;. The diff alone hides business meaning, upstream/downstream dependencies, and prior incident history. cortex feeds the &lt;strong&gt;Product Graph (cpg)&lt;/strong&gt; from &lt;a href="https://dev.to/ryantsuji/the-heart-of-the-ai-harness-a-knowledge-graph-of-the-ai-by-the-ai-for-the-ai-series-part-2-4a59-temp-slug-9510240"&gt;Part 2&lt;/a&gt; -- &lt;strong&gt;a knowledge graph that fuses code, docs, DB schemas, and infra into one structure, with each node carrying business role and upstream/downstream dependencies&lt;/strong&gt; -- into the AI reviewer, so it can &lt;strong&gt;trace impact into code that the PR didn't even touch&lt;/strong&gt;. It catches:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Missed upstream/downstream fixes
- Missed doc updates
- Tests that should have been updated but weren't

Diff-only AI review can never reach this territory.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Reviews are not improvisational&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If reviews shift day to day, the team gets confused, and the AI can't be told what "correct" looks like. We enforce this by passing &lt;strong&gt;an explicit review-guideline document&lt;/strong&gt; as the mandatory citation source for every review (we open-sourced a snapshot, see below).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;False positives don't blanket-block merges&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Treating every false positive as Critical breaks the workflow. We control this with &lt;strong&gt;a severity hierarchy (Critical / Major / Minor / Nit) plus strict no-downgrade rules&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So: the cpg from &lt;a href="https://dev.to/ryantsuji/the-heart-of-the-ai-harness-a-knowledge-graph-of-the-ai-by-the-ai-for-the-ai-series-part-2-4a59-temp-slug-9510240"&gt;Part 2&lt;/a&gt; solves "&lt;strong&gt;what context the AI sees&lt;/strong&gt;," the review guidelines solve "&lt;strong&gt;what the AI should do&lt;/strong&gt;" as &lt;strong&gt;Guides (pre-execution control)&lt;/strong&gt;, and the severity ladder + no-downgrade rules solve "&lt;strong&gt;what the AI must not do&lt;/strong&gt;" as &lt;strong&gt;Sensors (post-execution control)&lt;/strong&gt;. This maps cleanly onto Martin Fowler's Guides / Sensors taxonomy (introduced back in &lt;a href="https://dev.to/ryantsuji/building-a-real-ai-harness-auto-reviewed-prs-self-healing-ops-and-non-engineer-contributors-3lfa"&gt;Part 1&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;One more upstream layer: before any of those three kicks in, &lt;strong&gt;a 500-lines-per-file lint&lt;/strong&gt; keeps every file in any PR small enough to fit in a single AI session. That alone keeps AI review from breaking down, and unlike a human reviewer, the AI doesn't lose focus. There are plenty of other lints in front of the AI reviewer too, but the full picture belongs to &lt;strong&gt;Part 4 (Alert-Fix + observability + auto-added guardrails)&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the auto-review system is wired
&lt;/h2&gt;

&lt;p&gt;The implementation is &lt;strong&gt;a script running on each developer's machine&lt;/strong&gt;. GitHub webhooks land on an in-house &lt;strong&gt;Event Relay server&lt;/strong&gt;, get persisted to Firestore, and each developer's machine subscribes as an SSE client. On reconnect, Last-Event-ID replays anything missed -- zero event loss, single webhook registration. &lt;strong&gt;Reviewer-mode machines stay always-on&lt;/strong&gt;, so any incoming review fires immediately. &lt;strong&gt;Author mode runs in the background on the PR author's own machine&lt;/strong&gt;, alongside their normal dev work.&lt;/p&gt;

&lt;h3&gt;
  
  
  How we ended up with Event Relay
&lt;/h3&gt;

&lt;p&gt;The current setup wasn't the original design.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;First&lt;/strong&gt;: GitHub webhook → &lt;a href="https://smee.io/" rel="noopener noreferrer"&gt;smee.io&lt;/a&gt; → each machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Then&lt;/strong&gt;: GitHub webhook → Cloudflare Tunnel → each machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Now&lt;/strong&gt;: GitHub webhook → in-house &lt;strong&gt;Event Relay&lt;/strong&gt; with Firestore persistence → SSE to each machine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both smee.io and Cloudflare Tunnel ran into &lt;strong&gt;connection drops and missed deliveries&lt;/strong&gt;, which caused real misses for us. Switching to the in-house Event Relay brought event loss to zero (&lt;strong&gt;Firestore persistence + Last-Event-ID replay&lt;/strong&gt;), and the relay turned into a general-purpose layer we could reuse.&lt;/p&gt;

&lt;p&gt;The webhook ingestion for &lt;strong&gt;Alert-Fix&lt;/strong&gt; (covered in Part 4) actually goes through &lt;strong&gt;the exact same Event Relay&lt;/strong&gt;. GitHub, Grafana, and other webhook sources get consolidated through one relay, and each machine's SSE client subscribes to whichever events it cares about. &lt;strong&gt;Having a single general-purpose webhook relay is a piece of infra that keeps paying off in unexpected ways&lt;/strong&gt; -- worth investing in early.&lt;/p&gt;

&lt;p&gt;When the reviewer's machine receives an event, the script spawns &lt;code&gt;claude -p&lt;/code&gt; and walks through 9 dimensions (Graph / Architecture / Security / Test / Doc / Impact / Observability / AI-Antipattern / Recurrence) sequentially, then reads the verdict marker the AI emitted at the end and posts &lt;code&gt;APPROVE&lt;/code&gt; or &lt;code&gt;REQUEST_CHANGES&lt;/code&gt; via &lt;code&gt;gh pr review&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faljjylxg1llcgeau6r4v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faljjylxg1llcgeau6r4v.png" alt="Auto review pipeline — distributed webhook architecture running on every developer's machine" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Modes split the role&lt;/strong&gt; -- the same script started with &lt;code&gt;--mode reviewer&lt;/code&gt; becomes the reviewer process; with &lt;code&gt;--mode author&lt;/code&gt; it becomes the PR-author response process. The machine of whoever is assigned as reviewer runs reviewer mode; the machine of whoever opened the PR runs author mode. Event Relay multicasts the events, and &lt;strong&gt;each machine reacts in a distributed way&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-PR worktree isolation&lt;/strong&gt; -- author mode merges &lt;code&gt;origin/main&lt;/code&gt; into a fresh worktree before spawning the AI. Multiple PRs can be handled in parallel without file state contaminating across them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;9 dimensions checked sequentially in one session&lt;/strong&gt; -- not parallel sub-agents. A single &lt;code&gt;claude -p&lt;/code&gt; session walks the 9 dimensions while keeping context shared, which also catches cross-dimension contradictions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review guidelines: public snapshot&lt;/strong&gt; -- &lt;a href="https://github.com/air-closet/cortex-review-guidelines" rel="noopener noreferrer"&gt;air-closet/cortex-review-guidelines&lt;/a&gt; (JP/EN). The live guidelines are inside cortex (private repo) and evolve daily; the public repo is a snapshot extracted for reference.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;:::message alert&lt;br&gt;
&lt;strong&gt;Guidelines alone scale only to projects in the tens-of-thousands-of-lines range.&lt;/strong&gt; At cortex's scale (&lt;strong&gt;over 1M lines of code&lt;/strong&gt;), the &lt;strong&gt;knowledge graph from &lt;a href="https://dev.to/ryantsuji/the-heart-of-the-ai-harness-a-knowledge-graph-of-the-ai-by-the-ai-for-the-ai-series-part-2-4a59-temp-slug-9510240"&gt;Part 2&lt;/a&gt; (cpg) is a hard prerequisite&lt;/strong&gt;. Porting the guidelines without cpg won't reproduce the same review quality -- the AI reviewer simply can't navigate the codebase fast enough to reason about impact.&lt;br&gt;
:::&lt;/p&gt;
&lt;h3&gt;
  
  
  Why sequential single-session review, not parallel sub-agents
&lt;/h3&gt;

&lt;p&gt;We initially tried splitting the 9 dimensions across parallel sub-agents. Three problems emerged: cpg / guidelines / PR diff got injected 9 times (token cost balloons), cross-dimension findings couldn't reference each other (a &lt;code&gt;[Test]&lt;/code&gt; issue rooted in a &lt;code&gt;[Graph]&lt;/code&gt; violation gets dropped in isolation), and aggregating 9 outputs into a single verdict required its own machinery.&lt;/p&gt;

&lt;p&gt;A single sequential session fixes all three: one cpg/guideline load, earlier findings stay in context for later dimensions (cross-dimension consistency comes for free), and one verdict marker at the end is the entire aggregation step.&lt;/p&gt;

&lt;p&gt;We also &lt;strong&gt;swap &lt;code&gt;CLAUDE.md&lt;/code&gt; to a review-specific version&lt;/strong&gt; at startup. The default &lt;code&gt;CLAUDE.md&lt;/code&gt; is dense with development-time context (Product Graph ops, prod-data safety, MCP ordering) -- noise for a reviewer. The review-specific version centers on severity, no-downgrade, and the verdict marker spec, keeping AI attention on the review task.&lt;/p&gt;

&lt;p&gt;Cutting wasted context lifts judgment precision and token cost at the same time.&lt;/p&gt;
&lt;h3&gt;
  
  
  Operational knobs
&lt;/h3&gt;

&lt;p&gt;A few filters and toggles we apply in actual use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Draft (WIP) PRs are excluded.&lt;/strong&gt; GitHub Draft state is received but skipped; review starts firing once the author flips it to Ready for Review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specific PRs can be targeted manually.&lt;/strong&gt; The webhook is the normal trigger, but you can also kick off a review against a specific PR number from the CLI -- useful after a CI failure or for re-checking a single PR.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-merge is the PR author's call.&lt;/strong&gt; Whether the pipeline runs through to auto-merge after APPROVE + CI green is set by the PR author. Default is on; for changes that go directly to prod, the author can flip it off and hit merge themselves.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Output structure: tags and severity
&lt;/h2&gt;

&lt;p&gt;Every auto-review comment is structured as &lt;strong&gt;tag + severity + concrete example&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Tags (dimensions)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tag&lt;/th&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Primary target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;[Graph]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Product Graph integrity&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@graph-*&lt;/code&gt; JSDoc, node dependencies, doc consistency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;[Doc]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Doc consistency&lt;/td&gt;
&lt;td&gt;Doc updates that should follow code changes, doc placement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;[Impact]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Impact analysis&lt;/td&gt;
&lt;td&gt;Missed upstream/downstream fixes, &lt;code&gt;via:&lt;/code&gt; field inconsistency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;[Security]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Auth, input validation, secrets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;[Architecture]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Composable Architecture&lt;/td&gt;
&lt;td&gt;app/package boundaries, dependency direction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;[Test]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Test quality&lt;/td&gt;
&lt;td&gt;Coverage, matchers, naming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;[Observability]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Structured logging, no-truncate rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;[AI-Antipattern]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;AI-generated code traps&lt;/td&gt;
&lt;td&gt;Hallucinated APIs, fallback overuse, dead code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;[Recurrence]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Recurrence prevention&lt;/td&gt;
&lt;td&gt;Bug-fix triage (lint / horizontal rollout / new guideline)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  Severity
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;Criteria&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Critical&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Security, data corruption, prod-risk, doc inconsistency, missing &lt;code&gt;@graph-*&lt;/code&gt;, quality-bar relaxation&lt;/td&gt;
&lt;td&gt;&lt;code&gt;REQUEST_CHANGES&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Major&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Spec violation, Composable Architecture violation, missing tests&lt;/td&gt;
&lt;td&gt;&lt;code&gt;REQUEST_CHANGES&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Minor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Naming, maintainability, light refactor&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;REQUEST_CHANGES&lt;/code&gt; (must be resolved)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Nit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Style preference, minor inconsistency&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;APPROVE&lt;/code&gt; (comment only)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The single most important rule is &lt;strong&gt;"no downgrade"&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"&lt;strong&gt;Following existing patterns&lt;/strong&gt;" is not a valid reason to downgrade (existing violations are a separate problem to fix, not justification for new violations).&lt;/li&gt;
&lt;li&gt;"&lt;strong&gt;Will be addressed in a separate PR&lt;/strong&gt;" or "&lt;strong&gt;incrementally&lt;/strong&gt;" is not a valid reason to drop Critical/Major to Nit.&lt;/li&gt;
&lt;li&gt;"&lt;strong&gt;Leave a TODO/FIXME&lt;/strong&gt;" is not a valid deferral path.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is spelled out in &lt;a href="https://github.com/air-closet/cortex-review-guidelines/blob/main/en/guidelines/severity.md" rel="noopener noreferrer"&gt;&lt;code&gt;severity.md&lt;/code&gt;&lt;/a&gt;, and the AI literally cites it back when it issues &lt;code&gt;REQUEST_CHANGES&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Example: PR migrating the embedding model from gemini-embedding-001 to gemini-embedding-2
&lt;/h3&gt;

&lt;p&gt;A real review comment is more useful than abstract description. Here's a typical one. This is a feature PR (&lt;code&gt;feat(meet): dual-write embeddings to new 'embedding' column (v2)&lt;/code&gt;) that was merged on 2026-05-19, after &lt;strong&gt;6 review-fix iterations across 1.5 hours&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First review (07:35:25)&lt;/strong&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The dual-write approach, the idempotent migration script, and the &lt;code&gt;Promise.all&lt;/code&gt; parallelization all look sound. Please address the 3 Critical and 2 Minor findings below before merging.&lt;/p&gt;


&lt;h2&gt;
  
  
  Critical
&lt;/h2&gt;
&lt;h3&gt;
  
  
  [Graph] Missing &lt;code&gt;@graph-business&lt;/code&gt; tag (x3)
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;graph-integrity.md&lt;/code&gt; requires &lt;code&gt;@graph-business&lt;/code&gt; on every top-level declaration under the app layer (severity: Critical).&lt;br&gt;
All 3 new declarations below are missing it (detailed comments in separate threads).&lt;br&gt;
&lt;strong&gt;Existing code is also missing it, but "following existing patterns" is not a valid downgrade reason (severity.md).&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;generateEmbeddingV2&lt;/code&gt; (v2 embedding generation function)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EMBEDDING_MODEL_V2&lt;/code&gt; (model name config)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EMBEDDING_LOCATION_V2&lt;/code&gt; (region config)&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  [Graph] &lt;code&gt;embedMeetContent&lt;/code&gt;'s &lt;code&gt;@graph-connects&lt;/code&gt; doesn't reflect &lt;code&gt;generateEmbeddingV2&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The JSDoc on &lt;code&gt;embedMeetContent&lt;/code&gt; has &lt;code&gt;@graph-connects generateEmbedding [calls] Generate embedding&lt;/code&gt;, but no corresponding &lt;code&gt;@graph-connects&lt;/code&gt; line has been added for the newly introduced &lt;code&gt;generateEmbeddingV2&lt;/code&gt; call.&lt;br&gt;
The graph will be missing an edge to &lt;code&gt;generateEmbeddingV2&lt;/code&gt;.&lt;/p&gt;


&lt;pre class="highlight diff"&gt;&lt;code&gt;   * @graph-connects generateEmbedding [calls] Generate embedding
&lt;span class="gi"&gt;+  * @graph-connects generateEmbeddingV2 [calls] v2 embedding generation (dual-write)
&lt;/span&gt;   * @graph-connects insertMeetChunks [calls] Insert chunks into BQ
&lt;/code&gt;&lt;/pre&gt;



&lt;h3&gt;
  
  
  [Doc] Corresponding BigQuery schema doc is not updated
&lt;/h3&gt;

&lt;p&gt;The "BigQuery schema" section in the related doc is missing the new &lt;code&gt;embedding&lt;/code&gt; column.&lt;br&gt;
Both &lt;code&gt;graph-integrity.md&lt;/code&gt; and &lt;code&gt;severity.md&lt;/code&gt; define doc inconsistency as Critical.&lt;/p&gt;


&lt;pre class="highlight diff"&gt;&lt;code&gt; | `created_at`  | TIMESTAMP   | Created at                              |
&lt;span class="gi"&gt;+| `embedding`   | FLOAT64[]   | Embedding vector (v2: gemini-embedding-2) |
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;



&lt;h2&gt;
  
  
  Minor
&lt;/h2&gt;
&lt;h3&gt;
  
  
  [Test] &lt;code&gt;textEmbeddingV2&lt;/code&gt; value is not asserted
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;objectContaining&lt;/code&gt; allows extra fields, so the test still passes even when the v2 value is never set.&lt;/p&gt;


&lt;pre class="highlight diff"&gt;&lt;code&gt;         textEmbedding: [0.1, 0.2, 0.3],
&lt;span class="gi"&gt;+        textEmbeddingV2: [0.1, 0.2, 0.3],
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;h3&gt;
  
  
  [Test] No isolated scenario for "v2 returns null"
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;generateEmbeddingV2: mockGenerateEmbedding&lt;/code&gt; reuses the v1 mock, so the case "v2 returns null while v1 succeeds" is not independently verified.&lt;/p&gt;



&lt;p&gt;&lt;code&gt;&amp;lt;!-- VERDICT:REQUEST_CHANGES --&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The takeaway is the precision of the details.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;File + line numbers&lt;/strong&gt; are concrete.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Suggested fixes are in diff format&lt;/strong&gt; (copy-paste ready).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source guideline&lt;/strong&gt; (&lt;code&gt;graph-integrity.md&lt;/code&gt; / &lt;code&gt;severity.md&lt;/code&gt;) is cited explicitly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The typical excuse&lt;/strong&gt; ("existing code has the same problem") is &lt;strong&gt;pre-emptively closed&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The trailing &lt;code&gt;&amp;lt;!-- VERDICT:REQUEST_CHANGES --&amp;gt;&lt;/code&gt; is a &lt;strong&gt;machine-readable verdict marker&lt;/strong&gt; -- the trigger that moves the PR into &lt;code&gt;REQUEST_CHANGES&lt;/code&gt; state.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After this, the PR author (= usually another AI running on the author's machine) pushes a fix, the reviewer re-reviews. The next review confirms all 3 Criticals are actually resolved, raises the next Major / Critical, and so on. &lt;strong&gt;6 iterations in 1.5 hours&lt;/strong&gt;, finally APPROVE, auto-merge.&lt;/p&gt;

&lt;p&gt;Plotted on a timeline:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe16f7yn1hfqax6gxq8c5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe16f7yn1hfqax6gxq8c5.png" alt="Real example of the review-fix loop — embedding model migration PR, 6 iterations in 1.5 hours" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With a human reviewer, this is "Critical x3 -&amp;gt; wait until tomorrow for the fix -&amp;gt; re-review the day after" -- 2 to 3 days per PR. cortex closes it in &lt;strong&gt;90 minutes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The difference between human review and auto review is not just speed. A single AI session walks all 9 dimensions in order and cites the guideline each time, which makes it &lt;strong&gt;much harder to miss the "deep" findings humans drop because their attention drifted&lt;/strong&gt; -- doc consistency, recurrence-prevention judgments, weak matchers. Side-by-side comparison:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhet9ew403cpghs4jl7xb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhet9ew403cpghs4jl7xb.png" alt="Before / After — human review era vs. cortex's auto-review era" width="800" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is why the review bottleneck never forms here.&lt;/p&gt;
&lt;h2&gt;
  
  
  Evolving the guidelines: catching the moments AI gets it wrong, then fixing the rules
&lt;/h2&gt;

&lt;p&gt;The review guidelines I've been referring to are &lt;strong&gt;not a static document&lt;/strong&gt;. Running this in production surfaces recurring patterns where &lt;strong&gt;the AI mis-judges a specific class of issue&lt;/strong&gt;. Each time that happens, we don't add a comment to the individual PR; we &lt;strong&gt;rewrite the guideline so the AI behaves correctly next time&lt;/strong&gt; -- this is the meta-layer humans actually operate on.&lt;/p&gt;

&lt;p&gt;A few concrete failures we hit on cortex, and how we closed each one by changing the rule, not the PR.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. AI was downgrading because "existing code has the same issue"
&lt;/h3&gt;

&lt;p&gt;Early on, immediately after flagging a violation the AI would add "&lt;strong&gt;however, since existing code has the same violation, I'm downgrading this to Nit&lt;/strong&gt;" and self-downgrade. The result: violations on newly added code kept dropping to Nit, and the system kept emitting Approve.&lt;/p&gt;

&lt;p&gt;We closed this by adding &lt;strong&gt;the no-downgrade rule&lt;/strong&gt; to &lt;code&gt;severity.md&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Following existing patterns" is not a valid downgrade reason: if existing code violates a guideline, new code following that pattern still gets flagged at the same severity. Deferral language like "consider during the next refactor" is not accepted.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That wasn't enough on its own. Over time other excuse patterns surfaced -- "&lt;strong&gt;will be addressed in a separate PR&lt;/strong&gt;," "&lt;strong&gt;will be addressed in the next session&lt;/strong&gt;," "&lt;strong&gt;out of scope&lt;/strong&gt;," "&lt;strong&gt;incrementally&lt;/strong&gt;" -- so we added those as forbidden downgrade categories too. We also explicitly forbade &lt;strong&gt;deferring via TODO/FIXME comments in code&lt;/strong&gt;. The mindset is: &lt;strong&gt;close every typical excuse path preemptively&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. The final verdict had 3 options, and "comment-only" left PRs in limbo
&lt;/h3&gt;

&lt;p&gt;The final verdict at the end of every review was originally &lt;code&gt;APPROVE&lt;/code&gt; / &lt;code&gt;REQUEST_CHANGES&lt;/code&gt; / &lt;code&gt;COMMENT&lt;/code&gt; (approve / request changes / comment-only). When the AI picked &lt;code&gt;COMMENT&lt;/code&gt; -- for example when only Minor issues existed -- the script took no action, the PR sat in review-pending forever, and ultimately someone had to manually pick it up. Classic anti-pattern, and it kept happening.&lt;/p&gt;

&lt;p&gt;We &lt;strong&gt;collapsed the verdict to 2 options&lt;/strong&gt;. Anything Minor or above is &lt;code&gt;REQUEST_CHANGES&lt;/code&gt;, a missing verdict marker defaults to &lt;code&gt;REQUEST_CHANGES&lt;/code&gt; (safe side), and only Nit-only or no findings (with CI passing) yields &lt;code&gt;APPROVE&lt;/code&gt;. The principle: "&lt;strong&gt;if the judgment is ambiguous, fail-safe by defaulting to the blocking side (&lt;code&gt;REQUEST_CHANGES&lt;/code&gt;)&lt;/strong&gt;." Going all-in on that design eliminated the stuck-PR class entirely.&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Checklist items had no severity, so the AI's judgment kept drifting
&lt;/h3&gt;

&lt;p&gt;Originally, each guideline (&lt;code&gt;graph-integrity.md&lt;/code&gt;, &lt;code&gt;testing.md&lt;/code&gt;, etc.) was just a &lt;strong&gt;bulleted checklist&lt;/strong&gt;. Items like "Is the test name descriptive?" or "Are mocks minimized?" were listed, but &lt;strong&gt;without per-item severity&lt;/strong&gt;. As a result, the same violation could land as Major in one PR and Nit in another, depending on the session.&lt;/p&gt;

&lt;p&gt;We &lt;strong&gt;converted every guideline's checklist into a &lt;code&gt;severity&lt;/code&gt; / &lt;code&gt;scope&lt;/code&gt; / &lt;code&gt;criterion&lt;/code&gt; table&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Critical&lt;/td&gt;
&lt;td&gt;All PRs&lt;/td&gt;
&lt;td&gt;Missing &lt;code&gt;@graph-business&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Major&lt;/td&gt;
&lt;td&gt;App layer only&lt;/td&gt;
&lt;td&gt;Missing tests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minor&lt;/td&gt;
&lt;td&gt;Shared packages only&lt;/td&gt;
&lt;td&gt;More than 3 function args&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nit&lt;/td&gt;
&lt;td&gt;All PRs&lt;/td&gt;
&lt;td&gt;Naming inconsistency&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;scope&lt;/code&gt; column is &lt;strong&gt;a machine-decidable filter&lt;/strong&gt; for which paths a check applies to, so the AI reviewer doesn't trigger irrelevant items on PRs outside that scope. Just putting it in a table -- the judgment reproducibility jumped significantly.&lt;/p&gt;
&lt;h3&gt;
  
  
  4. The existing guidelines didn't catch AI-specific traps
&lt;/h3&gt;

&lt;p&gt;After running this for a while we noticed AI-generated code has its own cluster of antipatterns -- &lt;strong&gt;calling APIs that don't exist&lt;/strong&gt; (hallucinated APIs -- something like &lt;code&gt;user.findOrCreate()&lt;/code&gt; that looks plausible but isn't actually defined), &lt;strong&gt;swallowing errors and returning fallback values&lt;/strong&gt; (e.g., silently returning an empty array when an upstream API fails), &lt;strong&gt;leaving unused functions&lt;/strong&gt; (a refactor adds the new function but doesn't delete the old one, leaving dead code), &lt;strong&gt;expanding the modification scope beyond what was asked&lt;/strong&gt; (you ask it to change one function and it reformats the whole file), &lt;strong&gt;adding unnecessary backward-compatibility code&lt;/strong&gt; (creating a deprecated alias for an internal-only function) -- and &lt;code&gt;security.md&lt;/code&gt; / &lt;code&gt;testing.md&lt;/code&gt; couldn't catch these. There's a &lt;strong&gt;distinct class of "mistakes only AIs make."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We added a dedicated &lt;strong&gt;&lt;code&gt;ai-antipattern.md&lt;/code&gt;&lt;/strong&gt; for this. Reviews now pick these up explicitly under the &lt;code&gt;[AI-Antipattern]&lt;/code&gt; tag. &lt;strong&gt;Reviewing AI output requires designing around AI-specific traps&lt;/strong&gt; -- you don't get there just by porting human review heuristics onto an AI.&lt;/p&gt;
&lt;h3&gt;
  
  
  5. The AI tries to relax "the standard itself"
&lt;/h3&gt;

&lt;p&gt;The last and most important pattern. When the AI was writing fix PRs, occasionally instead of fixing the guideline violation it would write &lt;strong&gt;a PR that relaxes the guideline&lt;/strong&gt;. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lower the test coverage threshold to avoid writing more tests&lt;/li&gt;
&lt;li&gt;Narrow the in-house lint rule's scope to make the violation go away&lt;/li&gt;
&lt;li&gt;Soften the guideline doc language from "recommended" to "preferred" to weaken the binding constraint&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the AI builds a formally-coherent justification: "&lt;strong&gt;existing code already violates this, so let's adjust the standard to match the implementation.&lt;/strong&gt;" Left unchecked, &lt;strong&gt;the AI gradually walks the quality bar down&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We closed this by adding &lt;strong&gt;"quality-bar relaxation" as a Critical&lt;/strong&gt; in &lt;code&gt;severity.md&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A PR that relaxes the quality bar -- guideline doc, lint rule, coverage threshold -- must not be Approved by the AI reviewer. It is sent back with &lt;code&gt;REQUEST_CHANGES&lt;/code&gt;. &lt;strong&gt;A human reviewer's approval is required&lt;/strong&gt;. "Existing code already violates this" is not a valid justification for relaxation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the one explicit boundary where &lt;strong&gt;we deliberately do not give the AI autonomous Approve authority&lt;/strong&gt;. Whether the standard itself moves is a human decision. It's the &lt;strong&gt;meta-level safety valve&lt;/strong&gt; for the "AI reviewing AI" architecture.&lt;/p&gt;
&lt;h3&gt;
  
  
  Evolving the guidelines is the meta-layer humans actually operate on
&lt;/h3&gt;

&lt;p&gt;The common thread: "&lt;strong&gt;when the AI gets it wrong, don't override the individual PR -- rewrite the guideline so the fix propagates forward.&lt;/strong&gt;"&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI escapes via "existing code has the same issue" -&amp;gt; add no-downgrade rule&lt;/li&gt;
&lt;li&gt;AI picks "comment-only" and PR stalls -&amp;gt; collapse to 2-option verdict&lt;/li&gt;
&lt;li&gt;AI's judgment drifts -&amp;gt; add severity / scope columns to every item&lt;/li&gt;
&lt;li&gt;AI falls into its own traps -&amp;gt; add the AI-Antipattern category&lt;/li&gt;
&lt;li&gt;AI tries to relax the standard -&amp;gt; classify standard-relaxation as Critical, require human Approve&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As long as this loop turns, the guideline is &lt;strong&gt;a living document that absorbs the failure patterns AI produces in production&lt;/strong&gt;. &lt;strong&gt;Don't try to write the perfect guideline up front. Catch the moment AI gets it wrong, and write the rule for that moment.&lt;/strong&gt; That's the actual mechanism behind "quality doesn't drop even when humans aren't inside the loop."&lt;/p&gt;

&lt;p&gt;And one more thread. Right now, the trigger for "AI got it wrong, time to rewrite the guideline" is still mostly a human judgment, but &lt;strong&gt;parts of that maintenance are gradually becoming automatable too&lt;/strong&gt;. &lt;strong&gt;Alert-Fix&lt;/strong&gt; (Part 4 next time) -- where AI investigates production incidents, opens a fix PR, runs it through auto-review, and auto-redeploys -- requires every fix PR to write one of {add lint, add guideline, horizontal rollout} under the &lt;code&gt;[Recurrence]&lt;/code&gt; lens. So the &lt;strong&gt;AI is increasingly participating in the maintenance of its own review criteria&lt;/strong&gt;, with humans still in the loop on adoption. I'll come back to this in Part 4.&lt;/p&gt;
&lt;h2&gt;
  
  
  Auto-fix: a separate AI applies the changes and pushes
&lt;/h2&gt;

&lt;p&gt;Once &lt;code&gt;REQUEST_CHANGES&lt;/code&gt; lands, &lt;strong&gt;the same script running on the PR author's machine, but in author mode&lt;/strong&gt;, picks up the event and starts working.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[REQUEST_CHANGES detected]
   | SSE push via Event Relay
[Author mode boots on PR author's machine]
   | Merge origin/main into a worktree
   |  (lockfile resolved up front, remaining conflicts handled by AI)
   | Read the auto-review comment as context
   | Run claude -p inside the worktree
   | Commit + push the changes
   | New SHA is delivered back to the reviewer's machine via Event Relay -&amp;gt; re-review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two design choices matter here.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reviewer and author run on different machines in different sessions&lt;/strong&gt; -- reviewer mode and author mode are the same script, but they run on different machines in different processes. "Is the original critique correct?" is judged independently. Unlike a single AI fixing its own complaints, the judgment passes between two separate sessions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All iteration stays inside the same PR&lt;/strong&gt; -- we don't spawn a new PR. The "&lt;strong&gt;fix the root cause, no deferrals&lt;/strong&gt;" rule from Part 2 and the review guidelines kicks in here: if the AI tries to escape via &lt;code&gt;TODO/FIXME&lt;/code&gt; or by splitting work out into a separate PR, the next review rejects it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Auto-merge + parallel deploy
&lt;/h2&gt;

&lt;p&gt;Once auto-review returns APPROVE and CI is fully green, the &lt;code&gt;auto-merge&lt;/code&gt; script runs and squash-merges the PR.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Auto review APPROVE + CI green]
   |
auto-merge script
   | squash merge to main
   |
[main updated]
   |
Turborepo build (affected packages only)
   |
Pulumi up (multiple stacks in parallel)
   |- API services
   |- pipeline services
   |- MCP servers
   `- infra
   |
[Deploy complete]
   |
cpg index rebuilt (only changed nodes regenerate embeddings -- see Part 2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;pulumi up &amp;lt;stack1&amp;gt; &amp;lt;stack2&amp;gt; ...&lt;/code&gt; runs in parallel, so deploying 9 stacks at once finishes in about 8-12 minutes. End to end, merge-to-production is averaging 10-15 minutes.&lt;/p&gt;

&lt;p&gt;This compounds nicely with &lt;code&gt;auto-fix&lt;/code&gt; PRs. &lt;strong&gt;Incident alert -&amp;gt; Alert-Fix identifies root cause -&amp;gt; opens a fix PR -&amp;gt; auto review pass -&amp;gt; auto merge -&amp;gt; auto deploy&lt;/strong&gt; runs as a single closed loop without human involvement (covered in Part 4).&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers, in more detail
&lt;/h2&gt;

&lt;p&gt;Unpacking the headline numbers a bit further.&lt;/p&gt;

&lt;h3&gt;
  
  
  Depth of the review-fix loop
&lt;/h3&gt;

&lt;p&gt;Across 769 PRs in 30 days, the &lt;strong&gt;average per PR was 10.8 review iterations, max 56&lt;/strong&gt;. The fact that the average is past 10 means &lt;strong&gt;the first review almost always surfaces at least one finding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The embedding-model migration PR shown earlier needed 6 iterations to merge, and that's representative of the average PR. &lt;strong&gt;What would take a human reviewer days, cortex resolves in minutes.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What the auto reviewer typically flags
&lt;/h3&gt;

&lt;p&gt;The most common findings out of the first review:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;[Graph] Missing &lt;code&gt;@graph-business&lt;/code&gt;&lt;/strong&gt; -- a prerequisite cpg leans on (from Part 2). The classic finding on newly added declarations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;[Doc] Doc inconsistency&lt;/strong&gt; -- code changed but the corresponding &lt;code&gt;docs/&lt;/code&gt; section was not updated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;[Test] Weak matchers&lt;/strong&gt; -- &lt;code&gt;objectContaining&lt;/code&gt; weakening value assertions, single-property checks via &lt;code&gt;toBe&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;[Observability] Unstructured error logs&lt;/strong&gt; -- &lt;code&gt;event&lt;/code&gt; field or required keys deviating from the structured-log spec.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;[Recurrence] No recurrence-prevention action&lt;/strong&gt; -- a bug-fix PR description not declaring which of {lint / horizontal rollout / add guideline / nothing} applies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are categories &lt;strong&gt;human reviewers frequently miss in practice&lt;/strong&gt;, especially doc consistency and recurrence-prevention checks. The AI reviewer applies them mechanically on every PR.&lt;/p&gt;

&lt;h3&gt;
  
  
  Actual false-positive rate
&lt;/h3&gt;

&lt;p&gt;It's not zero. A few times a month we get "this is Nit, not Major" type misjudgments. The fix path is the one described above -- not a comment on the individual PR, but a guideline edit that corrects the judgment for all subsequent reviews.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed / Bridge to Part 4
&lt;/h2&gt;

&lt;p&gt;Over the past six months, the engineer's role on cortex shifted from "&lt;strong&gt;writer&lt;/strong&gt;" and "&lt;strong&gt;reviewer&lt;/strong&gt;" to "&lt;strong&gt;operator&lt;/strong&gt;" -- the human running the system, not acting inside each individual decision.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI writes the code (Claude Code)&lt;/li&gt;
&lt;li&gt;AI reviews the code (auto review)&lt;/li&gt;
&lt;li&gt;A different AI applies the fixes (author mode running on the PR author's machine)&lt;/li&gt;
&lt;li&gt;AI decides when to merge (auto-merge script)&lt;/li&gt;
&lt;li&gt;Deploys go in parallel (Turborepo + Pulumi)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What stays in human hands: "&lt;strong&gt;what to build at all&lt;/strong&gt; (product / requirements)," "&lt;strong&gt;is this direction actually right&lt;/strong&gt; (architectural judgment)," "&lt;strong&gt;which guideline to add and where&lt;/strong&gt;," and "&lt;strong&gt;look at the reviews and adjust prompts and guidelines accordingly&lt;/strong&gt;." High-abstraction work -- &lt;strong&gt;not individual decisions, but watching the whole system from above and steering&lt;/strong&gt;. &lt;strong&gt;From human-in-the-loop to human-on-the-loop&lt;/strong&gt;, you could say.&lt;/p&gt;

&lt;p&gt;The widely-reported phenomena -- "AI lowers quality," "the reviewer becomes the bottleneck" -- happen when &lt;strong&gt;the harness is extended on the writer side only, and the reviewer side is left to humans&lt;/strong&gt;. If writing speeds up and reviewing doesn't, of course it bottlenecks. Of course things get missed.&lt;/p&gt;

&lt;p&gt;cortex is the opposite. &lt;strong&gt;We extended the harness on the reviewer side first, before fully extending it on the writer side&lt;/strong&gt;. Anthropic's observation that the bottleneck shifts from writing to reviewing is exactly right -- which is precisely why "&lt;strong&gt;move the reviewer role to AI as well&lt;/strong&gt;" is the answer cortex chose.&lt;/p&gt;

&lt;p&gt;"The AI writes the code, the AI reviews the code." That's the core of cortex's auto-review pipeline. &lt;strong&gt;Quality drop and review bottleneck are functions of how far you extend the harness&lt;/strong&gt; -- they are not inherent to AI-assisted development.&lt;/p&gt;




&lt;p&gt;Up next in &lt;strong&gt;Part 4&lt;/strong&gt;: &lt;strong&gt;Alert-Fix + observability + auto-added guardrails&lt;/strong&gt; -- a pipeline where a production alert (observed via OTel/Faro/Prometheus) triggers AI investigation, an AI-authored fix PR plus a new lint/type gate, auto-review, auto-merge, and auto-redeploy. The fix and a recurrence-prevention guardrail land together, so the same class of incident structurally can't fire again. If auto review protects quality at PR time, Part 4 protects it &lt;strong&gt;at production time, while growing the quality gates themselves&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The headline number above includes &lt;code&gt;auto-fix&lt;/code&gt;-flavored PRs (= Alert-Fix output). For certain classes of incidents, the fix is already merged before anyone has time to react -- that's where cortex sits today. See you next time.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>codereview</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The Heart of the AI Harness: A Knowledge Graph of the AI, by the AI, for the AI (Series Part 2)</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Tue, 19 May 2026 14:16:20 +0000</pubDate>
      <link>https://forem.com/ryantsuji/the-heart-of-the-ai-harness-a-knowledge-graph-of-the-ai-by-the-ai-for-the-ai-series-part-2-53bm</link>
      <guid>https://forem.com/ryantsuji/the-heart-of-the-ai-harness-a-knowledge-graph-of-the-ai-by-the-ai-for-the-ai-series-part-2-53bm</guid>
      <description>&lt;p&gt;Hi, I'm &lt;a href="https://x.com/ryantsuji" rel="noopener noreferrer"&gt;Ryan&lt;/a&gt;, CTO at airCloset.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;: "cortex" and "cortex-product-graph" referenced in this article are internal code names for an AI platform developed in-house at airCloset. They are unrelated to existing commercial services such as Snowflake Cortex or Palo Alto Networks Cortex.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In &lt;a href="https://dev.to/ryantsuji/building-a-real-ai-harness-auto-reviewed-prs-self-healing-ops-and-non-engineer-contributors-3lfa"&gt;Part 1 (Series Intro)&lt;/a&gt;, I wrote about how &lt;strong&gt;AI handles PR reviews and incident response&lt;/strong&gt; on top of a platform we call cortex. At the center of that flywheel is the &lt;strong&gt;Product Graph&lt;/strong&gt; (implementation name: &lt;code&gt;cortex-product-graph&lt;/code&gt;, or cpg) — a unified knowledge graph of code, docs, DB schemas, and infrastructure definitions, queryable through semantic search.&lt;/p&gt;

&lt;p&gt;In Part 1, I described cpg at a high level: "all of cortex is indexed in one graph." This post goes deeper — &lt;strong&gt;how it's built, why we landed on this design, and what actually changed&lt;/strong&gt; once it was in place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Series Index
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Theme&lt;/th&gt;
&lt;th&gt;Key scene&lt;/th&gt;
&lt;th&gt;Article&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Series intro: cortex's harness&lt;/td&gt;
&lt;td&gt;PRs auto-merge / incidents self-heal before you notice&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/ryantsuji/building-a-real-ai-harness-auto-reviewed-prs-self-healing-ops-and-non-engineer-contributors-3lfa"&gt;ai-harness-intro&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Product Graph (cpg)&lt;/td&gt;
&lt;td&gt;Code, docs, DB, infra unified into one graph&lt;/td&gt;
&lt;td&gt;this post ← you are here&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;AI PR review&lt;/td&gt;
&lt;td&gt;webhook → AI review → auto-fix → squash merge&lt;/td&gt;
&lt;td&gt;coming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Alert-Fix + observability + auto-added guardrails&lt;/td&gt;
&lt;td&gt;Alert → AI investigates → fix PR + new lint/type gate → auto redeploy + recurrence blocked&lt;/td&gt;
&lt;td&gt;coming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Scaling the harness from cortex to toC services&lt;/td&gt;
&lt;td&gt;Non-engineer contributions in practice + scaling cortex's harness to the whole product org&lt;/td&gt;
&lt;td&gt;coming&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Start with One Scene
&lt;/h2&gt;

&lt;p&gt;"I want to change the calculation logic behind the 'bug rate' KPI on the dashboard. &lt;strong&gt;Where is it, and what might break?&lt;/strong&gt;" — imagine that question comes up before you touch any code.&lt;/p&gt;

&lt;p&gt;When you ask an AI this directly, with no function name and no file path given, it hits cpg with a semantic search and pulls the relevant nodes in one shot. What comes back isn't just functions — it includes &lt;strong&gt;BigQuery tables&lt;/strong&gt; and &lt;strong&gt;API endpoints&lt;/strong&gt; alongside the code. And at the end of the response, there's a &lt;strong&gt;"next action candidates (Runbook)"&lt;/strong&gt; block that tells the AI to re-probe starting from the BQ table with the most reads and writes flowing through it.&lt;/p&gt;

&lt;p&gt;The final answer looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Calculation site&lt;/strong&gt;: &lt;code&gt;calculateRatePer100pt&lt;/code&gt; / &lt;code&gt;calculateBugCount&lt;/code&gt; — both pure functions with no I/O side effects; safe to change in isolation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Writers (upstream)&lt;/strong&gt;: &lt;code&gt;syncKpiMetrics&lt;/code&gt; / &lt;code&gt;writeKpiMetrics&lt;/code&gt; / &lt;code&gt;backfillKpiMetrics&lt;/code&gt; all write to the &lt;code&gt;kpi_bug_rate_per_100pt&lt;/code&gt; table; these are the real aggregation batch jobs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Readers (downstream)&lt;/strong&gt;: &lt;code&gt;BigQueryKpiRepository.getSummaryByDate&lt;/code&gt; reads via BigQuery → &lt;code&gt;/kpi/bugs&lt;/code&gt; API → KPI dashboard page&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Related docs&lt;/strong&gt;: &lt;code&gt;docs/generator/kpi.md&lt;/code&gt; defines bug rate; updating the code without updating docs would leave them stale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;"Update the docs together, and schedule the deploy when the aggregation batch isn't running" — that's a decision you can make with confidence.&lt;/p&gt;

&lt;p&gt;I personally know all this — I wrote it. But that's exactly the problem: &lt;strong&gt;anyone else who wanted to touch this had to track me down&lt;/strong&gt;. Three months ago, "finding out where something lives and what would break" meant finding me. Now, this same investigation is done by &lt;strong&gt;PMO members (non-engineers) using cpg on their own&lt;/strong&gt;. grep didn't get them there; documentation didn't get them there. One natural-language question did.&lt;/p&gt;

&lt;p&gt;What makes that possible is cpg — a graph where you can follow "&lt;strong&gt;what you want to do&lt;/strong&gt;" in plain language to the relevant nodes in one or two hops, even when you don't know the function name. The &lt;strong&gt;Runbook structure&lt;/strong&gt; — where the tool's return value itself contains the next tool call to make — is what lets the AI re-select its starting point and drill deeper on its own.&lt;/p&gt;

&lt;p&gt;That's the setup. Now let me explain how it's built.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Static Analysis Alone Couldn't Do
&lt;/h2&gt;

&lt;p&gt;cortex has a separate system that &lt;strong&gt;graph-analyzes the production codebase using static analysis&lt;/strong&gt; (I'll write about this in its own post — just touching it here). It parses JS/TS code with AST analysis across our external-facing production repos, automatically extracting function call graphs, API endpoints, DB access patterns, and event pub/sub relationships.&lt;/p&gt;

&lt;p&gt;This works well for what it does, and &lt;strong&gt;we still use it actively in the production repos&lt;/strong&gt;. But when we tried applying the same approach to cortex itself, it didn't get us where we wanted to go.&lt;/p&gt;

&lt;p&gt;Three specific gaps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No context&lt;/strong&gt; — nodes exist but carry no &lt;em&gt;meaning&lt;/em&gt;. "What is this API for?" "Why does this column exist?" isn't in the graph. Ask "where is the code that calculates the KPI bug rate?" and you'll miss unless the function name happens to look like it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No entry point&lt;/strong&gt; — you &lt;strong&gt;already have to know&lt;/strong&gt; the file path or function name before search can start. "Let me go find it" doesn't work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explosion after 1–2 hops&lt;/strong&gt; — starting from any node, related nodes multiply exponentially within a couple of hops, far exceeding what an AI can process in one context window. Trace results become too long to use.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The summary: &lt;strong&gt;mechanically accurate, but no semantic weighting&lt;/strong&gt;. To be genuinely useful to AI, you need one more layer: "&lt;strong&gt;what matters, and why things are connected.&lt;/strong&gt;"&lt;/p&gt;

&lt;h2&gt;
  
  
  Meanwhile, DB Graph Was Working
&lt;/h2&gt;

&lt;p&gt;Around the same time, a different approach — the &lt;a href="https://dev.to/ryantsuji/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;DB Graph MCP&lt;/a&gt; we'd built — was working exactly as intended.&lt;/p&gt;

&lt;p&gt;DB Graph is an MCP server with access to &lt;strong&gt;15 schemas and 991 tables&lt;/strong&gt; inside cortex, supporting semantic search over tables and columns with &lt;strong&gt;AI-generated descriptions&lt;/strong&gt;. A natural-language query like "tables related to return processing confirmation" would find semantically connected nodes even when the table name doesn't contain those words.&lt;/p&gt;

&lt;p&gt;After thinking about why this worked, the answer became clear: &lt;strong&gt;DB Graph has a business-context description attached to every node, and that description is what feeds into the embeddings&lt;/strong&gt;. That semantic weight is what "finding by meaning" actually runs on.&lt;/p&gt;

&lt;p&gt;Static-analysis code graph had none of that. Type relationships and call graphs exist — but "&lt;strong&gt;why this function exists&lt;/strong&gt;" was never written anywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hypothesis — Bring DB Graph's Essence into the Code Graph
&lt;/h2&gt;

&lt;p&gt;The hypothesis was simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"A business-context description on every node, loaded into embeddings" — if that's the core of why DB Graph works, then doing the same thing for the code graph should structurally overcome the limits of static analysis.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The problem was: &lt;strong&gt;where do you put the "business context"?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All the options:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;External docs&lt;/td&gt;
&lt;td&gt;Design docs / wiki / Notion&lt;/td&gt;
&lt;td&gt;Separate from code. Drifts instantly. Nobody maintains it.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;External metadata&lt;/td&gt;
&lt;td&gt;Sidecar YAML / &lt;code&gt;*.meta.json&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Dual-management. Breaks on rename.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dedicated graph DB&lt;/td&gt;
&lt;td&gt;Write annotations directly into Neo4j / Neptune&lt;/td&gt;
&lt;td&gt;Dual-management again. Doesn't show up in PR diffs — unreviewable.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TypeScript decorator&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@GraphNode({...})&lt;/code&gt; in code&lt;/td&gt;
&lt;td&gt;Lives in the transpiled output = runtime dependency. Can't be extracted by AST alone.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DSL file&lt;/td&gt;
&lt;td&gt;Custom &lt;code&gt;.graph&lt;/code&gt; file format&lt;/td&gt;
&lt;td&gt;High learning cost. No editor support out of the box.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSDoc comments&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@graph-business&lt;/code&gt; / &lt;code&gt;@graph-connects&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Physically co-located with the code. Extractable by AST alone. Zero runtime dependency.&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The choice of &lt;strong&gt;JSDoc over decorators&lt;/strong&gt; was intentional:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero runtime dependency&lt;/strong&gt;: decorators survive into the transpiled output and can affect runtime behavior. JSDoc has no executable runtime semantics; with production builds that strip comments, it leaves no runtime artifact.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generalizes beyond TypeScript&lt;/strong&gt;: the same &lt;code&gt;@graph-*&lt;/code&gt; syntax can extend to Pulumi definitions in &lt;code&gt;infra/&lt;/code&gt; and Markdown frontmatter in &lt;code&gt;docs/&lt;/code&gt;. Decorators are locked to TypeScript syntax.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single AST pass&lt;/strong&gt;: ts-morph can walk declarations and extract JSDoc in one scan. Decorators sometimes require type resolution, which slows builds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shows up naturally in PR diffs&lt;/strong&gt;: JSDoc sits directly above the code it annotates, so when code changes, the JSDoc diff appears in the same file. Reviewers can't miss it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Doubles as documentation for both humans and AI&lt;/strong&gt;: JSDoc already serves as IDE hover text and AI-readable context. Putting &lt;code&gt;@graph-business&lt;/code&gt; there means it simultaneously explains the declaration to a human reading the code, and gives a coding AI semantic context about the surrounding functions. Graph metadata that also functions as inline documentation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that the essence of this design is &lt;strong&gt;using parseable annotations co-located with code as the SSoT&lt;/strong&gt; — TypeScript / JSDoc is just one implementation. The same pattern works in any language with comparable comment + AST primitives: Python docstrings + &lt;code&gt;ast&lt;/code&gt;, Go comments + &lt;code&gt;go/ast&lt;/code&gt;, Rust &lt;code&gt;///&lt;/code&gt; + &lt;code&gt;syn&lt;/code&gt;. &lt;strong&gt;What matters isn't &lt;em&gt;where&lt;/em&gt; you write the annotations, but the invariant: "physically co-located with the code, extractable by AST alone."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Same goes for the monorepo: &lt;strong&gt;this pattern doesn't depend on cortex being a monorepo&lt;/strong&gt;. If anything, &lt;strong&gt;its real value shows when repositories are split and AI can't easily follow code across them&lt;/strong&gt;. In a monorepo, the AI can still grep / read files across the whole tree; in a multi-repo, the cross-repo calls and data flows are the hard part to follow. Run the same build per repo, emit nodes / edges, aggregate into a central graph, and those cross-repo connections become reachable in one hop. We actually run a parallel knowledge graph over our external-facing production repos (multi-repo) using the same pattern — more on that in a separate post.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Approach — Abandon Code Inference, Make JSDoc the SSoT
&lt;/h2&gt;

&lt;p&gt;The code graph's problem was &lt;strong&gt;no meaning&lt;/strong&gt;. The answer is simple: &lt;strong&gt;embed the meaning directly in the code&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For cortex's own code graph, we &lt;strong&gt;completely abandoned the approach of inferring graph structure from code&lt;/strong&gt;. Instead:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Every declaration — function / class / method / API / Page / Cron / etc. — gets a dedicated JSDoc tag. The graph is assembled from those.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This means the &lt;strong&gt;SSoT (Single Source of Truth) for business context becomes the code itself&lt;/strong&gt;. There's no gap between docs and code, because &lt;strong&gt;the JSDoc in the code is the authoritative source&lt;/strong&gt;. The structural problem of "AI makes mistakes because docs are stale" is resolved at the level of where the data lives.&lt;/p&gt;

&lt;p&gt;Placing the two side by side — "a graph from code inference alone" versus "a knowledge graph with JSDoc as SSoT" — makes the difference in what's carried on each node immediately visible:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjioh14o8c9ttqzg8fkd3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjioh14o8c9ttqzg8fkd3.png" alt="Before / After — graph from code inference alone vs. knowledge graph with JSDoc as SSoT" width="800" height="540"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's a concrete example of the tags (from cpg's own source):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="cm"&gt;/**
 * Set embeddings on nodes in place.
 * Compares textForEmbedding against existing BQ data; only re-generates
 * for nodes where the text has changed.
 *
 * @graph-stack product-graph
 * @graph-domain Engineering
 * @graph-business Compares hash of textForEmbedding against existing BQ nodes; re-generates
 *   embedding only for nodes where text has changed. Unchanged nodes reuse BQ embeddings.
 * @graph-connects cortex.product_graph_nodes [queries, via:id] read existing embeddings
 * @graph-connects vertex-ai-embedding [calls] generate embeddings for changed nodes
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ProductGraphNode&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;force&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What each tag does:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tag&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@graph-node&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Explicitly declares node type (defaults to Function)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@graph-stack&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The infra stack this declaration belongs to&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@graph-domain&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Business domain (comma-separated, multiple allowed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@graph-business&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;What this declaration specifically does&lt;/strong&gt; — the body of the embedding input&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@graph-connects&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Connection targets (multiple allowed; &lt;code&gt;via:&lt;/code&gt; for parameter-level tracking; &lt;code&gt;none&lt;/code&gt; to explicitly declare no connections)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key is that &lt;code&gt;@graph-business&lt;/code&gt; &lt;strong&gt;feeds directly into the embedding input&lt;/strong&gt;. It's not the node name — it's a &lt;strong&gt;natural-language sentence&lt;/strong&gt; that carries semantic weight into search. In practice, almost all of these sentences are written by AI: during the normal flow of writing code in cortex, the AI writes the JSDoc alongside the code (and thanks to the ESLint enforcement below, it doesn't forget).&lt;/p&gt;

&lt;h3&gt;
  
  
  Making Omissions Physically Impossible
&lt;/h3&gt;

&lt;p&gt;This design collapses the moment someone leaves a tag out. One function without &lt;code&gt;@graph-business&lt;/code&gt; = that function is invisible to semantic search. One without &lt;code&gt;@graph-connects&lt;/code&gt; = the data flow through that function is absent from the graph.&lt;/p&gt;

&lt;p&gt;So we built &lt;strong&gt;enforcement that makes omissions physically impossible&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;5 ESLint plugins&lt;/strong&gt; — tag presence validation, syntax validation, naming convention enforcement (stack / domain allowlists), &lt;code&gt;@graph-connects&lt;/code&gt; required, &lt;code&gt;@graph-connects none&lt;/code&gt; misuse detection (flags when &lt;code&gt;none&lt;/code&gt; appears on code that calls external services)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated PR review&lt;/strong&gt; (Part 1 ③) — tags missing are flagged as &lt;code&gt;[Graph] Critical&lt;/code&gt;; docs inconsistency is flagged as &lt;code&gt;[Doc] Critical&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result: &lt;strong&gt;"write a declaration → business context is always written with it"&lt;/strong&gt; holds as an invariant. Add a function → its meaning and connections are necessarily in its JSDoc.&lt;/p&gt;

&lt;p&gt;One honest note: &lt;strong&gt;forcing "5 JSDoc tags on every declaration" on humans would blow up in code review within three days&lt;/strong&gt;. Writing a &lt;code&gt;@graph-business&lt;/code&gt; sentence per function, enumerating &lt;code&gt;@graph-connects&lt;/code&gt; exhaustively, checking the naming allowlists — that's genuinely tedious at scale.&lt;/p&gt;

&lt;p&gt;This works because &lt;strong&gt;AI writes the code&lt;/strong&gt;. Writing four required JSDoc tags (plus optional &lt;code&gt;@graph-node&lt;/code&gt; when the default &lt;code&gt;Function&lt;/code&gt; type isn't enough) is rounding error on top of writing the code itself. With ESLint and automated review in the feedback loop, the AI doesn't miss tags — and human reviewers only need to check "is this tag factually correct?" not "is it there?"&lt;/p&gt;

&lt;p&gt;:::message&lt;br&gt;
This design is one that &lt;strong&gt;can't realistically be maintained when humans write code&lt;/strong&gt;, but &lt;strong&gt;becomes viable the moment AI does&lt;/strong&gt;. It's an AI-first design. The premise of AI-first development is what lets business context be fixed in code as the SSoT.&lt;br&gt;
:::&lt;/p&gt;
&lt;h3&gt;
  
  
  Where Hallucination Happens Shifts
&lt;/h3&gt;

&lt;p&gt;Viewed from another angle, what's going on here is that &lt;strong&gt;the location of hallucination shifts&lt;/strong&gt;. &lt;strong&gt;Where you contain hallucination is, I think, fundamental to AI harness design&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;As I &lt;a href="https://dev.to/ryantsuji/graph-rag-isnt-a-one-shot-anymore-the-case-for-agentic-graph-rag-mcps-1dj5"&gt;wrote elsewhere&lt;/a&gt;, when you combine AI with a graph system, "&lt;strong&gt;hallucination doesn't disappear — it just changes location.&lt;/strong&gt;" For cpg, here's where it lands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Graph build / query phase&lt;/strong&gt;: &lt;strong&gt;No fresh LLM generation.&lt;/strong&gt; Once reviewed metadata lands in the graph, the ts-morph AST pass, the BigQuery MERGE, and the MCP query responses are all deterministic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSDoc writing phase&lt;/strong&gt;: This is the entry point for hallucination. Whether &lt;code&gt;@graph-business&lt;/code&gt; is factually accurate, or whether &lt;code&gt;@graph-connects&lt;/code&gt; is exhaustively listed — these can go wrong since the AI is writing them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But &lt;strong&gt;the entry point is locked down by automated PR review&lt;/strong&gt;. Missing tags get &lt;code&gt;[Graph] Critical&lt;/code&gt;; factual drift gets &lt;code&gt;[Doc] Critical&lt;/code&gt;. When something's wrong, either the AI that wrote the code or another reviewer AI catches it and fixes it.&lt;/p&gt;

&lt;p&gt;The result: &lt;strong&gt;once data lands in the graph, it can be treated as deterministically sourced from reviewed code, not as a fresh generated answer that might hallucinate on every query&lt;/strong&gt;. AI agents calling cpg don't have to guard against "this might be a generated lie" on every returned node or edge. The tools can be designed as "return facts only" without compromise.&lt;/p&gt;
&lt;h2&gt;
  
  
  Build — AST to Graph via ts-morph
&lt;/h2&gt;

&lt;p&gt;Once JSDoc is established as the SSoT, the rest is mechanics: extract it and assemble the graph. The implementation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AST-analyze JS/TS with ts-morph&lt;/strong&gt; — walk every declaration (function / class / method / type / enum / variable / expression statement / &lt;code&gt;export default&lt;/code&gt; / etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract &lt;code&gt;@graph-*&lt;/code&gt; tags from JSDoc&lt;/strong&gt; — collect the four required tags plus optional &lt;code&gt;@graph-node&lt;/code&gt; and normalize into a &lt;code&gt;ParsedGraphTags&lt;/code&gt; structure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate nodes&lt;/strong&gt; — use &lt;code&gt;qualifiedName = "&amp;lt;filePath&amp;gt;:&amp;lt;name&amp;gt;"&lt;/code&gt; as the node ID&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate edges&lt;/strong&gt; — one edge per &lt;code&gt;@graph-connects&lt;/code&gt; entry, with &lt;code&gt;via:&lt;/code&gt; / &lt;code&gt;cardinality&lt;/code&gt; and other metadata preserved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate embeddings&lt;/strong&gt; — send &lt;code&gt;@graph-business&lt;/code&gt; text to Vertex AI Embedding (&lt;code&gt;gemini-embedding-2&lt;/code&gt;) and vectorize it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load into BigQuery&lt;/strong&gt; — MERGE all nodes / edges into &lt;code&gt;cortex.product_graph_nodes&lt;/code&gt; / &lt;code&gt;cortex.product_graph_edges&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Because &lt;code&gt;@graph-business&lt;/code&gt; goes directly into the embedding input, querying "&lt;strong&gt;code that calculates the KPI bug rate&lt;/strong&gt;" in natural language returns a hit based on semantic proximity of the description — even when the function name contains neither "bug" nor "rate."&lt;/p&gt;

&lt;p&gt;The overall flow: the three tracks (&lt;code&gt;apps/&lt;/code&gt; / &lt;code&gt;infra/&lt;/code&gt; / &lt;code&gt;docs/&lt;/code&gt;) each go through their own parser, are merged into a single node set by the generator, and only nodes whose text has changed are sent to Vertex AI before being stored in BigQuery:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lbjcqd6hrnpc01d460p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lbjcqd6hrnpc01d460p.png" alt="Build pipeline — assembling one knowledge graph from JSDoc, Pulumi, and docs" width="800" height="470"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Build Cost Is Effectively Zero
&lt;/h3&gt;

&lt;p&gt;The build runs automatically on push to main via GitHub Actions, using a differential embedding approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compare &lt;code&gt;textForEmbedding&lt;/code&gt; of each BQ node against the new text&lt;/li&gt;
&lt;li&gt;Unchanged nodes reuse their existing BQ embeddings&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Only changed nodes go to Vertex AI&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A typical push changes a few dozen nodes, so cost is &lt;strong&gt;under $0.001&lt;/strong&gt;. Full regeneration (for recovery, triggered via &lt;code&gt;workflow_dispatch&lt;/code&gt;) is ~$0.075 for 8,000+ nodes.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why BigQuery, Not a Graph Database
&lt;/h3&gt;

&lt;p&gt;When people hear "knowledge graph," they often imagine a dedicated graph DB (Neo4j, Neptune, Memgraph, etc.). cortex runs on &lt;strong&gt;just two BigQuery tables&lt;/strong&gt; (&lt;code&gt;product_graph_nodes&lt;/code&gt; / &lt;code&gt;product_graph_edges&lt;/code&gt;). Three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Different cost structure&lt;/strong&gt; — dedicated graph DBs set a floor of "always-on cluster cost"; for the current implementation, BQ is &lt;strong&gt;storage + on-demand queries only&lt;/strong&gt;. Even with continuous AI traffic, it's clearly cheaper than running a server 24/7.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector search / cosine similarity / SQL in the same place&lt;/strong&gt; — BQ has &lt;a href="https://cloud.google.com/bigquery/docs/vector-search" rel="noopener noreferrer"&gt;&lt;code&gt;VECTOR_SEARCH&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-distance" rel="noopener noreferrer"&gt;&lt;code&gt;ML.DISTANCE&lt;/code&gt;&lt;/a&gt;, so semantic search over &lt;code&gt;@graph-business&lt;/code&gt; embeddings, filter by node properties, and adjacent-node JOINs can all live in &lt;strong&gt;one query&lt;/strong&gt;. That matters when "semantic search + property filter + neighbor JOIN" is the standard access pattern.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Migration-ready for GQL once BQ Graph goes GA&lt;/strong&gt; — BQ already has &lt;a href="https://cloud.google.com/bigquery/docs/graph-overview" rel="noopener noreferrer"&gt;Graph in BigQuery&lt;/a&gt; in Preview; once it ships GA, you can put a graph view over the existing tables and likely shift to &lt;code&gt;MATCH (n)-[e]-&amp;gt;(m)&lt;/code&gt; queries in GQL. &lt;strong&gt;The current table design is already migration-ready.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In short: &lt;strong&gt;get the graph DB's future strength (GQL) while running on plain BQ tables today&lt;/strong&gt;. Compared to adding a graph DB on top of a generic RAG stack (pgvector / Pinecone / etc.), fewer systems to operate and lower learning curve.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Core Part Is Available as an Open-Source Sample
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;"parse JSDoc annotations with AST analysis and output a graph"&lt;/strong&gt; part is small enough to reproduce cleanly, so I published it as a working sample:&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/thujikun/graph-jsdoc-extractor" rel="noopener noreferrer"&gt;graph-jsdoc-extractor&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's a ~500-line library that extracts &lt;code&gt;@graph-*&lt;/code&gt; and outputs ndjson of &lt;code&gt;{ kind: "node", ... }&lt;/code&gt; / &lt;code&gt;{ kind: "edge", ... }&lt;/code&gt; objects. Comes with a &lt;code&gt;pnpm run example&lt;/code&gt; that runs end-to-end. For those who just want to see the output format without cloning, the built ndjson is checked in: &lt;strong&gt;&lt;a href="https://github.com/thujikun/graph-jsdoc-extractor/blob/main/examples/sample/output.ndjson" rel="noopener noreferrer"&gt;examples/sample/output.ndjson&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is intentionally just the "turn code into a graph" part. The real value in cortex starts when &lt;strong&gt;docs and DB schemas land on the same graph&lt;/strong&gt; — that's the next section.&lt;/p&gt;
&lt;h2&gt;
  
  
  Connections — Landing Docs and DB on the Same Graph
&lt;/h2&gt;

&lt;p&gt;Looking at the sample ndjson, a &lt;code&gt;@graph-connects users [reads_from, via:id]&lt;/code&gt; entry has &lt;code&gt;users&lt;/code&gt; stored as a &lt;strong&gt;raw string&lt;/strong&gt; in &lt;code&gt;targetId&lt;/code&gt;. Leaving that as-is means it's just a string. Resolving &lt;code&gt;users&lt;/code&gt; into a &lt;strong&gt;rich node carrying column definitions, partition info, and per-column descriptions&lt;/strong&gt; — that's where the resolution power of search takes a real step forward.&lt;/p&gt;

&lt;p&gt;cortex does this in three directions.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. DB Schemas as Nodes in the Same Graph
&lt;/h3&gt;

&lt;p&gt;cpg ingests not just code but cortex's DB schemas in the same build. A &lt;code&gt;@graph-connects users [queries, via:id]&lt;/code&gt; on the code side gets resolved at build time into a &lt;strong&gt;rich Table node&lt;/strong&gt; carrying column definitions, partition metadata, and descriptions (if the same-named stub exists, its internals are replaced while its ID and all inbound edges survive).&lt;/p&gt;

&lt;p&gt;The key point: &lt;strong&gt;table and column descriptions aren't AI-generated annotations attached after the fact — they're pulled directly from the &lt;code&gt;description&lt;/code&gt; fields in the Pulumi schema definitions&lt;/strong&gt;. Here's what that looks like (excerpt from cpg's own table definition):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;productGraphNodesTable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;gcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bigquery&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cortex-prod-product-graph-nodes&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;datasetId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cortex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tableId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;product_graph_nodes&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Product Graph nodes — unified knowledge graph of code + DB + docs. &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Auto-generated from JSDoc @graph-* tags&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;STRING&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;REQUIRED&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unique node ID (graphId:nodeType:filePath:name format)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;nodeType&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;STRING&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;REQUIRED&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Node type — ApiEndpoint, BigQueryTable, Function, Module, Document, etc.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;qualifiedName&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;STRING&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Fully qualified name — filePath:exportName format&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
  &lt;span class="p"&gt;]),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both the table-level and column-level descriptions &lt;strong&gt;become the embedding input for semantic search directly from the Pulumi definition&lt;/strong&gt;. The same philosophy as cpg's JSDoc — "write the description at the place the thing is defined" — runs all the way through the DB layer. Fix a Pulumi &lt;code&gt;description&lt;/code&gt; → semantic search improves. Same mechanics as fixing a JSDoc.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Docs Auto-Promoted to Nodes via Directory Convention
&lt;/h3&gt;

&lt;p&gt;Markdown files under &lt;code&gt;docs/&lt;/code&gt; also land in the graph. The mechanism is simple: &lt;strong&gt;the directory structure is conventionalized&lt;/strong&gt; so that which stack and domain each doc belongs to is deterministically resolvable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docs/{category}/{name}.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Examples from cpg itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docs/product-graph/README.md&lt;/code&gt; → stack: &lt;code&gt;product-graph&lt;/code&gt;, domain: &lt;code&gt;Engineering&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docs/code-graph/README.md&lt;/code&gt; → stack: &lt;code&gt;code-graph&lt;/code&gt;, domain: &lt;code&gt;Engineering&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docs/mcp/db-graph/README.md&lt;/code&gt; → stack: &lt;code&gt;mcp-db-graph-server&lt;/code&gt;, domain: &lt;code&gt;Engineering&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each file is ingested as a &lt;strong&gt;Document node&lt;/strong&gt; in the graph, and a &lt;code&gt;documented_by&lt;/code&gt; edge is auto-generated from code nodes whose &lt;code&gt;@graph-stack&lt;/code&gt; matches the doc's stack. Code under &lt;code&gt;apps/graph/product/&lt;/code&gt; all carries &lt;code&gt;@graph-stack product-graph&lt;/code&gt;, so it's automatically linked to &lt;code&gt;docs/product-graph/README.md&lt;/code&gt;. Change code → related docs are already linked.&lt;/p&gt;

&lt;p&gt;This means an AI reviewer can answer "did this code change leave related docs stale?" &lt;strong&gt;in one graph hop&lt;/strong&gt; (that's the source of the &lt;code&gt;[Doc] Critical&lt;/code&gt; comments from Part 1).&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Infrastructure Definitions as Nodes
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;@graph-*&lt;/code&gt; tags go on Pulumi code in &lt;code&gt;infra/&lt;/code&gt; too. An example from cortex's own graph infrastructure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="cm"&gt;/**
 * @graph-node {CronSchedule}
 * @graph-stack code-graph
 * @graph-domain Engineering
 * @graph-business graph-boundary-daily: runs cross-repository boundary analysis at 7:00 AM JST
 *   daily (auto-detecting API, DB, and Event connections across repos)
 * @graph-connects graph-index-job [triggers] trigger Cloud Run Job
 */&lt;/span&gt;
&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;gcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudscheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-graph-boundary-schedule`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This becomes a &lt;strong&gt;CronSchedule node&lt;/strong&gt; in the graph, connected to the target CloudRunJob node by a &lt;code&gt;triggers&lt;/code&gt; edge. The Pulumi definition is itself a graph entry point — "&lt;strong&gt;what code runs in this cron?&lt;/strong&gt;" is now answerable by graph traversal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Result: Four Layers on One Graph
&lt;/h3&gt;

&lt;p&gt;Adding the three together, the node types in the graph look like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Node type&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Function / Class / Method&lt;/td&gt;
&lt;td&gt;Code (JSDoc)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ApiEndpoint / Page&lt;/td&gt;
&lt;td&gt;Code (JSDoc &lt;code&gt;@graph-node&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BigQueryTable / FirestoreCollection (stub)&lt;/td&gt;
&lt;td&gt;Code &lt;code&gt;@graph-connects&lt;/code&gt; targets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Table / Column / Schema&lt;/strong&gt; (rich)&lt;/td&gt;
&lt;td&gt;Schema files defined in Pulumi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Document&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Directory parser over &lt;code&gt;docs/&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CronSchedule / PubSubTopic / CloudRunService&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;infra/&lt;/code&gt; JSDoc&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Edge types correspondingly:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Edge type&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;calls / queries / reads_from / writes_to / publishes / triggers&lt;/td&gt;
&lt;td&gt;code → other nodes (&lt;code&gt;@graph-connects&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;documented_by&lt;/td&gt;
&lt;td&gt;code → Document (auto-generated on stack match)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HAS_TABLE / HAS_COLUMN&lt;/td&gt;
&lt;td&gt;Schema → Table → Column (DB side)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;shares_topic&lt;/td&gt;
&lt;td&gt;Between boundary nodes sharing a topic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Code ↔ DB ↔ docs ↔ infra&lt;/strong&gt; — all reachable in one hop on the same graph. This is what "Product Graph" means: cortex's unified knowledge graph.&lt;/p&gt;

&lt;p&gt;Here's an actual visualization of a slice of cpg itself. Starting from &lt;code&gt;generateEmbeddings&lt;/code&gt; (code), you can see &lt;code&gt;cortex.product_graph_nodes&lt;/code&gt; (BigQueryTable) with its columns, the Pulumi table definition resource, &lt;code&gt;docs/product-graph/README.md&lt;/code&gt;, external services like Vertex AI, and a separate layer's &lt;code&gt;graph-boundary-daily&lt;/code&gt; (CronSchedule) — &lt;strong&gt;all connected by edges on the same node set&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvnq1fcx6e465huuzyr7h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvnq1fcx6e465huuzyr7h.png" alt="Product Graph — a knowledge graph with four layers on the same node set" width="800" height="575"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Where the Sample Stops
&lt;/h3&gt;

&lt;p&gt;graph-jsdoc-extractor &lt;strong&gt;intentionally leaves out&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resolving &lt;code&gt;@graph-connects&lt;/code&gt; targets to real node IDs&lt;/strong&gt; (cortex uses a seven-stage resolver; the rules are project-specific)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Same-name merging&lt;/strong&gt; (cortex promotes DB-schema-side rich nodes to replace stubs; the merge source is project-specific)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The docs directory convention parser&lt;/strong&gt; (cortex's &lt;code&gt;docs/{category}/{name}.md&lt;/code&gt; convention is cortex-specific)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding generation&lt;/strong&gt; (Vertex AI setup is up to you)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are parts where &lt;strong&gt;the right answer differs per project&lt;/strong&gt; — naming conventions, where docs live, which embedding model to use, when to promote a stub to a rich node. Baking one answer into the sample library would make it harder to use, not easier. The sample draws the line at JSDoc → graph structure, and this article's job is "here's how we did it in cortex — translate it to your project's context."&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP Tool Design and the Runbook Pattern
&lt;/h2&gt;

&lt;p&gt;The graph is now assembled. Next: &lt;strong&gt;how AI uses it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;cpg runs as an MCP server (&lt;code&gt;cortex-product-graph&lt;/code&gt;). From the AI's side, three tools are visible, applying the &lt;strong&gt;three-layer tool design&lt;/strong&gt; (search / detail / traverse) from &lt;a href="https://dev.to/ryantsuji/graph-rag-isnt-a-one-shot-anymore-the-case-for-agentic-graph-rag-mcps-1dj5"&gt;the Agentic Graph RAG MCP post&lt;/a&gt; directly to cpg:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_product_graph_nodes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Find entry points (vector search + name search)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get_product_graph_node_detail&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Deterministically fetch detail by ID&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;trace_product_graph_connections&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;BFS subgraph traversal (&lt;code&gt;via_filter&lt;/code&gt; for parameter-level tracking)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three layers only shows you what's &lt;em&gt;in&lt;/em&gt; the graph. For jumping from graph nodes to the actual data they point to, &lt;strong&gt;supplementary tools live in the same MCP&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Supplementary tool&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;read_file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pass a node's &lt;code&gt;path&lt;/code&gt; property directly to fetch source (Function / Class / Method / ApiEndpoint / Document — any code-origin node carries &lt;code&gt;path&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;grep_code&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pattern search across the repository&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git_blame&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Last author, commit, and timestamp per line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;query_product_graph_bq&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Direct SQL against BigQuery. Find a BQTable node in the graph, then jump to its live data (executed via user OAuth, so BQ IAM applies as-is)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;read_firestore&lt;/code&gt; / &lt;code&gt;write_firestore&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Read/write Firestore collections. Find a FirestoreCollection node in the graph, then go to the live documents (Firestore access follows the same user / environment permission boundary; cpg provides the entry point, not a bypass around IAM)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;list_product_graph_stacks&lt;/code&gt; / &lt;code&gt;list_product_graph_domains&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Lists all stack / domain names present in the graph; useful for orienting before a search&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In other words, cpg's MCP is &lt;strong&gt;a two-tier design: the three-layer structure for graph traversal + supplementary tools for descending into live data (source code / BQ / Firestore)&lt;/strong&gt;. The AI can do "search by meaning → traverse by structure → pull live data" &lt;strong&gt;entirely within one MCP server&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Runbook Pattern — Return Values Contain the Next Action
&lt;/h3&gt;

&lt;p&gt;Every MCP response ends with a &lt;strong&gt;"related nodes (next action candidates)" block&lt;/strong&gt;. For example, after a search returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3 nodes found:
- apps/generator/kpi/src/kpi-calculator.ts:calculateBugCount (Function)
- backlog_no_embedding.kpi_bug_rate_per_100pt (BigQueryTable)
- /kpi/bugs (ApiEndpoint)

## Related nodes (next action candidates)

### 🛠 Code (1)
- apps/generator/kpi/src/kpi-calculator.ts:calculateBugCount
  → `get_product_graph_node_detail("apps/generator/kpi/src/kpi-calculator.ts:calculateBugCount")`

### 🗄 DB tables (1)
- backlog_no_embedding.kpi_bug_rate_per_100pt
  → `trace_product_graph_connections(start_node: "backlog_no_embedding.kpi_bug_rate_per_100pt", direction: "backward")`

### 🌐 API (1)
- /kpi/bugs
  → `get_product_graph_node_detail("/kpi/bugs")`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Copy-pasteable tool calls are lined up by node type, showing exactly what to call next.&lt;/strong&gt; The AI gets new options on every call, so it never has to figure out "what should I do now?"&lt;/p&gt;

&lt;p&gt;Here's the AI ↔ MCP loop in diagram form. The MCP bundles next action candidates into every search response; the AI picks one and makes the next call, repeating:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl3dzmz1icp2jt3iwasab.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl3dzmz1icp2jt3iwasab.png" alt="Runbook pattern — tool return values contain the next tool call to make" width="800" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;usecase&lt;/code&gt; Parameter — Switching the Runbook
&lt;/h3&gt;

&lt;p&gt;Every tool accepts a &lt;strong&gt;&lt;code&gt;usecase&lt;/code&gt; parameter&lt;/strong&gt; where the AI declares what kind of investigation it's doing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;usecase&lt;/th&gt;
&lt;th&gt;Strategy (summary of what cpg optimizes for)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;general&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Basic investigation with unknown entry point. Default.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;design&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Understanding existing feature structure. Read business / connections via &lt;code&gt;get_product_graph_node_detail&lt;/code&gt;. Deep trace is unnecessary; Document nodes take priority.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;impact&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Trace upstream and downstream impact deeply. Hit &lt;code&gt;trace_product_graph_connections&lt;/code&gt; with &lt;code&gt;direction=both&lt;/code&gt; / &lt;code&gt;max_depth=5&lt;/code&gt;. Code + DB + infra + schedules are all on the same graph, so one traversal covers a wide area.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test-create&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Test design. Fetch detail to read parameters and connected DB / called functions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Compare existing tests against implementation coverage. Cross-check branch structure of target Function / Method against test case count.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;code-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Check impact of changes and detect &lt;code&gt;@graph-business&lt;/code&gt; violations. Trace impact → detail to check business / source.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bug&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Deep trace from error origin. &lt;code&gt;direction=both&lt;/code&gt; / &lt;code&gt;max_depth=5&lt;/code&gt; for upstream callers + downstream data flow.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The same &lt;code&gt;search_product_graph_nodes&lt;/code&gt; call with &lt;code&gt;usecase: "code-review"&lt;/code&gt; returns next action candidates optimized for "verify the change's impact first." With &lt;code&gt;usecase: "bug"&lt;/code&gt; it returns candidates optimized for "trace deep from error origin + fetch logs." The Runbook switches to match the declared intent.&lt;/p&gt;

&lt;p&gt;This matters because &lt;strong&gt;having the AI declare "what kind of investigation I'm doing"&lt;/strong&gt; yields different angles from the same graph. Auto Review internally fires with &lt;code&gt;code-review&lt;/code&gt;; Alert-Fix fires with &lt;code&gt;bug&lt;/code&gt; — the flywheel elements from Part 1 each run a different Runbook.&lt;/p&gt;

&lt;h3&gt;
  
  
  CLAUDE.md Convention — Forcing AI to Always Hit cpg First
&lt;/h3&gt;

&lt;p&gt;Throughout this post I've said "the AI uses cpg," but AI doesn't &lt;strong&gt;spontaneously choose&lt;/strong&gt; cpg. Claude Code defaults to grep / glob / file read as its first instinct. To flip that, the root CLAUDE.md in cortex opens with:&lt;/p&gt;

&lt;blockquote&gt;
&lt;h2&gt;
  
  
  Product Graph MCP (cortex-product-graph)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;This is the single most important asset in this repository.&lt;/strong&gt; cortex-product-graph MCP indexes all code, DB schemas, docs, and infra into a unified knowledge graph with business context. It knows everything about this repository.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Always query Product Graph MCP first&lt;/strong&gt; before grep/glob/file reads. It returns richer, contextualized results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If Product Graph MCP is unavailable&lt;/strong&gt; (auth expired, server down) and you are NOT in autonomous/auto mode, &lt;strong&gt;stop all work immediately&lt;/strong&gt; and ask the user to authenticate. Do not proceed with degraded grep-only investigation.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Two things matter here. First, the explicit ordering — "cpg first, grep only as fallback." Second, &lt;strong&gt;fallback to grep is explicitly forbidden if cpg is unavailable&lt;/strong&gt;. Without that second clause, the AI happily degrades to "cpg seems down, I'll just grep" and proceeds with stale context and wrong assumptions. With it, cpg unavailability is a hard stop, not a graceful degradation.&lt;/p&gt;

&lt;p&gt;One clause in CLAUDE.md, and Claude Code's first move on any code investigation is pinned to cpg. Article writing, Auto Review, Alert-Fix — all follow the same convention, so the entry point is always unified.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Live Example — Investigating cpg with cpg
&lt;/h2&gt;

&lt;p&gt;Enough abstraction. Let me walk through a real cpg query: &lt;strong&gt;using cpg to investigate cpg's own builder core&lt;/strong&gt; — the meta-example.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Semantic search for "the code that extracts graph source data from code annotations"
&lt;/h3&gt;

&lt;p&gt;No function name assumed. Just the intent in plain language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;search_product_graph_nodes(
  query: "code that extracts graph source data from annotations written in code",
  search_mode: "semantic",
  usecase: "design"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Top 5 results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- apps/graph/product/src/parsers/jsdoc-parser.ts:applyGraphTag (Function)
- apps/graph/product/src/parsers/jsdoc-parser.ts:extractTagsFromNode (Function)
- packages/eslint-plugin-graph/src/utils/jsdoc-utils.ts:extractGraphTags (Function)
- apps/graph/product/src/parsers/jsdoc-parser.ts:parseJSDocExports (Function)
- packages/eslint-plugin-graph/src/utils/jsdoc-utils.ts:getGraphTagValue (Function)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query contained neither "JSDoc" nor "&lt;code&gt;@graph-*&lt;/code&gt;" nor "parser" — yet the intent found the right nodes &lt;strong&gt;via the &lt;code&gt;@graph-business&lt;/code&gt; embedding&lt;/strong&gt;. grep cannot do this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Trace downstream from that node (&lt;code&gt;usecase: "design"&lt;/code&gt; prioritizes Documents)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;trace_product_graph_connections(
  start_node: "apps/graph/product/src/parsers/jsdoc-parser.ts:parseJSDocExports",
  direction: "forward",
  usecase: "design"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Edges returned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- parseJSDocExports --calls--&amp;gt; extractDeclarationsFromFile
- parseJSDocExports --calls--&amp;gt; extractTagsFromNode
- parseJSDocExports --reads_from[via:filePath]--&amp;gt; filesystem
- parseJSDocExports --documented_by--&amp;gt; docs/product-graph/README.md (Document)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last one — &lt;code&gt;documented_by&lt;/code&gt; — is the point: &lt;strong&gt;the edge from code to the Document node was auto-generated&lt;/strong&gt;. Following it with &lt;code&gt;read_file&lt;/code&gt; retrieves &lt;code&gt;docs/product-graph/README.md&lt;/code&gt; — and with it, &lt;strong&gt;the background, design rationale, and tag specification for this implementation&lt;/strong&gt;, all in one hop.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: The meta-structure — this article itself is written with cpg
&lt;/h3&gt;

&lt;p&gt;This article was drafted by Claude Code, not by me — I provided direction and review. That Claude Code has cpg MCP connected, so every time I said "show a real example from cpg's own code" or "use a cpg-related infra example," Claude queried cpg to pull actual function names, JSDoc, Pulumi definitions, and docs structure, then embedded them in the text.&lt;/p&gt;

&lt;p&gt;In other words: the &lt;strong&gt;&lt;code&gt;generateEmbeddings&lt;/code&gt; JSDoc, the Pulumi &lt;code&gt;productGraphNodesTable&lt;/code&gt; description, the &lt;code&gt;graph-boundary-daily&lt;/code&gt; cron annotation, the auto-link to &lt;code&gt;docs/product-graph/README.md&lt;/code&gt;&lt;/strong&gt; — none of these came from my memory. &lt;strong&gt;Claude queried cpg and found the real artifacts&lt;/strong&gt;. My role is only the review judgment: "this is right / this is wrong."&lt;/p&gt;

&lt;p&gt;This is the pattern repeating across all of cortex. &lt;strong&gt;Humans set the direction; AI uses cpg to verify and generate implementations / text / reviews&lt;/strong&gt;. Part 1's ③ Auto Review and ④ Alert-Fix run on the same structure. Article writing isn't a special case — as long as cpg exists, AI-driven work always takes this shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed / Bridge to Part 3
&lt;/h2&gt;

&lt;p&gt;That covers the inside of cpg. A closing summary of how it affects cortex as a whole:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. I stopped running grep&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without knowing file names or symbol names, I can get the relevant code back by just describing what I want to do. The combination of 120+ apps and a team of one works because of this, more than anything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Auto Review produces context-grounded comments&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;[Graph]&lt;/code&gt; / &lt;code&gt;[Impact]&lt;/code&gt; / &lt;code&gt;[Doc]&lt;/code&gt; / &lt;code&gt;[Security]&lt;/code&gt; level comments Part 1's ③ Auto Review produces all stand on cpg. The substance is &lt;strong&gt;review carried out with the entire codebase as context&lt;/strong&gt; — that's the real benefit of the cpg integration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Alert-Fix can trace from error origin to root cause&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Part 1's ④ Alert-Fix can hop from a Grafana alert → code → dependent tables → related docs in one graph traversal because cpg exists. It fires with &lt;code&gt;usecase: "bug"&lt;/code&gt; and takes the shortest path from error to root cause.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The static-analysis code graph is working somewhere else&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I said "we abandoned code inference" at the top, but that was specifically for cortex itself. For the external-facing production repositories (the core of the business), a different approach supplies context, and static analysis continues to run there. More on that in a separate post.&lt;/p&gt;

&lt;p&gt;Most AI coding setups try to make the AI better at reading an &lt;em&gt;unchanged&lt;/em&gt; repository. cpg takes the opposite approach: &lt;strong&gt;change the repository's information structure so AI has a first-class semantic map to read&lt;/strong&gt;. That's the line between "another GraphRAG" and what cpg actually is.&lt;/p&gt;

&lt;p&gt;In that sense, Product Graph is literally a knowledge graph of the AI, by the AI, for the AI: generated alongside AI-written code, maintained through AI review, and consumed by AI agents as their primary map of the product.&lt;/p&gt;

&lt;p&gt;Coming up in &lt;strong&gt;Part 3&lt;/strong&gt;: the full pipeline of &lt;strong&gt;automated PR review&lt;/strong&gt; built on top of cpg — from GitHub webhook ingestion through AI review / automated fix / automated merge / parallel deploy. What happens when Auto Review fires with &lt;code&gt;usecase: "code-review"&lt;/code&gt;, how &lt;code&gt;[Graph] Critical&lt;/code&gt; comments are generated, and the worktree mechanism that lets AI apply fixes and push back.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>graphrag</category>
      <category>jsdoc</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Minimal post (test fixture)</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Sun, 17 May 2026 16:48:43 +0000</pubDate>
      <link>https://forem.com/ryantsuji/minimal-post-test-fixture-1pk2</link>
      <guid>https://forem.com/ryantsuji/minimal-post-test-fixture-1pk2</guid>
      <description>&lt;p&gt;Minimal test fixture used by &lt;code&gt;$slug.test.tsx&lt;/code&gt;. No headings, no tags — covers the&lt;br&gt;
null branches of TOC rendering and tag-list rendering in &lt;code&gt;routes/posts/$slug.tsx&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Slugs prefixed with &lt;code&gt;_&lt;/code&gt; are excluded from &lt;code&gt;/posts&lt;/code&gt; listing (production publishing&lt;br&gt;
surface) but remain reachable via direct &lt;code&gt;getRenderedPost(slug, lang)&lt;/code&gt; (the&lt;br&gt;
&lt;code&gt;virtual:rendered-posts&lt;/code&gt; lookup that backs &lt;code&gt;/posts/$slug&lt;/code&gt;) so test fixtures can&lt;br&gt;
be SSR'd without polluting the index.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Building a Real AI Harness: Auto-Reviewed PRs, Self-Healing Ops, and Non-Engineer Contributors (Series Intro)</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Tue, 12 May 2026 16:34:39 +0000</pubDate>
      <link>https://forem.com/ryantsuji/building-a-real-ai-harness-auto-reviewed-prs-self-healing-ops-and-non-engineer-contributors-3lfa</link>
      <guid>https://forem.com/ryantsuji/building-a-real-ai-harness-auto-reviewed-prs-self-healing-ops-and-non-engineer-contributors-3lfa</guid>
      <description>&lt;p&gt;Hi, I'm &lt;a href="https://x.com/ryantsuji" rel="noopener noreferrer"&gt;Ryan&lt;/a&gt;, CTO at airCloset.&lt;/p&gt;

&lt;p&gt;In my previous posts I've introduced &lt;a href="https://dev.to/ryantsuji/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;the full picture of our 17 internal MCP servers&lt;/a&gt;, &lt;a href="https://dev.to/ryantsuji/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;an MCP server that searches 991 internal tables in natural language&lt;/a&gt;, &lt;a href="https://dev.to/ryantsuji/we-built-a-custom-graph-rag-to-let-ai-answer-did-that-initiative-actually-work-3oda"&gt;a custom Graph RAG for measuring initiative impact&lt;/a&gt;, and &lt;a href="https://dev.to/ryantsuji/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a"&gt;the Sandbox MCP that lets non-engineers publish AI-built apps safely&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;All of those run on top of an internal AI development platform we call &lt;strong&gt;cortex&lt;/strong&gt;. This post is the first in a series about cortex itself — the platform, the design choices, and the operational experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Series Index
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Theme&lt;/th&gt;
&lt;th&gt;Key scene&lt;/th&gt;
&lt;th&gt;Article&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Series intro: cortex's harness&lt;/td&gt;
&lt;td&gt;PRs auto-merge / incidents self-heal before you notice&lt;/td&gt;
&lt;td&gt;this post ← you are here&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Product Graph (cpg)&lt;/td&gt;
&lt;td&gt;Code, docs, DB, infra unified into one graph&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/ryantsuji/the-heart-of-the-ai-harness-a-knowledge-graph-of-the-ai-by-the-ai-for-the-ai-series-part-2-4a59-temp-slug-9510240"&gt;cortex-product-graph&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;AI PR review&lt;/td&gt;
&lt;td&gt;webhook → AI review → auto-fix → squash merge&lt;/td&gt;
&lt;td&gt;coming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Alert-Fix + observability + auto-added guardrails&lt;/td&gt;
&lt;td&gt;Alert → AI investigates → fix PR + new lint/type gate → auto redeploy + recurrence blocked&lt;/td&gt;
&lt;td&gt;coming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Scaling the harness from cortex to toC services&lt;/td&gt;
&lt;td&gt;Non-engineer contributions in practice + scaling cortex's harness to the whole product org&lt;/td&gt;
&lt;td&gt;coming&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Two Scenes, Up Front
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scene 1: PRs merge themselves
&lt;/h3&gt;

&lt;p&gt;Monday morning. An engineer implements a feature locally, pushes a branch, opens a PR.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A few minutes later, the AI reviewer comes back with REQUEST_CHANGES. Multiple comments:

&lt;ul&gt;
&lt;li&gt;"This data formatting duplicates &lt;code&gt;formatRow()&lt;/code&gt; in the shared package. Please consolidate."&lt;/li&gt;
&lt;li&gt;"You changed an API response type, but the related docs (&lt;code&gt;docs/api/...&lt;/code&gt;) still describe the old shape."&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;A separate AI agent spawns a worktree, applies the fixes, pushes a follow-up commit&lt;/li&gt;

&lt;li&gt;Re-review comes back as APPROVE&lt;/li&gt;

&lt;li&gt;Auto squash-merge&lt;/li&gt;

&lt;li&gt;GitHub Actions detects only the changed stacks and deploys them to Cloud Run / Cloudflare Pages&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;No human touched any of this&lt;/strong&gt;. The engineer refreshes the PR tab and notices it's already merged.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scene 2: Incidents fix themselves before you notice
&lt;/h3&gt;

&lt;p&gt;7 AM. A Grafana alert fires: "BQ pipeline failed 3 times in a row."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An AI receives the webhook, fetches the error logs from Loki via the &lt;strong&gt;Grafana MCP&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Walks the &lt;strong&gt;Product Graph&lt;/strong&gt; (implementation name: &lt;code&gt;cortex-product-graph&lt;/code&gt; — a unified knowledge graph of the codebase, docs, DB schemas, and infrastructure definitions; covered later in this post and in Part 2) to trace the pipeline's code, dependent tables, and related docs, identifying the root cause&lt;/li&gt;
&lt;li&gt;Opens a fix PR&lt;/li&gt;
&lt;li&gt;AI reviewer APPROVE → auto squash-merge → automatic redeploy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the time the engineer logs in at 9 AM, Slack already shows: "pipeline patched." The only incidents engineers personally handle are the ones AI genuinely can't crack.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyy2w9hetr3l8xyb8tbs2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyy2w9hetr3l8xyb8tbs2.png" alt="Two automation loops" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What's behind both scenes is the dev environment described in the rest of this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  Industry Context — "Harness Engineering"
&lt;/h2&gt;

&lt;p&gt;Before I get to cortex, one paragraph of context. Over the past six months, &lt;strong&gt;the practice of building proper foundations for AI agents in production&lt;/strong&gt; has crystallized into a recognized industry trend.&lt;/p&gt;

&lt;p&gt;"Harness" itself isn't a new word. In AI specifically, it traces back to &lt;strong&gt;EleutherAI's &lt;a href="https://github.com/EleutherAI/lm-evaluation-harness" rel="noopener noreferrer"&gt;lm-evaluation-harness&lt;/a&gt; (2020)&lt;/strong&gt; — the LLM evaluation framework that put the term in active use. What changed in the past six months is its elevation into an engineering discipline for &lt;strong&gt;LLM agents in production&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Feb 2026&lt;/strong&gt;: OpenAI published &lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;"Harness engineering: leveraging Codex in an agent-first world"&lt;/a&gt;, describing how a small internal team led by Codex shipped &lt;strong&gt;1 million lines in 5 months&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A few days later, Mitchell Hashimoto (HashiCorp co-founder, Terraform creator) distilled it into the formula &lt;code&gt;Agent = Model + Harness&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;April 2026&lt;/strong&gt;: Martin Fowler (author of &lt;em&gt;Refactoring&lt;/em&gt;, ThoughtWorks Chief Scientist) published &lt;a href="https://martinfowler.com/articles/harness-engineering.html" rel="noopener noreferrer"&gt;"Harness engineering for coding agent users"&lt;/a&gt;, establishing the &lt;strong&gt;Guides (proactive controls) / Sensors (reactive controls)&lt;/strong&gt; framing&lt;/li&gt;
&lt;li&gt;Same month: Anthropic and Cursor each published their own harness write-ups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The catchphrase that's gone viral: &lt;strong&gt;"2025 was the year of agents. 2026 is the year of harnesses."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The framing is: &lt;strong&gt;the model itself is rapidly commoditizing&lt;/strong&gt; (the gap between Claude / GPT / Gemini is narrowing from the user side). Where you actually get differentiation is &lt;strong&gt;how you design the harness — the foundation that lets AI run in production&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;cortex is most cleanly read as &lt;strong&gt;a real attempt to build that "harness" inside a real company&lt;/strong&gt;. In this post I'll organize cortex using Fowler's Guides / Sensors framing.&lt;/p&gt;

&lt;p&gt;From here, I'll show &lt;strong&gt;how the "harness beats model" thesis takes concrete shape on cortex&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Builds the Code
&lt;/h2&gt;

&lt;p&gt;For the first few months, &lt;strong&gt;I built 100% of cortex by myself&lt;/strong&gt;. The accurate framing isn't "without a harness, others can't safely PR" but rather "&lt;strong&gt;without a harness, no one — including me with extra hands — could ride this thing&lt;/strong&gt;."&lt;/p&gt;

&lt;p&gt;Even back then, between &lt;a href="https://dev.to/ryantsuji/how-we-built-an-automated-meeting-intelligence-system-with-google-meet-slack-and-rag-42ln"&gt;our Google Meet recording pipeline&lt;/a&gt; (Japanese), about half of the &lt;a href="https://dev.to/ryantsuji/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;17 MCP servers&lt;/a&gt;, and a long tail of unpublished features, &lt;strong&gt;roughly 50 loosely-coupled applications were already running&lt;/strong&gt;. Each one had its purpose, background, and data flow documented carefully. But the volume was such that &lt;strong&gt;even with AI in the loop, you couldn't realistically have it read all the relevant docs and absorb the whole picture for any given change&lt;/strong&gt;. The codebase had outgrown what a person — or an AI given pieces — could hold in their head at once.&lt;/p&gt;

&lt;p&gt;Recently, with the harness in place, &lt;strong&gt;non-engineers&lt;/strong&gt; (business-side managers, PMOs, etc.) have started shipping PRs to cortex too. As of writing, the cumulative commit ratio is &lt;strong&gt;~91% me, ~9% other recent contributors&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you imagine non-engineers opening PRs against a production repo, "can quality really hold?" is the obvious question. In cortex, the answer is yes, because &lt;strong&gt;AI review and automation own the quality gates&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PRs missing annotations, tests, or lint cleanliness get REQUEST_CHANGES from the AI reviewer&lt;/li&gt;
&lt;li&gt;A separate AI agent applies the fixes&lt;/li&gt;
&lt;li&gt;Until everything is satisfied, nothing merges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So whoever writes a PR — engineer or not — &lt;strong&gt;at the moment it merges, the same quality bar is met&lt;/strong&gt;. The key point: it's not "you can write freely," it's "&lt;strong&gt;you can write inside rails that don't let you derail&lt;/strong&gt;." The author's job stops at "communicating the intent precisely"; the harness owns code correctness.&lt;/p&gt;

&lt;p&gt;The shift is from "&lt;strong&gt;X could write that because they're X&lt;/strong&gt;" to "&lt;strong&gt;X can write that because of cortex&lt;/strong&gt;." That property only emerges once the harness is built — and it's the core of cortex's design.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Running
&lt;/h2&gt;

&lt;p&gt;cortex consists of microservices, jobs, MCP servers, web frontends, Cloudflare Workers, and so on. As of writing, there are &lt;strong&gt;123 apps&lt;/strong&gt;. The features I've already covered in past posts are each composed of multiple apps — but even adding them up by feature, &lt;strong&gt;only about 10% of cortex has been written about&lt;/strong&gt;. The remaining 90% hasn't appeared in a post yet. A few examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A unified product UX measurement web app&lt;/strong&gt; — UX metrics, screen analysis, funnels, and error analysis in one place&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A dev-org portal web app&lt;/strong&gt; — KPIs (bug rate, etc.), per-member GitHub Activity, QA evaluation results, plus an AI chat that answers natural-language questions about KPIs via Agentic RAG&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A family of Slack bots&lt;/strong&gt; for operational support:

&lt;ul&gt;
&lt;li&gt;A config bot that lets you manage job configurations (DBs, attendance SaaS, Google Drive, etc.) directly from Slack&lt;/li&gt;
&lt;li&gt;An accounting-assist bot that takes invoice OCR and drafts payment requests / expense filings in our accounting SaaS&lt;/li&gt;
&lt;li&gt;In-channel knowledge search, issue/request management, meeting creation; a BigQuery cross-table RAG bot; a Google Drive cross-corpus RAG bot&lt;/li&gt;
&lt;li&gt;A marketing bot that returns insights (trend, creative analysis) from BigQuery marketing data&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;An APM auto-analysis agent&lt;/strong&gt; that runs daily on monitoring-SaaS APM data, detects performance issues, and opens tickets in our issue-tracking SaaS&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;An AI-bot auditor bot&lt;/strong&gt; that runs E2E tests against the Slack bots above and detects spec drift&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;…and so on. &lt;strong&gt;Each will get its own dedicated post later in the series.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scale at a glance:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;apps (microservices, jobs, MCP servers, web, etc.)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;123&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;packages (shared libraries)&lt;/td&gt;
&lt;td&gt;66&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP servers&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pulumi stacks&lt;/td&gt;
&lt;td&gt;110&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TypeScript (implementation)&lt;/td&gt;
&lt;td&gt;~&lt;strong&gt;630K lines&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tests&lt;/td&gt;
&lt;td&gt;~&lt;strong&gt;560K lines&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Markdown documentation&lt;/td&gt;
&lt;td&gt;~&lt;strong&gt;110K lines / 389 files&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Duration&lt;/td&gt;
&lt;td&gt;~&lt;strong&gt;5 months&lt;/strong&gt; (intensive development: ~4 months)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Merged PRs&lt;/td&gt;
&lt;td&gt;~&lt;strong&gt;790&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The 4-Element Flywheel — cortex's Harness
&lt;/h2&gt;

&lt;p&gt;What lets "&lt;strong&gt;~4 months of intensive dev, mostly solo&lt;/strong&gt;" coexist with "&lt;strong&gt;non-engineers shipping into the same repo&lt;/strong&gt;" is a harness design that &lt;strong&gt;delegates quality to AI and automation across every layer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;cortex's harness is structured as a &lt;strong&gt;flywheel&lt;/strong&gt; of 4 elements, mapped to Fowler's &lt;strong&gt;Guides (proactive) / Sensors (reactive)&lt;/strong&gt; split, that &lt;strong&gt;mutually reinforce one another&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffblxd05aurtwu3bydb16.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffblxd05aurtwu3bydb16.png" alt="cortex AI Harness Flywheel" width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ① Product Graph (Guides — supplying the right context)
&lt;/h3&gt;

&lt;p&gt;All of cortex — &lt;strong&gt;code, documentation, DB schemas, infrastructure definitions&lt;/strong&gt; — is indexed in real time as a single unified graph. It's queryable via MCP through &lt;strong&gt;semantic search&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;"Where is the code that calculates this KPI?" → "Which BQ tables does that code touch?" → "What are those tables' column definitions?" → "What docs are related?" — all of these can be answered from a single query traversal. That graph becomes the context source for everything the AI does.&lt;/p&gt;

&lt;p&gt;This is the foundation that &lt;strong&gt;"structurally reduces how often the AI gets confused."&lt;/strong&gt; Where grep tells you "where the string appears," the Product Graph tells you "&lt;strong&gt;what is connected, why, and how&lt;/strong&gt;." Implementation details come in Part 2.&lt;/p&gt;

&lt;h3&gt;
  
  
  ② Lint / Quality Gates (Guides — physically blocking deviations)
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;eslint-disable&lt;/code&gt; / &lt;code&gt;oxlint-disable&lt;/code&gt; are forbidden anywhere in the repo. In hand-written code, occurrences of &lt;code&gt;: any&lt;/code&gt; / &lt;code&gt;as any&lt;/code&gt; / TODO / FIXME are &lt;strong&gt;0&lt;/strong&gt; (excluding generated files and unavoidable external-library cases). &lt;strong&gt;Type checking&lt;/strong&gt; (using &lt;strong&gt;tsgo&lt;/strong&gt; — Microsoft's Go port of the TypeScript compiler, ~10× faster than &lt;code&gt;tsc&lt;/code&gt;; we use it to keep CI time down) runs on the entire codebase in CI.&lt;/p&gt;

&lt;p&gt;On top of that, test coverage is enforced at &lt;strong&gt;≥90% for statements / branches / functions / lines&lt;/strong&gt;. &lt;strong&gt;Lowering the threshold to pass is forbidden&lt;/strong&gt; — you write tests instead.&lt;/p&gt;

&lt;p&gt;With every escape hatch sealed, &lt;strong&gt;even when the AI writes wrong code, it doesn't merge&lt;/strong&gt;. This is also what stabilizes AI review judgments downstream.&lt;/p&gt;

&lt;h3&gt;
  
  
  ③ Auto Review (Sensors — auto-fixing until the bar is met)
&lt;/h3&gt;

&lt;p&gt;Scene 1 above is exactly this. The implementation-side note: &lt;strong&gt;AI review here isn't "lint with extra steps" — every comment is grounded in Product-Graph traversal of the actual impact&lt;/strong&gt;. That's where it earns its keep. To give you a feel, comments that actually fire fall into categories like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;[Graph] Critical&lt;/strong&gt; — missing annotation that breaks an edge in the graph&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;[Impact] Critical&lt;/strong&gt; — a BQ MERGE statement referencing a column not present in the existing target table; would fail in production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;[Doc] Critical&lt;/strong&gt; — code change that left related docs stale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;[Security] Minor&lt;/strong&gt; — &lt;code&gt;execSync&lt;/code&gt; doing string interpolation on an env var, opening a command injection vector&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What you might mentally classify as "AI review" — surface-level — isn't this. &lt;strong&gt;Comments here are produced with the entire codebase carried as context&lt;/strong&gt;, which is what the Product Graph integration buys you.&lt;/p&gt;

&lt;p&gt;The only PRs that actually need a human are "AI review hits a hard case." Day-to-day PRs go from push to merge without anyone touching them.&lt;/p&gt;

&lt;h3&gt;
  
  
  ④ Alert-Fix (Sensors — re-injecting production anomalies into the loop)
&lt;/h3&gt;

&lt;p&gt;Scene 2 above is exactly this. Starting from a Grafana alert, the AI traces the root cause through Product Graph + Loki + git blame, opens a fix PR, and pushes it through ③ Auto Review until it's auto-merged. &lt;strong&gt;Re-injecting anomalies into the loop&lt;/strong&gt; is the essence of Sensors. Details in a later post.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Makes It a Flywheel
&lt;/h3&gt;

&lt;p&gt;These 4 elements &lt;strong&gt;mutually reinforce one another&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;① Product Graph exists, so ③ Auto Review can comment with real impact awareness&lt;/li&gt;
&lt;li&gt;② Lint enforces the ground rules, so ③ Auto Review can assume "everything in the codebase meets the bar"&lt;/li&gt;
&lt;li&gt;③ Auto Review exists, so new code lands in ① Product Graph with correct semantic annotations&lt;/li&gt;
&lt;li&gt;④ Alert-Fix's incidents loop back through ③, maintaining the quality bar all the way back to ①&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The harness's effectiveness scales with the size of the codebase&lt;/strong&gt;, not against it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Supporting Foundations
&lt;/h3&gt;

&lt;p&gt;Three foundations make the 4 elements possible (covered in detail in Part 4):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tests and coverage&lt;/strong&gt;: ~630K lines of implementation, ~560K lines of tests (&lt;strong&gt;impl : test ≒ 1.13 : 1&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt;: ~110K lines / 389 files, written &lt;strong&gt;for both humans and AI&lt;/strong&gt;, also ingested as Document nodes in the Product Graph&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt;: Frontend = Faro, backend = OTel, infrastructure and CI logs all consolidated in Grafana. &lt;strong&gt;The AI sees the same data humans see.&lt;/strong&gt; Gemini API token usage and cost are tracked separately in Prometheus.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Technical Foundation
&lt;/h2&gt;

&lt;p&gt;cortex is a &lt;strong&gt;full-TypeScript monorepo&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Stack&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Applications (&lt;code&gt;apps/&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;TypeScript (Hono, TanStack Router, Vite, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared packages (&lt;code&gt;packages/&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure (&lt;code&gt;infra/&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;TypeScript (&lt;strong&gt;Pulumi&lt;/strong&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edge (&lt;code&gt;worker/&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;TypeScript (Cloudflare Workers)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lint plugins&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Doc scripts&lt;/td&gt;
&lt;td&gt;TypeScript (tsx)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Having everything in one language is &lt;strong&gt;a much bigger win when viewed from the AI's side&lt;/strong&gt; than from a human's. Specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You can feed the AI ASTs and type definitions directly as context&lt;/strong&gt; — no language boundary fragments the picture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refactors don't cross language boundaries&lt;/strong&gt; — one ESLint plugin can inspect and auto-fix &lt;code&gt;apps/&lt;/code&gt;, &lt;code&gt;packages/&lt;/code&gt;, and &lt;code&gt;infra/&lt;/code&gt; together&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edges don't break in the Product Graph&lt;/strong&gt; — for example, a Cloud Run service definition (&lt;code&gt;infra/&lt;/code&gt;, TS) connects in a single graph to the Hono route (&lt;code&gt;apps/&lt;/code&gt;, TS) it actually invokes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you ask the AI "what does this change affect?", the reason it can hop &lt;code&gt;infra → apps → packages&lt;/code&gt; and answer in one round-trip is that all of this is one language.&lt;/p&gt;

&lt;p&gt;Build is parallelized via &lt;a href="https://turbo.build/" rel="noopener noreferrer"&gt;Turborepo&lt;/a&gt; and &lt;a href="https://pnpm.io/" rel="noopener noreferrer"&gt;pnpm workspaces&lt;/a&gt;. Deploys go through GitHub Actions, which &lt;strong&gt;detects only changed stacks&lt;/strong&gt; and applies them in parallel via Pulumi.&lt;/p&gt;

&lt;h2&gt;
  
  
  Numbers (snapshot at time of writing)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc2t3af24pghx2hawd684.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc2t3af24pghx2hawd684.png" alt="Scale" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Duration&lt;/td&gt;
&lt;td&gt;~&lt;strong&gt;5 months&lt;/strong&gt; (intensive development: ~4 months)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Commits&lt;/td&gt;
&lt;td&gt;~&lt;strong&gt;4,000&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Merged PRs&lt;/td&gt;
&lt;td&gt;~&lt;strong&gt;790&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;% of commits authored by me&lt;/td&gt;
&lt;td&gt;~&lt;strong&gt;91%&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;apps&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;123&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;packages&lt;/td&gt;
&lt;td&gt;66&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP servers&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pulumi stacks&lt;/td&gt;
&lt;td&gt;110&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TypeScript (implementation)&lt;/td&gt;
&lt;td&gt;~630K lines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TypeScript (tests)&lt;/td&gt;
&lt;td&gt;~560K lines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Markdown documentation&lt;/td&gt;
&lt;td&gt;~&lt;strong&gt;110K lines / 389 files&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;as any&lt;/code&gt; / TODO / unjustified lint-disable in hand-written code&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (excluding generated files / unavoidable external-library cases)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coverage gate&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;90%&lt;/strong&gt; (statements / branches / functions / lines)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The PR-flow Switch That Multiplied Throughput
&lt;/h3&gt;

&lt;p&gt;Up until April, &lt;strong&gt;I was AI-assisted reviewing every change carefully on my own machine and then committing directly to main&lt;/strong&gt;. The review bar was unchanged, but throughput was bottlenecked on my hands.&lt;/p&gt;

&lt;p&gt;In April, switching to &lt;strong&gt;fine-grained, PR-based operation&lt;/strong&gt; (auto review → auto fix → auto merge) dramatically changed the per-month merged-PR count:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Month&lt;/th&gt;
&lt;th&gt;Merged PRs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2026-02&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026-03&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2026-04&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;518&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026-05 (through the 10th)&lt;/td&gt;
&lt;td&gt;235&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A &lt;strong&gt;~22× jump&lt;/strong&gt; between March and April. Total commits actually went down (because committing directly to main was replaced by going through PRs), so this isn't "I wrote more code." This is "&lt;strong&gt;the manual review step got replaced by the harness, and the throughput ceiling moved&lt;/strong&gt;." &lt;strong&gt;The 22× is exactly the moment a human reviewer was swapped for Auto Review&lt;/strong&gt; — clean evidence of the flywheel property where the harness's effectiveness scales with codebase size.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's Required for These Numbers to Hold
&lt;/h3&gt;

&lt;p&gt;These numbers are &lt;strong&gt;not explained by "we use AI" alone&lt;/strong&gt;. The prerequisites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full TypeScript monorepo&lt;/strong&gt; — code, tests, infrastructure, scripts all under one static-analysis system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composable Architecture&lt;/strong&gt; — &lt;code&gt;packages/&lt;/code&gt; holds reusable parts; &lt;code&gt;apps/&lt;/code&gt; compose them. Direct imports between &lt;code&gt;apps/&lt;/code&gt; are forbidden — everything routes through &lt;code&gt;packages/&lt;/code&gt;. This is what guarantees components don't interfere with each other.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strict quality gates&lt;/strong&gt; — lint / coverage / annotations are run "no lowering, no working around"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified graph&lt;/strong&gt; — code, docs, DB, infrastructure on a single graph as the foundation that lets the AI act with context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto PR review / auto fix / auto merge / auto alert-fix&lt;/strong&gt; — the harness that swaps the rate-limiting manual step for AI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified observability&lt;/strong&gt; — humans and AI see the same data (OTel + Faro + Prometheus)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The design has to be in place first, and AI runs on top of it. That's what makes both volume and quality possible at the same time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Composable Architecture&lt;/strong&gt; in particular is what drives the headcount-of-one production. Because components don't interfere, &lt;strong&gt;multiple Claude Code sessions can run in parallel on different parts of the codebase&lt;/strong&gt;. In practice, I've run up to ~10 sessions in parallel at peak — this multiplies with the harness's effectiveness.&lt;/p&gt;

&lt;p&gt;It's &lt;strong&gt;system design, not magic&lt;/strong&gt;. Each piece will get its own deep-dive in this series.&lt;/p&gt;

&lt;h2&gt;
  
  
  Some Honest Caveats
&lt;/h2&gt;

&lt;p&gt;If you've read this far, it might sound like everything runs perfectly on autopilot. It doesn't. Three things I want to be upfront about:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. High code quality doesn't prevent bugs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What the harness protects is &lt;strong&gt;"correctness of the code"&lt;/strong&gt; — not &lt;strong&gt;"correctness of the spec."&lt;/strong&gt; Even when implementation is clean, getting the spec interpretation wrong still ships bugs. AI review can catch "code contradicts the documented spec," but if the spec itself is wrong, the issue sails right through. That part is still a human responsibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The work is split deliberately.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;New pipelines that connect to external APIs, and anything touching secure data, are &lt;strong&gt;handled by engineers&lt;/strong&gt;. Non-engineers mostly work on &lt;strong&gt;modifications to features that already exist&lt;/strong&gt; (peeking at our business-side members' PRs makes it concrete pretty quickly). &lt;strong&gt;"Non-engineers can develop too"&lt;/strong&gt; means &lt;strong&gt;"the harness provides rails they can't derail from, so they can safely modify in maintenance mode"&lt;/strong&gt; — not "anyone can build anything from scratch."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. This level of automation works because it's an internal platform.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, cortex's full-auto deploy works partly because Composable Architecture cleanly separates apps and infrastructure. But honestly, &lt;strong&gt;a big part of it is that this is an internal-only platform&lt;/strong&gt;. If something breaks, only employees are affected, and we can roll back fast. The same approach can't be applied directly to consumer products or systems where downtime is immediately critical (warehouse management, for example). We've started moves to close that gap on the consumer side too, but that's a separate post.&lt;/p&gt;

&lt;h2&gt;
  
  
  Series Roadmap
&lt;/h2&gt;

&lt;p&gt;The series is planned as 6 parts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 1: Series Intro&lt;/strong&gt; (this post)&lt;br&gt;
   The big picture of what cortex is and why it works in "harness" form. The map to the rest of the series.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/ryantsuji/the-heart-of-the-ai-harness-a-knowledge-graph-of-the-ai-by-the-ai-for-the-ai-series-part-2-4a59-temp-slug-9510240"&gt;Part 2: Product Graph — code, docs, DB, infrastructure as one unified graph&lt;/a&gt;&lt;/strong&gt; ★ recommended next&lt;br&gt;
   The implementation side: how the unified graph is built and maintained. What happens when you take the design principles from &lt;a href="https://dev.to/ryantsuji/graph-rag-isnt-a-one-shot-anymore-the-case-for-agentic-graph-rag-mcps-1dj5"&gt;the Agentic Graph RAG MCP post&lt;/a&gt; and apply them to the entire cortex codebase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 3: AI reviews, fixes, merges, and deploys PRs&lt;/strong&gt;&lt;br&gt;
   GitHub webhook → AI review → on REQUEST_CHANGES, AI fixes via worktree → auto squash merge → changed-stack detection → parallel deploy: the full pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 4: Incidents self-heal, guardrails self-strengthen&lt;/strong&gt;&lt;br&gt;
   Grafana alert → AI investigation (Loki + Product Graph + git blame) → fix PR + new lint/type gate → auto merge → automatic redeploy: the auto alert-fix system. Also covers the full OTel + Faro + Prometheus stack, Gemini cost tracking, and how the quality gates are designed to be "non-loweriable, non-bypassable, and self-growing."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 5: Scaling the harness from cortex to toC services&lt;/strong&gt;&lt;br&gt;
   The first half covers how business members can already open PRs directly to cortex -- and where that breaks (additions to existing pipelines work; new pipelines and architectural changes still need humans in the loop). The second half is the roadmap and the thinking behind scaling cortex's harness across the whole product org (multiple services, multiple infra stacks, multiple teams).&lt;/p&gt;

&lt;p&gt;Each post stands on its own, but &lt;strong&gt;Part 2 (Product Graph) is the foundation for the others&lt;/strong&gt;, so the recommended reading order is Part 1 → Part 2 → any.&lt;/p&gt;

&lt;p&gt;Cadence: Tuesdays or Thursdays, 8–10 AM JST.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Building cortex, what's struck me is that &lt;strong&gt;in an AI-era dev environment, "absorbing everything that comes after the writing" wins over "reducing the burden on the writer"&lt;/strong&gt;. Tests, lint, types, coverage, code review, incident response — instead of "these get in the way, let's reduce them," the choice that worked was "&lt;strong&gt;have the AI do all of them, without compromise&lt;/strong&gt;." The counterintuitive result is that quality and dev speed both go up at the same time.&lt;/p&gt;

&lt;p&gt;And it expands two things — &lt;strong&gt;how much one engineer can ship&lt;/strong&gt;, and &lt;strong&gt;how much non-engineers can participate&lt;/strong&gt; — well beyond what was possible before. That's the texture of the "harness" we've built on top of cortex.&lt;/p&gt;

&lt;p&gt;In subsequent parts, I'll walk through the individual mechanisms that make this work.&lt;/p&gt;

&lt;p&gt;→ Part 2: &lt;a href="https://dev.to/ryantsuji/the-heart-of-the-ai-harness-a-knowledge-graph-of-the-ai-by-the-ai-for-the-ai-series-part-2-4a59-temp-slug-9510240"&gt;Product Graph — code, docs, DB, infrastructure as one unified graph&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>graphrag</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Graph RAG Isn't a One-Shot Anymore — The Case for Agentic Graph RAG MCPs</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Thu, 07 May 2026 09:57:32 +0000</pubDate>
      <link>https://forem.com/ryantsuji/graph-rag-isnt-a-one-shot-anymore-the-case-for-agentic-graph-rag-mcps-1dj5</link>
      <guid>https://forem.com/ryantsuji/graph-rag-isnt-a-one-shot-anymore-the-case-for-agentic-graph-rag-mcps-1dj5</guid>
      <description>&lt;p&gt;Hi, I'm &lt;a href="https://x.com/ryantsuji" rel="noopener noreferrer"&gt;Ryan&lt;/a&gt;, CTO at airCloset.&lt;/p&gt;

&lt;p&gt;Over my last few posts, I've introduced internal MCP servers we've been building: &lt;a href="https://dev.to/ryantsuji/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;DB Graph MCP&lt;/a&gt;, &lt;a href="https://dev.to/ryantsuji/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;the full picture of our 17 internal MCP servers&lt;/a&gt;, &lt;a href="https://dev.to/ryantsuji/we-built-a-custom-graph-rag-to-let-ai-answer-did-that-initiative-actually-work-3oda"&gt;Biz Graph&lt;/a&gt;, and &lt;a href="https://dev.to/ryantsuji/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a"&gt;Sandbox MCP&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;DB Graph is built from ORM parsing. Biz Graph extracts initiatives from meeting slides and uses a hand-designed Week node structure. Sandbox MCP is an app deployment platform. The purposes and implementations are completely different — but as I was writing each piece, I noticed that &lt;strong&gt;the design ideas at the root are the same&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This post is about that root. &lt;strong&gt;Agentic Graph RAG&lt;/strong&gt; — a design frame we keep coming back to whenever we build graphs across different domains.&lt;/p&gt;

&lt;p&gt;If you've heard "Graph RAG" before — maybe Microsoft's open-source project — wait a moment. The same words mean different things in &lt;strong&gt;the era when retrieval was assumed to be a single shot&lt;/strong&gt; versus &lt;strong&gt;the era when AI agents are everywhere&lt;/strong&gt;. The optimal design changes completely. This post is about the latter — a new way to think about Graph RAG in a world where Claude Code, Codex, and friends are doing the orchestration.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is RAG, Really?
&lt;/h2&gt;

&lt;p&gt;Quick refresher. Skip if this is familiar.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG (Retrieval Augmented Generation)&lt;/strong&gt; is the umbrella term for any technique that &lt;strong&gt;retrieves&lt;/strong&gt; related information from external data and mixes it into the prompt before the LLM generates an answer.&lt;/p&gt;

&lt;p&gt;Why was this needed? In the early days of generative AI — late 2022 and through 2023 — we ran into three problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tiny context windows&lt;/strong&gt;: GPT-3.5 had 4K tokens, early GPT-4 had 8K. You couldn't fit your internal docs in there.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stale model knowledge&lt;/strong&gt;: The model didn't know anything past its training cutoff. It certainly didn't know your internal data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucination&lt;/strong&gt;: It would confidently fabricate answers when it didn't know.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The RAG idea was: &lt;strong&gt;every time&lt;/strong&gt; the user asks something, fetch the relevant chunks from external data and feed them in before generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vector RAG — The First Practical Answer
&lt;/h2&gt;

&lt;p&gt;The earliest RAG implementation that actually caught on was &lt;strong&gt;Vector RAG&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The recipe is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Split documents into small chunks (say, 500 tokens each)&lt;/li&gt;
&lt;li&gt;Embed each chunk with a model (e.g., 1536-dim vectors)&lt;/li&gt;
&lt;li&gt;Store them in a vector DB (Pinecone, Weaviate, pgvector...)&lt;/li&gt;
&lt;li&gt;Embed the user's question with the same model, retrieve the top-k closest by cosine similarity&lt;/li&gt;
&lt;li&gt;Stuff those chunks into the prompt and call the LLM&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For its time, this was a great invention. Because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Search is fast&lt;/strong&gt;: tens to hundreds of milliseconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No training needed&lt;/strong&gt;: feed it docs, it's instantly searchable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain-agnostic&lt;/strong&gt;: works for legal documents, medical charts, internal wikis — the same machinery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rides model improvements&lt;/strong&gt;: better embedding models, better recall&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And critically, agent technology was still immature. OpenAI's Function Calling shipped in June 2023, was unstable for a while, and running a meaningful &lt;strong&gt;agentic loop&lt;/strong&gt; of multiple tool calls was both slow and expensive. So RAG was designed around the assumption: &lt;strong&gt;one retrieval has to fetch everything you need&lt;/strong&gt;. Vector RAG was perfectly tuned for this constraint.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Limits of Vector RAG
&lt;/h3&gt;

&lt;p&gt;But anyone who runs Vector RAG in production discovers the same thing fast: &lt;strong&gt;it can't follow relationships&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Take a question like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"How did last month's SNS ad campaign affect new member signups?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Vector search returns chunks that are &lt;strong&gt;textually similar&lt;/strong&gt; to the question. The campaign description might come up. But:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;When&lt;/strong&gt; was the campaign actually running?&lt;/li&gt;
&lt;li&gt;What were the new-member numbers during &lt;strong&gt;that same period&lt;/strong&gt;?&lt;/li&gt;
&lt;li&gt;What happened with &lt;strong&gt;previous similar campaigns&lt;/strong&gt;?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't textual similarity — they're structural traversals across data. Embedding maps "spring SNS ads" and "spring promotion initiative" close together, but it cannot &lt;strong&gt;start from "ran from March 1 to March 31" and reach "new member counts in that same period"&lt;/strong&gt;. That's not a similarity problem; that's a join problem.&lt;/p&gt;

&lt;p&gt;On top of that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chunk boundaries kill context&lt;/strong&gt;: related info gets split across chunks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top-k cliff&lt;/strong&gt;: critical info at rank 11 is invisible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Granularity mismatch&lt;/strong&gt;: questions like "summarize the whole thing" can't be answered by collecting chunks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Vector RAG nailed "fetch text similar to the question in one step." It's weak at "follow data through structural relationships." That's the gap that Graph RAG was born to address.&lt;/p&gt;

&lt;h2&gt;
  
  
  Graph RAG — Search That Follows Relationships
&lt;/h2&gt;

&lt;p&gt;The basic idea of Graph RAG: extract &lt;strong&gt;entities&lt;/strong&gt; (people, organizations, concepts) and &lt;strong&gt;relationships&lt;/strong&gt; (belongs-to, affects, references) from your documents, store them as a graph, and at query time traverse the graph to gather information across multiple hops.&lt;/p&gt;

&lt;p&gt;This handles questions like our SNS-ads-and-new-members example — anything that requires &lt;strong&gt;multi-hop reasoning&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Classical Graph RAG — Built for the One-Shot Era
&lt;/h3&gt;

&lt;p&gt;The most well-known implementation right now is Microsoft's &lt;a href="https://github.com/microsoft/graphrag" rel="noopener noreferrer"&gt;GraphRAG&lt;/a&gt;, released in 2024. The papers are well-written and I have a lot of respect for it. But the design philosophy is squarely &lt;strong&gt;from the one-shot retrieval era&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Roughly, Microsoft GraphRAG does this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Entity extraction&lt;/strong&gt;: feed the entire corpus through an LLM to extract entities and relationships&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community detection&lt;/strong&gt;: find graph clusters (communities) using the &lt;a href="https://en.wikipedia.org/wiki/Leiden_algorithm" rel="noopener noreferrer"&gt;Leiden algorithm&lt;/a&gt; (a community detection method)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hierarchical summarization&lt;/strong&gt;: have the LLM summarize each community. Then summarize groups of communities into higher-level summaries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query time&lt;/strong&gt;: pick the relevant community for the user's question, dump its summary into the prompt, answer in a single shot&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Why is the preprocessing this heavy? Because of the assumption underneath: &lt;strong&gt;"calling tools many times at query time isn't realistic"&lt;/strong&gt;. Function calling loops were slow, expensive, and unstable. So you preprocess the entire corpus with an LLM, build community summaries, and &lt;strong&gt;front-load the work to make query-time retrieval a single hop or two&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This wasn't a design failure — it was the &lt;strong&gt;rational answer for that era&lt;/strong&gt;. LangChain's RetrievalQA, LlamaIndex's query engines — all of them were built on the same premise: "retrieval is single-shot, generation is one-turn."&lt;/p&gt;

&lt;h3&gt;
  
  
  What Classical Graph RAG Solved, and Didn't
&lt;/h3&gt;

&lt;p&gt;What it solved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Relationship-aware search (community summaries even cover "the big picture")&lt;/li&gt;
&lt;li&gt;Multi-hop questions like "the relationship between Sam Altman, OpenAI, and Microsoft"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What it didn't solve cleanly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Construction is expensive&lt;/strong&gt;: extracting entities from a large corpus via LLM costs real money&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema is at the LLM's mercy&lt;/strong&gt;: the entities and relationships extracted are whatever the LLM thinks. This works fine for public-knowledge corpora (papers, news, etc.), but for domains that lean on internal tacit knowledge, the extracted units don't always match what's meaningful for the business&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Updates are heavy&lt;/strong&gt;: every new document means recomputing communities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sometimes off-target&lt;/strong&gt;: community summaries get over-abstracted, and the specific information you actually need falls out&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Honest disclaimer: I haven't seriously run classical Graph RAG in production myself. By the time I started building graph-based MCPs in our company, Claude Code was already running on my laptop, and I started from a world where &lt;strong&gt;agents calling tools many times was the default&lt;/strong&gt;. As a result, I never actually needed the heavy "compress the answer ahead of time" preprocessing of community summaries. If AI can re-fetch as many times as needed, the graph just has to hold the facts accurately.&lt;/p&gt;

&lt;p&gt;The flip side: if I had been doing this in 2023, I likely would have ended up on the same path as community summaries. The problems classical Graph RAG was solving are real — &lt;strong&gt;the underlying assumptions just changed faster than the design&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things Changed — The Agentic Era
&lt;/h2&gt;

&lt;p&gt;From late 2024 through 2025, the landscape shifted:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Production-grade agents arrived&lt;/strong&gt;: Claude Code, OpenAI Codex — agents that can run long tasks while orchestrating their own tool calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP (Model Context Protocol) landed&lt;/strong&gt;: tool descriptions became a standardized contract the model can read&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool-use accuracy from Sonnet/Opus-class models&lt;/strong&gt;: "pick the right tool from 20" became reliable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long context windows + prompt caching&lt;/strong&gt;: stacking many tool calls in a session is now economically reasonable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;stop_reason: tool_use&lt;/code&gt; as a natural loop&lt;/strong&gt;: the model itself decides "I have enough info" or "I need to look more"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When all of these line up, &lt;strong&gt;the assumption "we can't afford retrieval as a loop" no longer holds&lt;/strong&gt;. Five tool calls per session, ten, twenty — that's now the norm.&lt;/p&gt;

&lt;p&gt;The constraint Microsoft GraphRAG was designed against — "loops are expensive at query time" — has dissolved.&lt;/p&gt;

&lt;p&gt;This isn't to say Microsoft GraphRAG is "outdated." It was the right answer for its constraints. The constraints just changed, and &lt;strong&gt;so does the optimal answer&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic Graph RAG — Deterministic Retrieval, AI-Driven Orchestration
&lt;/h2&gt;

&lt;p&gt;Here's the thesis. In one line:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Each retrieval step is deterministic. Only the orchestration is AI.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8m234tgwnhm62xyv5nqd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8m234tgwnhm62xyv5nqd.png" alt="The three eras of RAG" width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For context: "Agentic Graph RAG" isn't a term I coined. Neo4j's &lt;a href="https://neo4j.com/videos/nodes-ai-2026-agentic-graphrag-autonomous-knowledge-graph-construction-and-adaptive-retrieval-2/" rel="noopener noreferrer"&gt;NODES AI 2026&lt;/a&gt; featured a session titled "Agentic GraphRAG," and O'Reilly is publishing &lt;a href="https://www.oreilly.com/library/view/agentic-graph-rag/9798341623163/" rel="noopener noreferrer"&gt;Agentic GraphRAG&lt;/a&gt; by Anthony Alcaraz and Sam Julien in November 2026. The industry as a whole is pivoting from "one-shot Graph RAG" toward "agent-driven Graph RAG." This article is my attempt to put words around the design we'd been arriving at independently inside our company.&lt;/p&gt;

&lt;p&gt;That said, when "Agentic GraphRAG" is used in public contexts, the dominant framing centers on &lt;strong&gt;agents automating the graph construction itself&lt;/strong&gt; (Neo4j's talk above is in that lineage). What this article takes from that broader idea is specifically &lt;strong&gt;the query-side agentic pattern&lt;/strong&gt;. We still hand-design the graphs because the domains we target (internal DB schemas, initiatives × KPIs, codebases) lean heavily on internal tacit knowledge — for now, hand-designing produces better results in practice. We aren't rejecting auto-construction in principle; we're applying the query-side concept to graphs we still build by hand.&lt;/p&gt;

&lt;p&gt;Vector RAG had &lt;strong&gt;probabilistic retrieval&lt;/strong&gt;. Embedding cosine is an approximation, and it sometimes misses. Hallucination starts at the retrieval layer.&lt;/p&gt;

&lt;p&gt;Classical Graph RAG &lt;strong&gt;runs retrieval once at query time&lt;/strong&gt;. Heavy preprocessing prepares "the answer itself" in advance, and at query time you just look it up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agentic Graph RAG sits between these two.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The graph is &lt;strong&gt;designed by humans&lt;/strong&gt;. Our domains lean on internal tacit knowledge, so humans deciding "this is the granularity I want to slice the data with" produces better results.&lt;/li&gt;
&lt;li&gt;Each tool call is &lt;strong&gt;deterministic&lt;/strong&gt;. Pass an ID and you get the connected nodes and edges. There's no embedding wiggle.&lt;/li&gt;
&lt;li&gt;The AI only judges &lt;strong&gt;which tool to call next, what ID to pass in, and when to stop&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result: &lt;strong&gt;errors get localized&lt;/strong&gt;. Retrieval itself is deterministic, so the only places to be wrong are "AI picked the wrong starting point" or "AI stopped too early." The data in the response is the truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Return Values Become a Runbook
&lt;/h2&gt;

&lt;p&gt;The most important design move in Agentic Graph RAG: &lt;strong&gt;the tool's return value tells the AI what to do next&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4zx7qfk12ke3wz1rspeq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4zx7qfk12ke3wz1rspeq.png" alt="Tool return values become the next instruction" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is different from a regular API. Regular APIs answer the question they were asked. MCP tools are &lt;strong&gt;in conversation with an AI&lt;/strong&gt;. The other side of the conversation needs not just an "answer" but &lt;strong&gt;candidates for the next move&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Concrete example.&lt;/p&gt;

&lt;p&gt;When the AI calls DB Graph MCP's &lt;code&gt;search_tables&lt;/code&gt; tool, it gets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;5 tables matched (vector similarity ranked):

warehouse.return_package_table (postgresql) (distance: 0.2557)
warehouse.receipt_record_table (postgresql) (distance: 0.2720)
inventory.receipt_confirmation_table (mysql) (distance: 0.2921)
warehouse.receipt_record_detail_table (postgresql) (distance: 0.2951)
app.return_status_change_history_table (mysql) (distance: 0.3170)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;※ Schema and table names are anonymized — they map to internal system names.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Notice that &lt;strong&gt;the response itself contains the next tool's argument&lt;/strong&gt;. The qualified name &lt;code&gt;warehouse.receipt_record_table&lt;/code&gt; is exactly what &lt;code&gt;get_table_detail(table_name: "warehouse.receipt_record_table")&lt;/code&gt; expects. If the AI decides "let me look at the details," it just copy-pastes.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;get_table_detail&lt;/code&gt; response is even more direct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# warehouse.receipt_record_table&lt;/span&gt;
DB: POSTGRESQL / ORM: typeorm / Repo: warehouse-api

&lt;span class="gu"&gt;## Columns (9)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; id: int [PK, AI, NOT NULL]
&lt;span class="p"&gt;-&lt;/span&gt; shipping_order_id: varchar [NOT NULL]
&lt;span class="p"&gt;-&lt;/span&gt; status: enum [NOT NULL, default=IN_PROGRESS]
&lt;span class="p"&gt;-&lt;/span&gt; ...

&lt;span class="gu"&gt;## References (2)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; shipping_order_id → warehouse.shipping_order_table.id (explicit)
&lt;span class="p"&gt;-&lt;/span&gt; operator_id → warehouse.user_table.id (explicit)

&lt;span class="gu"&gt;## Enum / Status Definitions (2)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Status: COMPLETE = received, IN_PROGRESS = in progress
&lt;span class="p"&gt;-&lt;/span&gt; Type: RENTAL_RETURN = rental return, ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This response implicitly tells the AI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"The meaning of &lt;code&gt;status&lt;/code&gt; is in the Enum definition"&lt;/strong&gt; → don't guess, read it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"There are FK references"&lt;/strong&gt; → if needed, you can follow them with &lt;code&gt;trace_relationships&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"There's no direct FK to the &lt;code&gt;app&lt;/code&gt; schema"&lt;/strong&gt; → you'll need a different path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, &lt;strong&gt;the tool's response is a runbook for the AI&lt;/strong&gt;. The AI reads it and assembles the next move on its own.&lt;/p&gt;

&lt;p&gt;Now look at the response from &lt;code&gt;sql_query_database&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gs"&gt;**app**&lt;/span&gt; (staging) — 1 row

| id     | status   | warehouse_order_code |
|--------|----------|----------------------|
| 98765  | RETURNED | SO-2026-00012345     |
&lt;span class="gt"&gt;
&amp;gt; **Table**: Manages the full lifecycle of delivery orders...&lt;/span&gt;

&lt;span class="gu"&gt;### Column descriptions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**status**&lt;/span&gt;: Delivery status (1=awaiting shipment, 2=ready, 3=delivered, 4=returned, ...)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**warehouse_order_code**&lt;/span&gt;: Link code to the warehouse-side shipping order

&lt;span class="gu"&gt;### Related tables&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; → &lt;span class="gs"&gt;**app.member_table**&lt;/span&gt; (user_id → id)
&lt;span class="p"&gt;-&lt;/span&gt; → &lt;span class="gs"&gt;**app.plan_master**&lt;/span&gt; (plan_id → id)
&lt;span class="p"&gt;-&lt;/span&gt; ← &lt;span class="gs"&gt;**app.order_history_table**&lt;/span&gt; (delivery_id → id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Column descriptions and related tables are auto-attached below the query result.&lt;/strong&gt; This is composed dynamically from the graph data we cached in BQ. Reading that "warehouse_order_code links to the warehouse side," the AI immediately decides "next, look up the warehouse table by this code."&lt;/p&gt;

&lt;p&gt;Nobody had to tell the AI "now look at warehouse." &lt;strong&gt;The response itself is the instruction.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  DB Graph in Action — A Production Investigation in 4 Steps
&lt;/h2&gt;

&lt;p&gt;Here's the full flow (also shown in the &lt;a href="https://dev.to/ryantsuji/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;DB Graph MCP article&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The scenario: a CS agent asks, "This member shows 'returned' in the app, but did the warehouse actually confirm receipt?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1&lt;/strong&gt;: Find tables in natural language (vector-similarity entry-point search)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;search_tables(query: "return processing confirmation", search_type: "semantic")
→ warehouse.receipt_record_table, warehouse.return_package_table, ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2&lt;/strong&gt;: Look at the details (deterministic detail retrieval)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;get_table_detail(table_name: "warehouse.receipt_record_table")
→ status=COMPLETE means "warehouse received it"
→ shipping_order_id connects to warehouse.shipping_order_table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3&lt;/strong&gt;: Find the path to the other schema (deterministic graph traversal)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;trace_relationships(table_name: "warehouse.shipping_order_table", direction: "both")
→ from the app side, connection goes through an intermediate table
search_tables(query: "warehouse linkage")
→ app.warehouse_linkage_table (warehouse_order_code maps to warehouse.shipping_order.code)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4&lt;/strong&gt;: Verify against real data (deterministic query execution)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;sql_query_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"app"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"SELECT ... WHERE user_id=12345 AND status='RETURNED'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;warehouse_order_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"SO-2026-00012345"&lt;/span&gt;

&lt;span class="n"&gt;sql_query_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"warehouse"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"SELECT ... WHERE code='SO-2026-00012345'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;receive_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;COMPLETE&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;confirmed&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;warehouse&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The crucial part: &lt;strong&gt;the AI built this 4-step flow autonomously&lt;/strong&gt;. The human only asked the original question. Each step's response carried "look here next" inside it, so the AI could keep composing the next call correctly.&lt;/p&gt;

&lt;p&gt;And &lt;strong&gt;each step's retrieval is deterministic&lt;/strong&gt;. The enum definitions for &lt;code&gt;status&lt;/code&gt; in &lt;code&gt;warehouse.receipt_record_table&lt;/code&gt; are facts pulled from the graph — not values the AI invented. &lt;code&gt;warehouse_order_code = SO-2026-00012345&lt;/code&gt; is real data — not an ID the AI fabricated.&lt;/p&gt;

&lt;p&gt;This is a different experience from both Vector RAG and classical Graph RAG. Vector RAG is "return all the text in one shot," but hallucinations slip in. Classical Graph RAG is "return the community summary in one shot," but specifics get lost in summarization. Agentic Graph RAG is "&lt;strong&gt;fetch as many times as you need, but every fetch returns nothing but facts&lt;/strong&gt;."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Same Pattern, Across Many Graphs
&lt;/h2&gt;

&lt;p&gt;This pattern — what we adopt: &lt;strong&gt;human-designed graph + deterministic retrieval tools + responses that double as AI runbooks&lt;/strong&gt; — isn't limited to DB Graph and Biz Graph. We use it across many MCP servers internally.&lt;/p&gt;

&lt;p&gt;Including the ones I mentioned by name in &lt;a href="https://dev.to/ryantsuji/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;the 17 internal MCP servers post&lt;/a&gt;, the lineup looks like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Graph&lt;/th&gt;
&lt;th&gt;What it covers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DB Graph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;991 tables × 15 schemas across the company&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Biz Graph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5,000+ initiatives × 4,000+ KPIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code Graph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Functions, APIs, events across all repos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cortex Product Graph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Code + DB + docs + infra unified for the cortex repo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Service Product Graph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;API → DB dependencies per service&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The structures are all different. DB Graph from ORM parsing. Biz Graph from meeting-slide extraction plus hand-designed MetricDomain. Code Graph from static analysis. Product Graph from JSDoc annotations on top of everything else. Different sources, different assembly.&lt;/p&gt;

&lt;p&gt;But &lt;strong&gt;the shape from the MCP-tool side is identical&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Entry-point search&lt;/strong&gt;: vector or substring to find "around here" (the only place fuzziness is allowed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detail retrieval&lt;/strong&gt;: pass an ID, get facts (deterministic)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relationship traversal&lt;/strong&gt;: jump from ID to ID along edges (deterministic)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embed next-step hints in responses&lt;/strong&gt;: related IDs, enum definitions, annotations, links&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This &lt;strong&gt;3+1&lt;/strong&gt; template is the universal Agentic Graph RAG shape. Different graph internally, identical surface. From the AI side, &lt;strong&gt;they all feel the same&lt;/strong&gt; — Claude Code uses DB Graph and Code Graph and Product Graph with the same "search → drill down → traverse" rhythm.&lt;/p&gt;

&lt;p&gt;Of the graphs above, only DB Graph and Biz Graph have dedicated deep-dive posts so far. Code Graph and the Product Graph family will get their own writeups; for this post, they're listed as fellow examples of the pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Designer's Checklist
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;For implementers.&lt;/strong&gt; Below are the six things I always keep top of mind when adapting Agentic Graph RAG to a new domain.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Things I keep top of mind when building an Agentic Graph RAG:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Choose the graph-construction method based on the domain
&lt;/h3&gt;

&lt;p&gt;If the domain leans on internal tacit knowledge, &lt;strong&gt;humans deciding the nodes and edges&lt;/strong&gt; produces better results. Sometimes you intentionally design a structure that doesn't exist naturally — Biz Graph's "Week node" and "MetricDomain" are examples. &lt;strong&gt;The design is what determines quality.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Conversely, when the domain is mostly public knowledge (papers, news, public docs), having agents automate construction is a strong option (the Neo4j talk lineage). This article assumes the former.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Make retrieval deterministic
&lt;/h3&gt;

&lt;p&gt;The entry-point search may use vector similarity (to accept natural-language queries). After that, "get details by ID" and "follow relationships from this ID" must always return &lt;strong&gt;definite values via graph traversal&lt;/strong&gt;. Using similarity here lets hallucination back into the retrieval layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Tool granularity: search → detail → traverse
&lt;/h3&gt;

&lt;p&gt;Don't pile everything into one giant tool. Split into search-style entry points, detail lookups, and traversal/data tools. The AI understands the difference and uses them appropriately.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Tool descriptions are AI runbooks
&lt;/h3&gt;

&lt;p&gt;Write tool descriptions as &lt;strong&gt;execution guides for the AI&lt;/strong&gt;, not human documentation. "If you see this kind of response, call this tool next." "In this situation, format the argument like this." As I mentioned in &lt;a href="https://dev.to/ryantsuji/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a"&gt;the Sandbox MCP post&lt;/a&gt;, this directly determines how smart the agent appears.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Embed "next move candidates" in responses
&lt;/h3&gt;

&lt;p&gt;Don't just return data. Return:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Related IDs&lt;/strong&gt;: where to traverse next (FK targets, similar initiatives, parent commits)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enums and definitions&lt;/strong&gt;: so the AI can interpret values without guessing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Annotations and warnings&lt;/strong&gt;: DEAD flags, deprecation marks, PII (personally identifiable information) redaction notes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At a granularity where the AI can read "this is what I should do next" out of the response.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Let the AI do the summarization
&lt;/h3&gt;

&lt;p&gt;Don't pre-bake "community summaries" or similar on the server. The AI assembles facts case by case at the right granularity. &lt;strong&gt;Return facts. Let the AI interpret.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Limits and Caveats
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Heads up.&lt;/strong&gt; This approach has clear weak spots. If you're considering adopting it, read this section before you start designing.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Agentic Graph RAG is not a silver bullet. To be honest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quality depends entirely on graph design&lt;/strong&gt;. If the schema doesn't carve up the domain correctly, no number of tool calls will reach what you want. And in tacit-knowledge-heavy domains, the call about which nodes/edges to include is one only someone deeply familiar with the domain can make.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If the agent picks the wrong entry, it falls into a deep hole&lt;/strong&gt;. Miss at the first &lt;code&gt;search_*&lt;/code&gt; and the rest of the graph traversal goes sideways. Entry-point quality matters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost is tool-call-count × context length&lt;/strong&gt;. 10–20 tool calls per session add up tokens straightforwardly. Prompt caching and progress reporting via MCP help, but you have to keep an eye on it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucination doesn't disappear — it relocates&lt;/strong&gt;. From the retrieval layer to "entry point selection" and "stop judgment." But it's much narrower territory, so debugging and evals get easier.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first item is the one designers should worry about most. &lt;strong&gt;In tacit-knowledge domains specifically, graphs aren't found — they're designed.&lt;/strong&gt; I wrote this in the Biz Graph post too, and for these domains I don't think it can be overstated.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The three eras of RAG, in one table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;Representative&lt;/th&gt;
&lt;th&gt;Retrieval&lt;/th&gt;
&lt;th&gt;Orchestration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Early days&lt;/td&gt;
&lt;td&gt;Vector RAG&lt;/td&gt;
&lt;td&gt;Probabilistic (cosine)&lt;/td&gt;
&lt;td&gt;None (one-shot)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Function-calling era&lt;/td&gt;
&lt;td&gt;Classical Graph RAG&lt;/td&gt;
&lt;td&gt;Pre-summarized&lt;/td&gt;
&lt;td&gt;Light, mostly one-shot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent era&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Agentic Graph RAG&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Deterministic (graph traversal)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AI assembles in many steps&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Vector RAG made "search and dump some context" work. Classical Graph RAG packaged "follow relationships" into a single-shot lookup. Agentic Graph RAG &lt;strong&gt;separates "tools that return only facts, accurately" from "AI agents that orchestrate them in multiple steps."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The graphs we've built internally — DB Graph, Biz Graph, Code Graph, Product Graph family — they're all from the same lineage. The contents and construction differ, but in our domains they all share the same shape: &lt;strong&gt;"give Claude Code a human-designed graph through deterministic tools."&lt;/strong&gt; Which is why, from the AI side, they all feel the same.&lt;/p&gt;

&lt;p&gt;If you're building AI-native internal infrastructure, give this perspective a try. &lt;strong&gt;Don't hand the AI an answer. Hand it a map.&lt;/strong&gt; It walks much further than you think.&lt;/p&gt;

&lt;p&gt;And the quality of that map comes down to how deeply you understand the domain — at least for the domains where the relevant knowledge sits as tacit understanding inside people's heads. &lt;strong&gt;In those domains, the best AI systems are still built by the people who know the problem space best.&lt;/strong&gt; Domain expertise hasn't lost value in the AI era — it's gained it. That's been my strongest takeaway from two years of building graphs across our company.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>graphrag</category>
      <category>mcp</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Cutting Self-Built MCP Server Token Usage by 90% — The Parking Pattern</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Fri, 01 May 2026 01:10:27 +0000</pubDate>
      <link>https://forem.com/ryantsuji/cutting-self-built-mcp-server-token-usage-by-90-the-parking-pattern-3e7o</link>
      <guid>https://forem.com/ryantsuji/cutting-self-built-mcp-server-token-usage-by-90-the-parking-pattern-3e7o</guid>
      <description>&lt;p&gt;Hi, I'm &lt;a href="https://x.com/ryantsuji" rel="noopener noreferrer"&gt;Ryan&lt;/a&gt;, CTO at airCloset.&lt;/p&gt;

&lt;p&gt;In my previous posts I introduced &lt;a href="https://dev.to/ryantsuji/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;the full picture of our 17 internal MCP servers&lt;/a&gt;, &lt;a href="https://dev.to/ryantsuji/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;an MCP server that lets you search 991 internal tables in natural language&lt;/a&gt;, &lt;a href="https://dev.to/ryantsuji/we-built-a-custom-graph-rag-to-let-ai-answer-did-that-initiative-actually-work-3oda"&gt;a Graph RAG MCP for measuring initiative impact&lt;/a&gt;, and &lt;a href="https://dev.to/ryantsuji/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a"&gt;the Sandbox MCP that lets non-engineers publish AI-built apps safely&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This time I want to share something that came out of running those in production — &lt;strong&gt;a small trick we use to cut token consumption on self-built MCP servers&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Annoyance: MCPs Eat More Tokens Than You'd Think
&lt;/h2&gt;

&lt;p&gt;The first surprise when extending an AI agent with MCP is that &lt;strong&gt;token consumption is higher than expected&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;An MCP tool call is, at the end of the day, JSON-RPC over HTTP. Both the arguments the AI sends and the result the tool returns &lt;strong&gt;land directly in the conversation context&lt;/strong&gt;. If you implement things naively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sending whole files as arguments → thousands of lines of source code stick to the context&lt;/li&gt;
&lt;li&gt;Returning all DB query rows → a multi-thousand-row × multi-column table sticks to the context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A single tool call can easily consume tens of thousands of tokens, putting the Claude Code session straight into compaction.&lt;/p&gt;

&lt;p&gt;It's worse than just inefficiency: above a certain row count, &lt;strong&gt;the response simply fails to come back at all&lt;/strong&gt; because it exceeds MCP's payload size limit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbvf5wj80sm163yguj09l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbvf5wj80sm163yguj09l.png" alt="Naive implementation bloats the context" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we were ramping up our internal MCP fleet, this little mismatch was reliably making the tool experience worse.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern: Park the Big Stuff Elsewhere, Pass Only a Key
&lt;/h2&gt;

&lt;p&gt;The fix is embarrassingly simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Take the parts that tend to grow and move them off the MCP wire. Pass only a reference key (or URL) through MCP itself.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Both the request side and the response side benefit from the same idea.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Direction&lt;/th&gt;
&lt;th&gt;What to remove&lt;/th&gt;
&lt;th&gt;Where to park it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Request&lt;/td&gt;
&lt;td&gt;Large files / source code&lt;/td&gt;
&lt;td&gt;GitHub, Drive, or any object store&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response&lt;/td&gt;
&lt;td&gt;Large list data / query results&lt;/td&gt;
&lt;td&gt;Spreadsheet / GCS / BigQuery&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjerq3kube4unfoy2l2mr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjerq3kube4unfoy2l2mr.png" alt="The parking pattern" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two examples from airCloset.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 1: Lighter Requests — Sandbox MCP × Self-Hosted Git Server
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dev.to/ryantsuji/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a"&gt;Last time&lt;/a&gt; I wrote about &lt;strong&gt;Sandbox MCP&lt;/strong&gt;, the platform that lets non-engineers publish AI-built apps internally. The first iteration was fully &lt;strong&gt;MCP tool-driven file uploads&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;sandbox_write_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;app_name: &lt;/span&gt;&lt;span class="s2"&gt;"todo-app"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;path: &lt;/span&gt;&lt;span class="s2"&gt;"index.html"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;content: &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;html&amp;gt;..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sandbox_write_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;app_name: &lt;/span&gt;&lt;span class="s2"&gt;"todo-app"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;path: &lt;/span&gt;&lt;span class="s2"&gt;"app.js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;content: &lt;/span&gt;&lt;span class="s2"&gt;"import ..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sandbox_publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;app_name: &lt;/span&gt;&lt;span class="s2"&gt;"todo-app"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The moment apps got slightly bigger, this collapsed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Constant chunking&lt;/strong&gt;: hitting the payload size limit, the AI looped through "first half of file A → second half → first half of file B → ..."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tokens going up in flames&lt;/strong&gt;: full source code landed in the conversation context — a single deploy of a few-thousand-line app could burn tens of thousands of tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retries made it worse&lt;/strong&gt;: the AI would "verify after sending" by re-reading the same file with &lt;code&gt;sandbox_read_file&lt;/code&gt;. Write → read → write loops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we changed the contract: &lt;strong&gt;MCP only returns a URL; the actual content moves over git push&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. MCP returns a git URL — no payload involved&lt;/span&gt;
sandbox_init_repo&lt;span class="o"&gt;(&lt;/span&gt;app_name: &lt;span class="s2"&gt;"todo-app"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;# → https://mcp-sandbox.example.com/git/sandbox/ryan/todo-app.git&lt;/span&gt;

&lt;span class="c"&gt;# 2. AI runs git in the background — MCP isn't involved&lt;/span&gt;
git init &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git add &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"init"&lt;/span&gt;
git remote add sandbox &amp;lt;returned URL&amp;gt;
git push sandbox main

&lt;span class="c"&gt;# 3. Only the deploy command goes through MCP&lt;/span&gt;
sandbox_publish&lt;span class="o"&gt;(&lt;/span&gt;app_name: &lt;span class="s2"&gt;"todo-app"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;git push gives us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No file size limit&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Differential transfer — second-time pushes are fast&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Source code never lands in the MCP conversation context&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From the AI's point of view, it's just "I got handed a git URL; I push to it." Fundamentally different in token economics.&lt;/p&gt;

&lt;p&gt;By the way, we &lt;strong&gt;don't use GitHub Organizations&lt;/strong&gt; here. Issuing GitHub seats for every employee wasn't worth the cost or operational overhead, and we already had a self-hosted Git Server on GCE for a different purpose, so we just added one repo (&lt;code&gt;sandbox-apps&lt;/code&gt;). The "park" doesn't have to be something you build from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 2: Lighter Responses — DB Graph MCP × Spreadsheet
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dev.to/ryantsuji/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;DB Graph MCP&lt;/a&gt; is the MCP that lets us search and query 991 internal tables in natural language.&lt;/p&gt;

&lt;p&gt;The annoying-but-common case here is &lt;strong&gt;"give me everything"-style queries&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;service_main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;user&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2026-01-01'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the result is several thousand to tens of thousands of rows, you get either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A multi-million-token response that triggers immediate session compaction&lt;/li&gt;
&lt;li&gt;An MCP error because the payload exceeds the size limit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Or both. The "right" AI behavior is to do &lt;code&gt;LIMIT 100&lt;/code&gt; and analyze a sample — but if the user actually wanted &lt;strong&gt;the full list as a CSV&lt;/strong&gt;, that doesn't help them.&lt;/p&gt;

&lt;p&gt;So we built a &lt;strong&gt;"export to spreadsheet, return only the URL"&lt;/strong&gt; mode into DB Graph MCP. You can opt in explicitly, but the MCP &lt;strong&gt;also auto-falls back to this mode whenever the result exceeds a row-count threshold&lt;/strong&gt;. Even if the AI forgets to add a &lt;code&gt;LIMIT&lt;/code&gt; and the query is about to return 10,000 rows, the server decides "this is too big to return inline," exports to a spreadsheet, and hands back the URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Conceptual call (the real shape is documented in the tool description)&lt;/span&gt;
&lt;span class="nf"&gt;sql_query_database&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SELECT * FROM ...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;spreadsheet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;// ← explicit export mode&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Without `output`, the server still auto-falls back over a threshold (e.g. 500 rows)&lt;/span&gt;
&lt;span class="nf"&gt;sql_query_database&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SELECT * FROM ...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;// → server detects row count → spreadsheet export + URL response&lt;/span&gt;

&lt;span class="c1"&gt;// Either way, the response shape is the same&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://docs.google.com/spreadsheets/d/{...}/edit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;12483&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;email&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;created_at&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...],&lt;/span&gt;
  &lt;span class="nx"&gt;exported_reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;row_count_exceeded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;// set on auto-fallback&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response is just a URL plus metadata. The real data never enters the context. &lt;strong&gt;"Light if you're careful" becomes "light even when you're not"&lt;/strong&gt; — and that's what makes it feel safe in day-to-day operation.&lt;/p&gt;

&lt;p&gt;This pattern works because &lt;strong&gt;a surprisingly large fraction of real use cases are just "I want this data somewhere I can use it later"&lt;/strong&gt; — not "let's analyze this in chat with AI." Things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Save it to a spreadsheet I can stare at later&lt;/li&gt;
&lt;li&gt;Share it with another team&lt;/li&gt;
&lt;li&gt;VLOOKUP it against another sheet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For those, MCP's job ends at "write the query, drop the result somewhere." That's enough.&lt;/p&gt;

&lt;p&gt;If the user genuinely does want AI-side analysis, you do still need the data in context. The standard workflow becomes a two-step: &lt;code&gt;LIMIT 100&lt;/code&gt; for sample analysis, then &lt;code&gt;output: spreadsheet&lt;/code&gt; for the full export once the conclusion is clear.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Much Did It Save?
&lt;/h2&gt;

&lt;p&gt;Every MCP we run logs every tool call. After rolling these patterns out, &lt;strong&gt;total token consumption across all tools dropped 70–90%&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: Google Workspace OAuth Pairs Beautifully With This
&lt;/h2&gt;

&lt;p&gt;A note on choosing where to "park" data: &lt;strong&gt;if your MCP authenticates via Google Workspace OAuth, this whole design becomes much easier&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The reason is that you get two things from a single OAuth flow — &lt;strong&gt;two birds with one stone&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Authentication for MCP itself&lt;/strong&gt; — figuring out who's using the tool&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authorization for Workspace apps&lt;/strong&gt; — scoped access to Spreadsheet / Drive / Gmail / Calendar&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7s96kwcudokn5vm1g0a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7s96kwcudokn5vm1g0a.png" alt="Two birds with one stone" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the user has logged into the MCP, you don't have to ask for any additional permissions to write to the park location. Which means you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;the operating user's own permissions&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;To save files to &lt;strong&gt;that user's My Drive&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Without the MCP itself owning a write-anywhere service account&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Files end up in the user's drive, not on a shared service account. "Accidentally world-readable" or "visible to people who shouldn't see it" stops being a realistic accident — it's structurally prevented.&lt;/p&gt;

&lt;p&gt;You also dodge the operational cost of issuing a separate GCP service account, storing its key safely, and managing its IAM policy out of band. The safety property genuinely comes for free.&lt;/p&gt;

&lt;p&gt;There's one catch though:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The AI agent has to be able to read the spreadsheet URL it got back.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Returning a URL alone doesn't help the AI access the underlying data. Stock tooling in Claude Code can't read a Spreadsheet directly, so you need a separate Workspace-operating MCP.&lt;/p&gt;

&lt;p&gt;At airCloset we run &lt;strong&gt;a dedicated MCP that wraps the Google Workspace APIs&lt;/strong&gt; (Drive / Sheets / Gmail / Calendar). Combined with the export pattern above, it gives us a clean flow: "drop results into a spreadsheet → call into the Workspace MCP later if the AI wants to actually read them."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DB Graph MCP → exports to Spreadsheet → returns URL
                                          ↓
              Workspace MCP ← invoked when the AI decides it needs to read the data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the user's side, this naturally produces the rhythm of "dump it into a spreadsheet first, ask AI to analyze only when needed."&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap-Up
&lt;/h2&gt;

&lt;p&gt;A few small tricks for keeping self-built MCP server token consumption under control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Move the parts that tend to grow off the MCP wire&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Park them somewhere — Git server, Spreadsheet, GCS — and only pass keys/URLs through MCP&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pick a park that pairs well with Google Workspace OAuth — you get safety almost for free&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;If you want the AI to read parked data later, run a Workspace-style MCP alongside&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's an unflashy design move, but &lt;strong&gt;the difference in MCP usability before and after is dramatic&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you're running self-built MCP servers internally and feeling the token squeeze, give it a try.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>mcp</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Bridging 'I Want to Build' and 'I Want to Publish Safely' for Non-Engineers — Sandbox MCP</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Mon, 27 Apr 2026 23:04:57 +0000</pubDate>
      <link>https://forem.com/ryantsuji/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a</link>
      <guid>https://forem.com/ryantsuji/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a</guid>
      <description>&lt;p&gt;Hi, I'm &lt;a href="https://x.com/ryantsuji" rel="noopener noreferrer"&gt;Ryan&lt;/a&gt;, CTO at airCloset.&lt;/p&gt;

&lt;p&gt;In my previous posts, I've introduced our internal MCP servers: &lt;a href="https://dev.to/ryantsuji/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;an MCP server for natural-language search across all our databases&lt;/a&gt;, &lt;a href="https://dev.to/ryantsuji/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;the full picture of our 17 internal MCP servers&lt;/a&gt;, and &lt;a href="https://dev.to/ryantsuji/we-built-a-custom-graph-rag-to-let-ai-answer-did-that-initiative-actually-work-3oda"&gt;a custom Graph RAG that lets AI answer "Did that initiative actually work?"&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This time I'm covering something a bit different: &lt;strong&gt;Sandbox MCP&lt;/strong&gt; — a platform that lets non-engineer employees deploy apps they built with AI to a safe, internal-only URL &lt;strong&gt;with a single command&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The pitch is simple: "If Claude Code can build an app, why not publish it directly?" The hard part is making "directly" mean &lt;strong&gt;safely&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Building Got Easy. Publishing Safely Did Not.
&lt;/h2&gt;

&lt;p&gt;The arrival of Claude Code and other AI coding agents is reshaping how work happens inside our company.&lt;/p&gt;

&lt;p&gt;"Building an app" used to be an engineer's job. You had to do requirements, design, frontend, backend, database, CI/CD, production deploy — all in one head.&lt;/p&gt;

&lt;p&gt;Now PMs, designers, and customer-success folks are talking to Claude Code with "build me a screen that does X" and getting working mockups on the spot. Inside airCloset we're seeing more and more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mockups for new project proposals&lt;/li&gt;
&lt;li&gt;Interactive reports that visualize research findings&lt;/li&gt;
&lt;li&gt;KPI dashboards used only by a single team&lt;/li&gt;
&lt;li&gt;Small tools for everyday operational improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These &lt;strong&gt;non-engineer outputs&lt;/strong&gt; are growing fast. People are even saying "let's just run with this in production for a bit."&lt;/p&gt;

&lt;p&gt;That's where the wall hits.&lt;/p&gt;

&lt;h3&gt;
  
  
  Easy to Build. Hard to Publish Safely.
&lt;/h3&gt;

&lt;p&gt;Anyone can build something that runs locally now. Spin up &lt;code&gt;python -m http.server 8000&lt;/code&gt;, view it on your Mac — five minutes max.&lt;/p&gt;

&lt;p&gt;But the moment it becomes "I want my team to see this" or "I want others to actually use it," the difficulty curve goes vertical.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Where do you run it?&lt;/strong&gt; Cloud means GCP/AWS accounts, IAM, billing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What URL?&lt;/strong&gt; Domain registration, DNS, SSL certificates, Cloudflare.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What about auth?&lt;/strong&gt; If it touches confidential info, you need employees-only. OAuth implementation, domain restriction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;And the data?&lt;/strong&gt; Is localStorage enough, or do you need a real DB? If a DB, who manages the password?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How do you deploy?&lt;/strong&gt; Can you write a Dockerfile? Cloud Run config, env vars, service accounts, IAM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What about security?&lt;/strong&gt; What if the AI-written code has a vulnerability? An auth bypass?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You &lt;em&gt;could&lt;/em&gt; "let the AI write all of it." But the result is &lt;strong&gt;left to the AI&lt;/strong&gt;. Cloudflare misconfigured and exposed to the world. Auth bypassed. A service account with production database write access slipped into the code. The more code AI writes, the higher the risk of these accidents.&lt;/p&gt;

&lt;p&gt;When a non-engineer says "I want to try building this," we need to clearly separate &lt;strong&gt;what the builder is responsible for&lt;/strong&gt; from &lt;strong&gt;what the platform must guarantee by default&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;There's also a quieter problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  UI Inconsistency and Data Sprawl
&lt;/h3&gt;

&lt;p&gt;When non-engineers build apps independently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One person uses React, another Vue, another raw HTML&lt;/li&gt;
&lt;li&gt;Buttons look and behave differently&lt;/li&gt;
&lt;li&gt;Some store data in localStorage, some in Google Sheets, some in Firebase&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After 10 or 20 such apps, internal tooling becomes &lt;strong&gt;chaos&lt;/strong&gt;. Users wonder "wait, who built this one?" and "why does this button work differently?"&lt;/p&gt;

&lt;p&gt;Even for internal tools, you need &lt;strong&gt;a baseline of consistency&lt;/strong&gt; — both in design and in where data lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sandbox MCP — Standing Between "Build" and "Publish"
&lt;/h2&gt;

&lt;p&gt;That's why we built &lt;strong&gt;Sandbox MCP&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A non-engineer just says "build this" to Claude Code, and:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An app is generated using a unified UI Kit&lt;/li&gt;
&lt;li&gt;They can verify it works locally&lt;/li&gt;
&lt;li&gt;A single command deploys it to &lt;code&gt;https://sbx-{nickname}--{app-name}.example.com/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Self-hosted OAuth on the Cloudflare Worker enforces internal-only access&lt;/li&gt;
&lt;li&gt;Data is stored, isolated, in a dedicated Firestore database&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;— all of this completes within a single chat session with the AI.&lt;br&gt;
The builder is only responsible for &lt;strong&gt;functionality&lt;/strong&gt;. &lt;strong&gt;Security, data isolation, domain &amp;amp; SSL, authentication&lt;/strong&gt; are all handled by the Sandbox MCP platform by default.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5e202vzg81siijp08ly.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5e202vzg81siijp08ly.png" alt="System Overview" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Scale
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MCP tools&lt;/td&gt;
&lt;td&gt;10 (publish, status, schedule, list, delete, write_file, read_file, list_files, init_repo, unschedule)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supported runtimes&lt;/td&gt;
&lt;td&gt;Python (Flask + gunicorn), Node.js, static HTML/SPA, custom Dockerfile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;URL&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;sbx-{nickname}--{app-name}.example.com&lt;/code&gt; (covered by Universal SSL, no ACM)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Authentication&lt;/td&gt;
&lt;td&gt;Self-hosted OAuth on a Cloudflare Worker (Google Workspace)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data&lt;/td&gt;
&lt;td&gt;Firestore named DB &lt;code&gt;sandbox&lt;/code&gt;, namespaced per nickname × app&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;Self-hosted Git Server (GCE) + Cloud Run + Cloudflare Worker + KV&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deploy time&lt;/td&gt;
&lt;td&gt;Typically 2–5 minutes (git push to public URL)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Let's walk through the internals.&lt;/p&gt;
&lt;h2&gt;
  
  
  What It Does — Web, API, DB, and Cron
&lt;/h2&gt;

&lt;p&gt;Sandbox MCP supports four app shapes so it can cover almost any "I want to ship something internally" use case.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Detected by&lt;/th&gt;
&lt;th&gt;Use cases&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Python&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;.py&lt;/code&gt; files present&lt;/td&gt;
&lt;td&gt;Flask + gunicorn for APIs, analysis tools with a UI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Node.js&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;package.json&lt;/code&gt; present&lt;/td&gt;
&lt;td&gt;Express APIs + UI; Bun also works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Static HTML/SPA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;only &lt;code&gt;.html&lt;/code&gt; files (no Python/Node)&lt;/td&gt;
&lt;td&gt;nginx-served, React/Vue dist supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;includes a &lt;code&gt;Dockerfile&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Any runtime — Go, Rust, Bun, anything&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Pick any of these and &lt;code&gt;sandbox_publish&lt;/code&gt; deploys it with no extra config.&lt;/p&gt;

&lt;p&gt;There's also &lt;code&gt;sandbox_schedule&lt;/code&gt; for &lt;strong&gt;scheduled batch apps via Cloud Scheduler&lt;/strong&gt;. Things like "post a risk summary to Slack at 9 AM every morning" become one-line cron setups.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;sandbox_schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;app_name: &lt;/span&gt;&lt;span class="s2"&gt;"risk-alert"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;schedule: &lt;/span&gt;&lt;span class="s2"&gt;"0 9 * * *"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;path: &lt;/span&gt;&lt;span class="s2"&gt;"/api/cron"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;timezone: &lt;/span&gt;&lt;span class="s2"&gt;"Asia/Tokyo"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cloud Scheduler now hits the app's &lt;code&gt;/api/cron&lt;/code&gt; every morning at 9. No need to open the scheduler UI or translate cron syntax into IaC.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frontend — Unified Design via sandbox-ui-kit
&lt;/h2&gt;

&lt;p&gt;Even apps built by non-engineers should feel &lt;strong&gt;consistent as a tool family&lt;/strong&gt;. That's the job of the &lt;code&gt;sandbox-ui-kit&lt;/code&gt; repo.&lt;/p&gt;

&lt;p&gt;It lives on &lt;code&gt;mcp-sandbox.example.com/git&lt;/code&gt; and provides:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Contents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox-ui.css&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Design tokens + glass-morphism component styles (dark/light)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox-ui.js&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Theme switcher, modals, toasts, generic JS utilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox-db.js&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SandboxDB client SDK (more below)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;index.html&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Storybook-style component catalog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;README.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Full API documentation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key: it's designed &lt;strong&gt;for AI to read and use&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;sandbox_publish&lt;/code&gt; tool description literally says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When building an app, first read README.md with read_file and use the UI Kit.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When Claude Code builds a new app, it &lt;code&gt;read_file&lt;/code&gt;s this README, learns which CSS/JS to load and which component names to use, then generates code accordingly. &lt;strong&gt;Instead of a human walking the AI through UI guidelines, we centralized the "how to use" in one place targeted at the AI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The result: apps built by anyone (with AI) end up with consistent buttons, modals, and forms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Backend — Auto-Generated Dockerfile + Cloud Run
&lt;/h2&gt;

&lt;p&gt;"I don't want to write Docker." "I don't want to think about runtime configuration." Classic non-engineer requests.&lt;/p&gt;

&lt;p&gt;Sandbox MCP &lt;strong&gt;inspects the source files and generates a Dockerfile automatically&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// apps/mcp/git-server/src/sandbox/tools.ts&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hasPy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;dockerfile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generatePythonDockerfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hasRequirements&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// Auto-create requirements.txt if missing&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;hasRequirements&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;writeFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;requirements.txt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;flask&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;gunicorn&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hasPackageJson&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;dockerfile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateNodeDockerfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hasHtml&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;dockerfile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateStaticDockerfile&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, a Python app gets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.12-slim&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PORT=8080&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python", "-u", "$(ls *.py | head -1)"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;requirements.txt&lt;/code&gt; is missing, &lt;code&gt;flask&lt;/code&gt; + &lt;code&gt;gunicorn&lt;/code&gt; get added automatically. AI can write &lt;code&gt;from flask import Flask&lt;/code&gt; and the dependencies will resolve — no missing-package surprises.&lt;/p&gt;

&lt;p&gt;Deployment uses &lt;code&gt;gcloud run deploy --source&lt;/code&gt;, with Cloud Build handling the image build. App authors &lt;strong&gt;can&lt;/strong&gt; write a &lt;code&gt;Dockerfile&lt;/code&gt;, but they don't have to. No Dockerfile gets the standard, with one customizes — friendly to both non-engineers and engineers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmot0cm77d734v6iotb1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmot0cm77d734v6iotb1.png" alt="Deploy Flow" width="800" height="286"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Database — Transparent Fallback Between localStorage and Firestore
&lt;/h2&gt;

&lt;p&gt;"I want to save data. I don't want to set up a database."&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;SandboxDB SDK&lt;/strong&gt; handles that. The same code uses &lt;code&gt;localStorage&lt;/code&gt; locally and Firestore once deployed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://mcp-sandbox.example.com/api/db/sdk.js"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"module"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SandboxDB&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;googleOAuthAccessToken&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Save (storage location auto-detected from hostname)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;items&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;test&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// List&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;items&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Get / update / delete&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;items&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;updated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;items&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SDK internals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_isLocal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hostname&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;localhost&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
              &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hostname&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;127.0.0.1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_isLocal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_localAdd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// localStorage&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_req&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                  &lt;span class="c1"&gt;// Firestore REST API&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When running on &lt;code&gt;localhost&lt;/code&gt;, it uses localStorage. The moment it's deployed under &lt;code&gt;sbx-*.example.com&lt;/code&gt;, it switches to Firestore. &lt;strong&gt;No code changes required.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This dramatically improves the experience of building apps with AI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local: no network, no auth, all features work&lt;/li&gt;
&lt;li&gt;Deployed: same code runs, data is properly persisted&lt;/li&gt;
&lt;li&gt;Development data never leaks into systems outside Sandbox (it physically can't reach them)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Firestore Namespace Isolation
&lt;/h3&gt;

&lt;p&gt;Once deployed, data paths are strictly isolated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sandbox_data/{nickname}--{app}/{collection}/{docId}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;nickname&lt;/code&gt;: user identifier resolved via OAuth&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;app&lt;/code&gt;: Sandbox app name&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;_createdAt&lt;/code&gt; / &lt;code&gt;_updatedAt&lt;/code&gt;: auto-attached by the SDK&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data from different apps is physically unreachable from each other. Even apps built by the same person live in different paths.&lt;/p&gt;

&lt;p&gt;The most important point: &lt;strong&gt;we use a dedicated &lt;code&gt;sandbox&lt;/code&gt; named database&lt;/strong&gt;. It's a completely separate Firestore database from the &lt;code&gt;(default)&lt;/code&gt; DB used by other internal systems. No matter how badly an app's code misbehaves, it can never touch data outside Sandbox.&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure — Wildcard DNS + Cloudflare Worker + Self-Hosted Git Server
&lt;/h2&gt;

&lt;p&gt;Now for the infrastructure highlights.&lt;/p&gt;

&lt;h3&gt;
  
  
  How URLs Are Determined
&lt;/h3&gt;

&lt;p&gt;The public URL takes the form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://sbx-{nickname}--{app-name}.example.com/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;nickname&lt;/code&gt; is &lt;strong&gt;automatically pulled from the MCP OAuth session&lt;/strong&gt;. When a user logs into Sandbox MCP via Google, the email is looked up in a Firestore &lt;code&gt;users&lt;/code&gt; collection to resolve the nickname. Users never have to repeat "I am ryan" each time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;r.tsuji@air-closet.com → users[r.tsuji@air-closet.com].nickname → "ryan"
                                                       ↓
                                  sbx-ryan--todo-app.example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The &lt;code&gt;users&lt;/code&gt; collection is &lt;strong&gt;kept in sync from a separate internal pipeline&lt;/strong&gt; (a daily batch that pulls from our HR system and Google Workspace directory). Sandbox MCP just reads from it — no need to maintain its own employee master.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The benefit: you can tell &lt;strong&gt;whose app it is&lt;/strong&gt; just by reading the URL. When someone says "go look at ryan's todo-app," reading the URL aloud naturally communicates ownership.&lt;/p&gt;

&lt;h3&gt;
  
  
  Instant Publishing via Cloudflare Worker
&lt;/h3&gt;

&lt;p&gt;Normally, publishing a new subdomain requires:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Adding A/CNAME DNS records&lt;/li&gt;
&lt;li&gt;Issuing an SSL certificate (15–30 minute wait with ACM or Let's Encrypt)&lt;/li&gt;
&lt;li&gt;Configuring a load balancer or DomainMapping&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Sandbox MCP skips all of this with a &lt;strong&gt;Cloudflare Edge Router Worker&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ud50w3df9qr0nwekbjo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ud50w3df9qr0nwekbjo.png" alt="URL Routing" width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;DNS is fixed as &lt;code&gt;*.example.com&lt;/code&gt; &lt;strong&gt;wildcard&lt;/strong&gt; + Cloudflare proxy, with Universal SSL automatically covering every subdomain. The Cloudflare Worker receives all &lt;code&gt;*.example.com/*&lt;/code&gt; traffic and routes by subdomain.&lt;/p&gt;

&lt;p&gt;The logic is three-tier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// apps/worker/edge-router/src/index.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// ① sbx-* prefix → Sandbox routing&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sandboxSub&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extractSandboxSubdomain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sandboxSub&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handleSandboxRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sandboxSub&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// ② KV route:{subdomain} registered → Cloud Run proxy&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;subdomain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extractSubdomain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;subdomain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;proxyResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handleCloudRunProxy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;subdomain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;proxyResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;proxyResponse&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// ③ Otherwise → fetch(request) passthrough&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;sandbox_publish&lt;/code&gt; finishes, all it does is &lt;strong&gt;write a &lt;code&gt;route:{nickname}/{app}&lt;/code&gt; key into Cloudflare KV&lt;/strong&gt;. That single write makes the new subdomain routable instantly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;kvPut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`route:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;nickname&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;appName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;serviceUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No DNS setup. No waiting for SSL issuance. No IaC deploy. Everything completes within the MCP tool execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Self-Hosted Git Server for Larger Apps
&lt;/h3&gt;

&lt;p&gt;This setup actually started out &lt;strong&gt;without git at all&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Since the primary users were going to be PMs and CS folks, we figured "git concepts are too high a bar — let's keep everything inside MCP tools." Write files via &lt;code&gt;sandbox_write_file&lt;/code&gt;, deploy via &lt;code&gt;sandbox_publish&lt;/code&gt;. That should be enough, we thought.&lt;/p&gt;

&lt;p&gt;The approach hit two walls quickly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall 1: Constant chunking&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MCP tool calls travel over HTTP, with a payload size limit. React/Vue build bundles, SPAs with images, business tools with dozens of files — they don't fit in a single call. We added an &lt;code&gt;append&lt;/code&gt; mode to &lt;code&gt;sandbox_write_file&lt;/code&gt; for chunking, but every "first half of file A → second half of file A → first half of file B → ..." sequence triggered error recovery and retries. Deployments became flaky.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall 2: Massive token consumption&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was the real killer. When you tell the AI "deploy this app," it sends the entire source as MCP tool arguments. &lt;strong&gt;The file contents land in the conversation context&lt;/strong&gt;, and a few-thousand-line app burns through tokens fast. A single deploy easily consumed tens of thousands of tokens, and Claude Code sessions hit compaction quickly.&lt;/p&gt;

&lt;p&gt;Worse, the AI tends to "verify after sending" — re-reading the same file via &lt;code&gt;sandbox_read_file&lt;/code&gt;. &lt;strong&gt;Write → read → write loops, with tokens going up in flames.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So we pivoted to &lt;strong&gt;using git push as well&lt;/strong&gt;. With git push:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No file size limit&lt;/li&gt;
&lt;li&gt;Differential transfer — second-time pushes are fast&lt;/li&gt;
&lt;li&gt;Source code stays out of the MCP conversation context (no AI tokens consumed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We never expected business-side employees to run &lt;code&gt;git push&lt;/code&gt; by hand. But if &lt;strong&gt;Claude Code runs git commands in the background&lt;/strong&gt;, it's not a barrier. The user just says "build this and publish it" — the AI runs &lt;code&gt;git init &amp;amp;&amp;amp; git push&lt;/code&gt; on its own when needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why a Self-Hosted Git Server?
&lt;/h3&gt;

&lt;p&gt;Once we adopted git push, the next question was: where do we host the repos? We considered using GitHub Organizations but ruled it out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Issuing and managing GitHub accounts for every employee&lt;/strong&gt; — including non-engineers — wasn't worth the cost or the operational overhead. Paying for a GitHub seat just to ship one app is overkill.&lt;/p&gt;

&lt;p&gt;Fortunately, we already operated &lt;strong&gt;a self-hosted Git Server on GCE for a different purpose&lt;/strong&gt;: hosting an internal "read-only Git MCP for code investigation." A VM with repositories cloned under &lt;code&gt;/mnt/repos/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We just added a &lt;strong&gt;Git Smart HTTP Protocol&lt;/strong&gt; endpoint and one new repo (&lt;code&gt;sandbox-apps&lt;/code&gt;) to it. The VM was already running, so the marginal cost was near zero. Authentication piggybacks on the existing Google OAuth setup. Repository management is just OS directory operations. Borrowing space on the existing internal Git Server was vastly simpler than spinning up new infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Actual Usage Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Get the git URL from the MCP tool (nickname is automatic)&lt;/span&gt;
sandbox_init_repo&lt;span class="o"&gt;(&lt;/span&gt;app_name: &lt;span class="s2"&gt;"my-app"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;# → https://mcp-sandbox.example.com/git/sandbox/ryan/my-app.git&lt;/span&gt;

&lt;span class="c"&gt;# 2. Local commit (the AI does this in the background)&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/my-app/
git init &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git add &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"init"&lt;/span&gt;
git remote add sandbox &amp;lt;returned URL&amp;gt;

&lt;span class="c"&gt;# 3. Push&lt;/span&gt;
git push sandbox main
&lt;span class="c"&gt;# Username: oauth2accesstoken&lt;/span&gt;
&lt;span class="c"&gt;# Password: $(gcloud auth print-access-token)&lt;/span&gt;

&lt;span class="c"&gt;# 4. Deploy&lt;/span&gt;
sandbox_publish&lt;span class="o"&gt;(&lt;/span&gt;app_name: &lt;span class="s2"&gt;"my-app"&lt;/span&gt;, description: &lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Auth uses a Google OAuth token as the Basic Auth password (same pattern as GCP Source Repos). Only &lt;code&gt;@air-closet.com&lt;/code&gt; accounts pass. No GitHub account required — any employee can push.&lt;/p&gt;

&lt;p&gt;The remote repo is configured with &lt;code&gt;receive.denyCurrentBranch=updateInstead&lt;/code&gt;, so the working tree updates server-side on push. Cloud Run uses that directory as &lt;code&gt;--source&lt;/code&gt;, so there's no extra step between push and publish.&lt;/p&gt;

&lt;p&gt;For small apps (a few files, hundreds of lines each), &lt;code&gt;sandbox_write_file&lt;/code&gt; still works fine. &lt;strong&gt;Switch between MCP-only and git push depending on app size.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Security — Four Independent Gates
&lt;/h2&gt;

&lt;p&gt;That covered the "convenient to build" side. Now the &lt;strong&gt;"safe to publish"&lt;/strong&gt; side.&lt;/p&gt;

&lt;p&gt;As I noted at the start, exposing AI-generated code in front of users is risky. So Sandbox MCP layers four independent safety mechanisms that &lt;strong&gt;don't depend on the app's own implementation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqni3xiout6qsnmle80gl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqni3xiout6qsnmle80gl.png" alt="Security Layers" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ① Public-Facing Gate — Self-Hosted OAuth on the Cloudflare Worker
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;sbx-*.example.com&lt;/code&gt; sits behind a &lt;strong&gt;self-hosted OAuth gate built into the same Cloudflare Worker&lt;/strong&gt; that handles routing. When someone visits, the Worker first checks the &lt;code&gt;cortex_session&lt;/code&gt; cookie; if it's missing or invalid, it redirects to a Google Workspace SSO entry point (&lt;code&gt;auth.example.com/__edge/auth/start&lt;/code&gt;). Without an &lt;code&gt;@air-closet.com&lt;/code&gt; account, requests never reach Cloud Run.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;independent of the app's implementation&lt;/strong&gt;. Even if the AI didn't write a single line of auth code, the Worker stops the request first. "Accidentally public" is physically impossible.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why we migrated from ZeroTrust Access to self-hosted OAuth
&lt;/h4&gt;

&lt;p&gt;The first iteration used &lt;strong&gt;Cloudflare ZeroTrust Access&lt;/strong&gt;. You just configure the &lt;code&gt;@air-closet.com&lt;/code&gt; domain restriction in the Cloudflare dashboard and you're done — no auth code at all. As a starting point it was ideal.&lt;/p&gt;

&lt;p&gt;The catch: &lt;strong&gt;ZeroTrust's free tier caps at 50 users&lt;/strong&gt;. As headcount grew and Sandbox MCP usage spread, we approached the cap, and switching to pay-as-you-go (~$7/user/month) wasn't trivially cheap. On top of that we wanted to share the same auth foundation with internal apps in production (KPI dashboards, inventory tools, etc.), so we decided to &lt;strong&gt;consolidate everything into a self-hosted OAuth with no user limit&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Conveniently, the Cloudflare Worker already in front of every &lt;code&gt;*.example.com&lt;/code&gt; request — the routing layer Sandbox MCP relies on — was perfectly positioned for this. A small extension gave us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;auth.example.com/__edge/auth/start&lt;/code&gt; to kick off Google OAuth 2.0&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;auth.example.com/__edge/auth/callback&lt;/code&gt; to exchange tokens, persist the session in Upstash Redis, and issue a &lt;code&gt;cortex_session&lt;/code&gt; cookie scoped to &lt;code&gt;Domain=.example.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Worker-level gating for sandbox + internal-app subdomains, injecting &lt;code&gt;X-Cortex-User-Email&lt;/code&gt; and friends into the Cloud Run request when authenticated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this fits inside the existing Worker — no extra Cloud Run, no extra VM. Workers do have a CPU-time budget, but &lt;strong&gt;OAuth flows and cookie checks complete in single-digit milliseconds&lt;/strong&gt;, so latency is indistinguishable from ZeroTrust.&lt;/p&gt;

&lt;p&gt;Net result: the user cap is gone, anyone with &lt;code&gt;@air-closet.com&lt;/code&gt; can use Sandbox out of the box, and the auth implementation is fully visible in our own codebase.&lt;/p&gt;

&lt;h3&gt;
  
  
  ② Deploy Gate — MCP OAuth
&lt;/h3&gt;

&lt;p&gt;Operations like &lt;code&gt;sandbox_publish&lt;/code&gt; and &lt;code&gt;sandbox_delete&lt;/code&gt; &lt;strong&gt;enforce Google OAuth on the MCP server side&lt;/strong&gt;. Sandbox MCP implements RFC 8414 (&lt;code&gt;/.well-known/oauth-authorization-server&lt;/code&gt;), so Claude Code runs the OAuth flow automatically on first connection.&lt;/p&gt;

&lt;p&gt;The strongest guarantee is &lt;strong&gt;"you can't accidentally update or delete someone else's app."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When multiple people share a Sandbox MCP, an AI accident like "wait, I overwrote a coworker's app while updating mine" would be devastating. To prevent that, &lt;strong&gt;the AI doesn't get to decide whose app is being touched&lt;/strong&gt;. The server injects &lt;code&gt;nickname&lt;/code&gt; automatically from the OAuth session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Strip the `nickname` property from the MCP tool schema and have&lt;/span&gt;
&lt;span class="c1"&gt;// the server force-inject the logged-in user's nickname.&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;injectNickname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;McpTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userNickname&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;McpTool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;nickname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;restProperties&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;restProperties&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;nickname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userNickname&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the AI's perspective, the &lt;code&gt;nickname&lt;/code&gt; input doesn't exist. Even with a prompt injection like "delete ryan's app," there's no mechanism to do so. &lt;strong&gt;"You can only touch your own apps" is enforced at the API spec level.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On top of that, inputs are validated strictly against &lt;code&gt;/^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/&lt;/code&gt;, rejecting shell-injection and path-traversal patterns (&lt;code&gt;..&lt;/code&gt;, &lt;code&gt;/&lt;/code&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  ③ Data Gate — SandboxDB Namespace Isolation
&lt;/h3&gt;

&lt;p&gt;As mentioned earlier, data lives at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sandbox_data/{nickname}--{app}/...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Per request, the SandboxDB API resolves the path &lt;strong&gt;server-side&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browser (OAuth): resolve &lt;code&gt;email → users → nickname&lt;/code&gt;, take &lt;code&gt;app&lt;/code&gt; from the &lt;code&gt;Origin&lt;/code&gt; header&lt;/li&gt;
&lt;li&gt;Backend (SA token): take &lt;code&gt;nickname/app&lt;/code&gt; from the &lt;code&gt;X-Sandbox-App&lt;/code&gt; header (required — missing returns 400)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The client cannot spoof the path.&lt;/p&gt;

&lt;p&gt;We deliberately do &lt;strong&gt;not&lt;/strong&gt; use the &lt;code&gt;K-Service&lt;/code&gt; header (the Cloud Run-injected service name). That's a client-spoofable header, and another implementation that relied on it had a "read another app's data" vulnerability disclosed. Requiring &lt;code&gt;X-Sandbox-App&lt;/code&gt; keeps the only valid route through an explicitly server-validated path.&lt;/p&gt;

&lt;p&gt;The clincher: &lt;strong&gt;a dedicated named database for Sandbox&lt;/strong&gt;. Instead of the &lt;code&gt;(default)&lt;/code&gt; DB (which contains data from other systems), we use an independent Firestore database called &lt;code&gt;sandbox&lt;/code&gt;, and the Cloud Run SA gets an IAM Condition that allows access only to the &lt;code&gt;sandbox&lt;/code&gt; DB.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// From infra/mcp/git-server/index.ts&lt;/span&gt;
&lt;span class="c1"&gt;// IAM Condition on roles/datastore.user:&lt;/span&gt;
&lt;span class="c1"&gt;//   resource.name == "projects/.../databases/sandbox" ||&lt;/span&gt;
&lt;span class="c1"&gt;//   resource.name.startsWith("projects/.../databases/sandbox/")&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No matter how badly the AI-written code goes wrong, it physically cannot reach data outside Sandbox.&lt;/p&gt;

&lt;h3&gt;
  
  
  ④ Execution Gate — Cloud Run SA + IAM
&lt;/h3&gt;

&lt;p&gt;All &lt;code&gt;sandbox-*&lt;/code&gt; Cloud Run services run under &lt;strong&gt;a single shared SA&lt;/strong&gt; (e.g. &lt;code&gt;sandbox-run&lt;/code&gt;). The permissions on that SA are minimal.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;roles/logging.logWriter&lt;/code&gt; (write its own logs)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;roles/bigquery.jobUser&lt;/code&gt; + &lt;code&gt;bigquery.dataViewer&lt;/code&gt; scoped to the &lt;code&gt;sandbox_logs&lt;/code&gt; dataset only (its own access logs, nothing else)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;roles/datastore.user&lt;/code&gt; (IAM Condition limiting to &lt;code&gt;sandbox&lt;/code&gt; DB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What it does &lt;strong&gt;not&lt;/strong&gt; have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access to the &lt;code&gt;(default)&lt;/code&gt; Firestore that holds data from other systems&lt;/li&gt;
&lt;li&gt;Access to BigQuery datasets used by other internal systems&lt;/li&gt;
&lt;li&gt;Direct access to Secret Manager&lt;/li&gt;
&lt;li&gt;Permission to manage other Cloud Run services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, &lt;strong&gt;even if a Sandbox app goes completely rogue, the blast radius is limited to &lt;code&gt;sandbox_data&lt;/code&gt; and &lt;code&gt;sandbox_logs&lt;/code&gt;&lt;/strong&gt;. Nothing outside Sandbox is affected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Logging — Apps Can Query Their Own Access Logs
&lt;/h2&gt;

&lt;p&gt;Sandbox apps eventually want to look at logs too. "How many views did this page get?" "Who hit that error?"&lt;/p&gt;

&lt;p&gt;We forward Cloud Run request logs to BigQuery via a &lt;strong&gt;Logging Sink&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// From infra/mcp/git-server/index.ts&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sandboxLogSink&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;gcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ProjectSink&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sandbox-logs-sink&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`bigquery.googleapis.com/projects/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;projectId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/datasets/sandbox_logs`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;resource.type="cloud_run_revision"&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;resource.labels.service_name:"sandbox-"&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;logName:"run.googleapis.com%2Frequests"&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; AND &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;bigqueryOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;usePartitionedTables&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;sandbox_logs&lt;/code&gt; dataset is locked down with &lt;strong&gt;project-owner-only ACLs&lt;/strong&gt; (it contains PII like remoteIp and User-Agent), and the Sandbox SA gets a tightly scoped &lt;code&gt;bigquery.dataViewer&lt;/code&gt; to it.&lt;/p&gt;

&lt;p&gt;This lets apps query their own access logs from BigQuery. "Post last week's user count for this app to Slack" can be done entirely inside Sandbox.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Design — Making AI Use Tools Correctly
&lt;/h2&gt;

&lt;p&gt;Let me close with a note on tool definitions. I personally think this is where MCP design really makes or breaks.&lt;/p&gt;

&lt;p&gt;Sandbox MCP exposes 10 tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_publish&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start deploy (async)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_deploy_status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Check deploy status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_init_repo&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Initialize git push repo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_write_file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Write file (overwrite/append)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_list&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_delete&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delete app&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_schedule&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Configure Cloud Scheduler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_unschedule&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Remove Cloud Scheduler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_read_file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Read source code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_list_files&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List files&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Whether the AI picks the right tool at the right moment is almost entirely determined by &lt;strong&gt;what's written in the tool description&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example, the description for &lt;code&gt;sandbox_publish&lt;/code&gt; covers not just functionality but also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supported app types and required files (Python / Node.js / static HTML / custom)&lt;/li&gt;
&lt;li&gt;Startup command and PORT requirement per type&lt;/li&gt;
&lt;li&gt;When to use &lt;code&gt;write_file&lt;/code&gt; vs &lt;code&gt;git push&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;How to use SandboxDB (with SDK code samples)&lt;/li&gt;
&lt;li&gt;How to use the UI Kit (explicit instruction to fetch README.md via &lt;code&gt;read_file&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this in place, the AI can autonomously do:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User says "build me a tool that displays Slack emoji scores"&lt;/li&gt;
&lt;li&gt;→ Reads &lt;code&gt;sandbox_publish&lt;/code&gt; description and sees "first read the UI Kit README"&lt;/li&gt;
&lt;li&gt;→ Calls &lt;code&gt;read_file&lt;/code&gt; on &lt;code&gt;sandbox-ui-kit/README.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;→ Generates HTML/CSS/JS following the guidelines&lt;/li&gt;
&lt;li&gt;→ Sees the SandboxDB SDK usage in the description and integrates persistence&lt;/li&gt;
&lt;li&gt;→ Calls &lt;code&gt;sandbox_publish&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;— without asking the user a single follow-up question. &lt;strong&gt;Writing not just "what it does" but "what to do with it" into the tool definition&lt;/strong&gt; is the secret to AI-friendly design.&lt;/p&gt;

&lt;p&gt;If you write tool definitions tersely, the AI keeps coming back asking "what should I do next?" The description is less of a human-facing doc and more of an &lt;strong&gt;AI-facing runbook&lt;/strong&gt;. That framing helps a lot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap-Up
&lt;/h2&gt;

&lt;p&gt;Sandbox MCP exists to answer two challenges of building internal tools in the AI era:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Building&lt;/strong&gt; is now possible for anyone, thanks to AI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Publishing safely&lt;/strong&gt; remains hard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To close that gap, we:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standardized every layer&lt;/strong&gt; on the platform side: frontend / backend / DB / infra / auth / domain / SSL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedded a runbook into tool descriptions&lt;/strong&gt; so the AI naturally uses things correctly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layered four access gates&lt;/strong&gt; (Worker-level OAuth / MCP OAuth / namespace isolation / IAM) so safety &lt;strong&gt;doesn't depend on the implementation being correct&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Building this, what struck me again is that &lt;strong&gt;the role of platforms in an AI-powered development era is shifting&lt;/strong&gt;. Platforms used to optimize for "easy for humans." Now they also need to optimize for &lt;strong&gt;"used correctly by AI."&lt;/strong&gt; Tool descriptions are AI-facing docs, and safety must be designed assuming AI will write incorrect code.&lt;/p&gt;

&lt;p&gt;At the same time, by &lt;strong&gt;limiting what the builder is responsible for&lt;/strong&gt;, we drastically lower the barrier to "let me just try something." That's the entry point that turns a non-engineer's "I want to build this" into actual operational improvements.&lt;/p&gt;

&lt;p&gt;I hope this is useful for anyone designing internal platforms.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>typescript</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Still Measuring Initiative Impact Manually? How We Used Graph RAG + MCP to Make It Explorable</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Mon, 20 Apr 2026 15:27:35 +0000</pubDate>
      <link>https://forem.com/ryantsuji/we-built-a-custom-graph-rag-to-let-ai-answer-did-that-initiative-actually-work-3oda</link>
      <guid>https://forem.com/ryantsuji/we-built-a-custom-graph-rag-to-let-ai-answer-did-that-initiative-actually-work-3oda</guid>
      <description>&lt;p&gt;Hi, I'm &lt;a href="https://x.com/ryantsuji" rel="noopener noreferrer"&gt;Ryan&lt;/a&gt;, CTO at airCloset.&lt;/p&gt;

&lt;p&gt;In my previous posts, I introduced &lt;a href="https://dev.to/ryantsuji/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;an MCP server that lets you search all company databases in natural language&lt;/a&gt; and showed &lt;a href="https://dev.to/ryantsuji/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;the full picture of our 17 internal MCP servers&lt;/a&gt;. This time, I'm diving deep into what I briefly mentioned as "Biz Graph."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is the story of how we represented the relationship between business initiatives and KPIs as a graph structure, enabling AI to answer "Did that initiative actually work?"&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Graph RAG?
&lt;/h2&gt;

&lt;p&gt;To get more value from AI, what matters is not just feeding it data — it's conveying &lt;strong&gt;the relationships between data&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If your data volume is small enough, tools like NotebookLM can deliver great results. But you can't fit all your business data into a context window. Initiative reports, KPI spreadsheets, marketing weekly reports, logistics daily metrics — you simply cannot dump all of that into a prompt.&lt;/p&gt;

&lt;p&gt;That's why I believe the best available option right now is &lt;strong&gt;Graph RAG&lt;/strong&gt;: making the right data searchable at any time, along with its relationships. When AI is asked "What metrics are related to this initiative?", it can traverse the graph and extract only the information it needs — because that structure was built in advance.&lt;/p&gt;

&lt;p&gt;But there's a catch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making Non-Graph Data Into a Graph
&lt;/h2&gt;

&lt;p&gt;Many of you have heard of "knowledge graphs" and "GraphRAG." But when you actually try to build one, most people hit the same wall:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business data doesn't naturally form a graph.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With our DB Graph project, things were different. Tables had foreign keys. ORMs had &lt;code&gt;@JoinColumn&lt;/code&gt; and &lt;code&gt;belongsTo&lt;/code&gt;. &lt;strong&gt;Relationships already existed in the data&lt;/strong&gt; — we just had to parse and convert them.&lt;/p&gt;

&lt;p&gt;But the relationship between "initiatives" and "KPIs" has none of that.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A meeting slide says "SNS ad campaign launched"&lt;/li&gt;
&lt;li&gt;A spreadsheet records "This week's new members: 1,234"&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;There's no FK between these. No join key.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;"The SNS campaign affected new member signups" — that relationship &lt;strong&gt;exists only in someone's head&lt;/strong&gt;. It's nowhere in the spreadsheet.&lt;/p&gt;

&lt;p&gt;This is what "business data doesn't form a graph" means. The relationships between entities aren't self-evident — &lt;strong&gt;you have to design the graph structure itself&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: "Did That Initiative Actually Work?"
&lt;/h2&gt;

&lt;p&gt;Every week, our company reports initiative progress in all-hands meetings and group-level standups.&lt;/p&gt;

&lt;p&gt;"We launched the spring SNS ad campaign"&lt;br&gt;
"We improved the recommendation engine"&lt;br&gt;
"We're raising our CS SLA achievement rate"&lt;/p&gt;

&lt;p&gt;— Dozens of initiatives reported weekly. Hundreds per year. &lt;strong&gt;Over 5,000 total&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Meanwhile, a separate spreadsheet tracks 200+ metrics daily and weekly: member count, new signups, retention rate, satisfaction scores, acquisition CPA...&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem: these two worlds are completely disconnected.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"How much did last month's SNS campaign contribute to new member acquisition?"&lt;/p&gt;

&lt;p&gt;Answering this requires:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Confirm the initiative's execution period (which slide was that again?)&lt;/li&gt;
&lt;li&gt;Find KPI data for that period (which sheet, which tab?)&lt;/li&gt;
&lt;li&gt;Align timeframes and compare numbers (week-over-week? month-over-month? year-over-year?)&lt;/li&gt;
&lt;li&gt;Check if other initiatives were running simultaneously (confounding factors?)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This manual analysis takes 30-60 minutes, &lt;strong&gt;happening every week for multiple initiatives&lt;/strong&gt;. Realistically, most initiative effectiveness reviews end with "it probably worked, I think."&lt;/p&gt;
&lt;h2&gt;
  
  
  Biz Graph: The Big Picture
&lt;/h2&gt;

&lt;p&gt;We built &lt;strong&gt;Biz Graph&lt;/strong&gt; to solve this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76yav9k6uto7a65uzwux.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76yav9k6uto7a65uzwux.png" alt="System Overview" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Scale
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: The numbers below differ from actual values but convey the order of magnitude. In any case, this is far too much data to fit in an LLM's context window.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Nodes&lt;/td&gt;
&lt;td&gt;~10,000 (14 types)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edges&lt;/td&gt;
&lt;td&gt;~71,000 (22 types)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Initiatives&lt;/td&gt;
&lt;td&gt;~5,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KPI Metrics&lt;/td&gt;
&lt;td&gt;~4,000 (members/signups/retention/satisfaction/UX/marketing/logistics)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Marketing Channels&lt;/td&gt;
&lt;td&gt;~100 (SEM/LINE/email/CRM etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Sources&lt;/td&gt;
&lt;td&gt;9 tables/spreadsheets&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  Three Components
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Biz Graph Transformer&lt;/strong&gt; — Weekly graph rebuild from all data sources (Cloud Run Job, every Friday 22:00)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Biz Graph MCP Server&lt;/strong&gt; — Graph search + time series analysis accessible from AI (Cloud Run)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Biz Data Loader&lt;/strong&gt; — Daily auto-import of marketing/logistics data (Cloud Run Job, every morning 6:00)&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  The Core Design: The Week Node
&lt;/h2&gt;

&lt;p&gt;Here's the heart of this article.&lt;/p&gt;

&lt;p&gt;How do you connect "initiatives" and "metrics" in a graph? The obvious first thought is direct edges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Initiative("SNS campaign") ──AFFECTS──→ Metric("new_members")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This design breaks down.&lt;/strong&gt; Three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Edge explosion&lt;/strong&gt;: 5,000 initiatives × 4,000 metrics = up to 20 million edges&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Causal uncertainty&lt;/strong&gt;: "SNS campaign affected new members" is a hypothesis, not a fact. Direct edges make it look like a confirmed relationship&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing temporal info&lt;/strong&gt;: There's no way to express &lt;em&gt;when&lt;/em&gt; the impact occurred&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead, we designed &lt;strong&gt;Week nodes as shared anchors for indirect connections&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1pvj76dyic8v8p63yfd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1pvj76dyic8v8p63yfd.png" alt="Week Anchor" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Initiative("SNS campaign")     ──ACTIVE_DURING_WEEK──→  Week:2026-03-03
Metric("new_members")          ──HAS_DATA_AT──→         Week:2026-03-03
QualityMetric("avg_rating")    ──HAS_QUALITY_DATA_AT──→ Week:2026-03-03
MarketingChannel("SEM brand")  ──HAS_MARKETING_DATA_AT──→ Week:2026-03-03
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initiatives and metrics aren't directly connected — they're &lt;strong&gt;indirectly linked through the same week&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Works
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Prevents edge explosion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Initiatives only connect to "weeks they were active." Metrics only connect to "weeks that have data." Instead of a cross-product, each connects independently to Week nodes — edge count grows linearly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Expresses co-occurrence, not causation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Initiatives that were active the same week as metric fluctuations" — this isn't asserting causation, it's a structure for &lt;strong&gt;discovering causal candidates&lt;/strong&gt;. It leaves room for human or AI judgment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Edge types distinguish data sources&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Same Week node, but &lt;code&gt;HAS_DATA_AT&lt;/code&gt; (business KPIs), &lt;code&gt;HAS_QUALITY_DATA_AT&lt;/code&gt; (service quality), &lt;code&gt;HAS_UX_DATA_AT&lt;/code&gt; (UX metrics), &lt;code&gt;HAS_MARKETING_DATA_AT&lt;/code&gt; (marketing), &lt;code&gt;HAS_LOGI_DATA_AT&lt;/code&gt; (logistics) — "what kind of data" is embedded in the edge type itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Time series traversal is natural&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Week nodes are connected by &lt;code&gt;NEXT_WEEK&lt;/code&gt; edges. "How did metrics change in the 3 weeks before and after initiative start?" can be expressed as graph traversal.&lt;/p&gt;

&lt;h2&gt;
  
  
  MetricDomain: Bridging Worlds Without Join Keys
&lt;/h2&gt;

&lt;p&gt;Week nodes tell us "what happened the same week," but not &lt;strong&gt;which metrics are relevant to a given initiative&lt;/strong&gt;. There's no point looking at logistics data when analyzing an SNS ad campaign.&lt;/p&gt;

&lt;p&gt;However, there's &lt;strong&gt;no join key&lt;/strong&gt; between initiative categories ("Marketing (Advertising)") and metric groups ("New Acquisition"). The knowledge that "ad initiatives relate to new acquisition" is tacit — it exists only in people's heads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MetricDomain&lt;/strong&gt; (6 domains) structuralizes this tacit knowledge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwytm7tcvib06q1qu7pnr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwytm7tcvib06q1qu7pnr.png" alt="MetricDomain" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Connected metric types&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;acquisition&lt;/td&gt;
&lt;td&gt;New acquisition&lt;/td&gt;
&lt;td&gt;Marketing channels, new member count, registration CV&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;retention&lt;/td&gt;
&lt;td&gt;Retention / churn prevention&lt;/td&gt;
&lt;td&gt;Member count, churn rate, plan transitions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;service_quality&lt;/td&gt;
&lt;td&gt;Service quality&lt;/td&gt;
&lt;td&gt;Satisfaction, ratings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;operations&lt;/td&gt;
&lt;td&gt;Operations&lt;/td&gt;
&lt;td&gt;Selection, shipping, returns, logistics KPIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ux&lt;/td&gt;
&lt;td&gt;UX experience&lt;/td&gt;
&lt;td&gt;Sessions, funnels&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;revenue&lt;/td&gt;
&lt;td&gt;Revenue / purchases&lt;/td&gt;
&lt;td&gt;Purchase CV, upsell&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These 6 domains aren't fixed — they can be freely added or split as the business grows and the organization evolves. Domain definitions are just mapping tables in code, so the cost of expansion is nearly zero.&lt;/p&gt;

&lt;p&gt;By &lt;strong&gt;humans defining&lt;/strong&gt; the mapping between initiative categories and MetricDomains, and between metric groups and MetricDomains, we enable "automatically show acquisition-related metrics when viewing a marketing initiative."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Category("Marketing ads") ──CATEGORY_IN_DOMAIN──→ MetricDomain("acquisition")
                                                           ↑ IN_DOMAIN
                                                  MetricGroup("New Acquisition")
                                                  MarketingChannel("SEM brand")
                                                  UxMetric("registration_completed")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: Pass &lt;code&gt;domain: "acquisition"&lt;/code&gt; to &lt;code&gt;compare_metrics&lt;/code&gt;, and the initiative overlay automatically filters to acquisition-related initiatives only.&lt;/p&gt;

&lt;h2&gt;
  
  
  SIMILAR_TO: AI Answers "Have We Done Something Like This Before?"
&lt;/h2&gt;

&lt;p&gt;Another unique design element: &lt;strong&gt;SIMILAR_TO edges&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Initiative text (title + description) is vectorized to 768 dimensions using Vertex AI's gemini-embedding-001, then BigQuery's VECTOR_SEARCH auto-detects similar pairs with cosine similarity &amp;gt;= 0.75.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;VECTOR_SEARCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;cortex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;biz_graph_nodes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'embedding'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;cortex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;biz_graph_nodes&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;node_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Initiative'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;top_k&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;distance_type&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'COSINE'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;  &lt;span class="c1"&gt;-- distance &amp;lt;= 0.25 = similarity &amp;gt;= 0.75&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Currently &lt;strong&gt;~13,000 SIMILAR_TO edges&lt;/strong&gt; exist. Up to 5 similar initiatives are pre-computed for each one.&lt;/p&gt;

&lt;p&gt;"Didn't we run a similar SNS campaign last summer? How did that one perform?" — traverse similar initiatives on the graph instantly, then compare KPI changes during weeks those initiatives were active.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Usage Examples
&lt;/h2&gt;

&lt;p&gt;Here's how exploration works via MCP tools.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;All tool execution examples below run through MCP from an AI coding agent. The response format matches the real system, but numbers are dummy values and content is simplified.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  "Find marketing initiatives that drove acquisition"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;search_initiatives(&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SNS advertising for new acquisition"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"domain"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acquisition"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dateFrom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-10-01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dateTo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-31"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"limit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response (excerpt):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;5 initiatives found (by vector similarity):

1. SNS Ad Spring Collection Campaign (2026-03-09)
   Category: Marketing (Advertising)
   Similarity: 892/1000

2. Instagram Reels Ad Test (2026-02-23)
   Category: Marketing (Advertising)
   Similarity: 845/1000
   ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  "Show me the impact of that initiative"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;get_initiative_context(&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"initiative_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Initiative:2026-03-09:SNS Ad Spring Collection Campaign"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"metric_window_days"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response (excerpt):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Initiative Context&lt;/span&gt;

Title: SNS Ad Spring Collection Campaign
Execution Period: 2026-03-01 to 2026-03-31
Category: Marketing (Advertising)
Target Domain: acquisition

&lt;span class="gu"&gt;## Similar Initiatives (SIMILAR_TO)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Instagram Reels Ad Test (similarity: 0.82)
&lt;span class="p"&gt;-&lt;/span&gt; 1-Month Free Trial Campaign (similarity: 0.78)

&lt;span class="gu"&gt;## KPI Changes During Initiative (30-day window)&lt;/span&gt;
| Metric | Pre-avg | Post-avg | Change |
|--------|---------|----------|--------|
| new_regular | 50 | 60 | +20.0% |
| new_lite | 30 | 35 | +16.7% |
| monthly | 1,000 | 1,050 | +5.0% |

&lt;span class="gu"&gt;## Service Quality Metrics&lt;/span&gt;
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| avg_rating | 3.50 | 3.60 | +2.9% |

&lt;span class="gu"&gt;## UX Metrics&lt;/span&gt;
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| total_sessions | 10,000 | 12,000 | +20.0% |
| registration_completed | 100 | 130 | +30.0% |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This is the power of the Week node design.&lt;/strong&gt; Identify the weeks an initiative was active, then automatically pull all metrics (KPIs, quality, UX, marketing, logistics) from those same weeks.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Visualize new acquisition YoY with initiative overlay"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;compare_metrics(&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"metrics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"new_regular"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"new_lite"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"new_monthly"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dateFrom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-10-01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dateTo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-31"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"granularity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"weekly"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"overlay_initiatives"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"domain"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acquisition"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Time series data with acquisition-domain initiatives overlaid on the same timeframe. KPI spikes become instantly attributable to "that initiative's timing."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Build Pipeline: 9 Phases
&lt;/h2&gt;

&lt;p&gt;The graph is constructed in 9 phases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Initiative nodes + Category/Business/Team&lt;/td&gt;
&lt;td&gt;Initiative, Category, Business, Team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Daily KPIs (50 metrics)&lt;/td&gt;
&lt;td&gt;Metric → MetricGroup (10 groups)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Business KPIs + Departments&lt;/td&gt;
&lt;td&gt;Department → Metric (DEPT_TRACKS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Week nodes (shared anchors)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;HAS_DATA_AT + ACTIVE_DURING_WEEK + NEXT_WEEK&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Service quality metrics (~50)&lt;/td&gt;
&lt;td&gt;QualityMetric → Week&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;UX metrics (~40)&lt;/td&gt;
&lt;td&gt;UxMetric → Week&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Marketing channels (~100)&lt;/td&gt;
&lt;td&gt;MarketingChannel → Week&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;MetricDomain (semantic bridge)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6 domains + IN_DOMAIN + TARGETS_DOMAIN&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Logistics KPIs (~10 categories)&lt;/td&gt;
&lt;td&gt;LogiMetric → Week&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Phases 4 and 8 are the &lt;strong&gt;key design points&lt;/strong&gt;. Other phases simply "turn data into nodes" — these two "structuralize relationships that don't exist."&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 4: Week Node Generation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Convert initiative execution period to ISO weeks, generate ACTIVE_DURING_WEEK edges&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;initiative&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;initiatives&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;weeks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getISOWeeksBetween&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;initiative&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;executionStartDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;initiative&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;executionEndDate&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// Cap at 52 weeks (guard against long-running initiatives)&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;week&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;weeks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;52&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;edge_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ACTIVE_DURING_WEEK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;source_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;initiative&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Week:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;week&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Generate HAS_DATA_AT edges for weeks that have metric data&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;metricWeek&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;metricWeeks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;edge_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;HAS_DATA_AT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;source_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Metric:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;metricWeek&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Week:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;metricWeek&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;week&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// NEXT_WEEK edges for time series traversal&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sortedWeeks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;allWeeks&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;sortedWeeks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;edge_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NEXT_WEEK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;source_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Week:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;sortedWeeks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Week:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;sortedWeeks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 8: MetricDomain Generation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Category → Domain (semantic mapping defined by humans)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CATEGORY_TO_DOMAINS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Marketing (Advertising)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;acquisition&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;CRM / Retention&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;retention&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Quality / Service Improvement&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;service_quality&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Operations Improvement&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;operations&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;New Feature&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ux&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;revenue&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Initiative → TARGETS_DOMAIN (main business only — limited to where KPI data exists)&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;initiative&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;initiatives&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;initiative&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;business&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;MAIN_BUSINESS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domains&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;CATEGORY_TO_DOMAINS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;initiative&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;domains&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;edge_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TARGETS_DOMAIN&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;source_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;initiative&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`MetricDomain:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why Not a Dedicated Graph DB or OSS Libraries?
&lt;/h2&gt;

&lt;p&gt;We implemented the graph using &lt;strong&gt;BigQuery alone&lt;/strong&gt;, without Neo4j, Amazon Neptune, or OSS like Microsoft's GraphRAG.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why not a dedicated graph DB?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Dedicated Graph DB&lt;/th&gt;
&lt;th&gt;BigQuery&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Graph traversal&lt;/td&gt;
&lt;td&gt;Fast (native)&lt;/td&gt;
&lt;td&gt;Fast enough (~10,000 node scale)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector search&lt;/td&gt;
&lt;td&gt;Requires separate service&lt;/td&gt;
&lt;td&gt;VECTOR_SEARCH built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time series analysis&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;td&gt;Native (window functions)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operating cost&lt;/td&gt;
&lt;td&gt;Always-on instances&lt;/td&gt;
&lt;td&gt;Serverless (pay per query)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Joining other data&lt;/td&gt;
&lt;td&gt;ETL required&lt;/td&gt;
&lt;td&gt;Same project, instant JOIN&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For Biz Graph, "graph structure + time series analysis + vector search combined" matters more than "deep graph traversal." BigQuery handles all three in one engine.&lt;/p&gt;

&lt;p&gt;Additionally, BigQuery has announced &lt;a href="https://cloud.google.com/bigquery/docs/graph-overview" rel="noopener noreferrer"&gt;Graph capabilities&lt;/a&gt; — once GA, native graph queries on node/edge tables will be available. Currently we traverse with SQL JOINs, but we expect to migrate to faster, more intuitive queries in the future.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why not OSS libraries / SaaS?
&lt;/h3&gt;

&lt;p&gt;OSS like Microsoft GraphRAG and various Graph RAG SaaS products focus on &lt;strong&gt;automatically extracting entities and relationships from text documents&lt;/strong&gt;. Great for research papers or news articles, but not for our use case.&lt;/p&gt;

&lt;p&gt;The reason is simple: &lt;strong&gt;we need to design the graph structure itself&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The concept of Week nodes as "temporal anchors" doesn't exist in generic tools&lt;/li&gt;
&lt;li&gt;MetricDomain "semantic bridging" reflects our specific business structure&lt;/li&gt;
&lt;li&gt;The Initiative → Week → Metric indirect connection pattern won't emerge from LLM entity extraction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Generic tools "auto-generate graphs from text." What we needed was "design the graph schema ourselves and integrate heterogeneous data sources." Fundamentally different problems.&lt;/p&gt;

&lt;p&gt;Internal query example (&lt;code&gt;get_initiative_context&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Get weeks the initiative was active&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;active_weeks&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;target_id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;week_id&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;cortex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;biz_graph_edges&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;initiative_id&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;edge_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACTIVE_DURING_WEEK'&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="c1"&gt;-- Get metrics that have data in those same weeks&lt;/span&gt;
&lt;span class="n"&gt;co_occurring_metrics&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;metric_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;edge_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;week_id&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;cortex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;biz_graph_edges&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
  &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;active_weeks&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;week_id&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;edge_type&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'HAS_DATA_AT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'HAS_QUALITY_DATA_AT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'HAS_UX_DATA_AT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'HAS_MARKETING_DATA_AT'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;co_occurring_metrics&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Graph traversal and time series data retrieval complete in a single SQL query. With a dedicated graph DB, you'd need to pass traversal results to another service for time series queries — an extra hop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Initiative Data Ingestion: Auto-Extraction from Meeting Slides
&lt;/h2&gt;

&lt;p&gt;Graph quality depends on source data quality. Initiative data comes from all-hands and group meeting slides.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Frequency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;All-hands&lt;/td&gt;
&lt;td&gt;pptx in Drive → Slides conversion → text extraction&lt;/td&gt;
&lt;td&gt;Weekly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Group standups&lt;/td&gt;
&lt;td&gt;Google Slides (cumulative, latest week appended)&lt;/td&gt;
&lt;td&gt;Weekly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Text is extracted from meeting slides and structured by AI into the initiative table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;InitiativeRow&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;meetingDate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// Meeting date&lt;/span&gt;
  &lt;span class="nl"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;            &lt;span class="c1"&gt;// Source (all-hands / group standup etc.)&lt;/span&gt;
  &lt;span class="nl"&gt;business&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;// Business unit&lt;/span&gt;
  &lt;span class="nl"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;// Marketing (Ads), New Feature, ...&lt;/span&gt;
  &lt;span class="nl"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;             &lt;span class="c1"&gt;// Initiative title&lt;/span&gt;
  &lt;span class="nl"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// Detailed description&lt;/span&gt;
  &lt;span class="nl"&gt;team&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;              &lt;span class="c1"&gt;// Executing team&lt;/span&gt;
  &lt;span class="nl"&gt;executionStartDate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Execution start date&lt;/span&gt;
  &lt;span class="nl"&gt;executionEndDate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// Execution end date&lt;/span&gt;
  &lt;span class="nl"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;           &lt;span class="c1"&gt;// JSON format numeric metrics&lt;/span&gt;
  &lt;span class="nl"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;            &lt;span class="c1"&gt;// planned / in_progress / retrospective&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Critical: &lt;code&gt;executionStartDate&lt;/code&gt; / &lt;code&gt;executionEndDate&lt;/code&gt;. The meeting date (&lt;code&gt;meetingDate&lt;/code&gt;) differs from when the initiative actually runs. "We started the SNS campaign last week," reported on 3/9, means &lt;code&gt;executionStartDate&lt;/code&gt; is 3/1. This distinction is essential for accurate Week node connections.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operating Cost
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vertex AI Embedding (weekly)&lt;/td&gt;
&lt;td&gt;~$0.05/run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code (initiative extraction)&lt;/td&gt;
&lt;td&gt;Within monthly plan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BQ storage&lt;/td&gt;
&lt;td&gt;A few GB (negligible)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Run Jobs&lt;/td&gt;
&lt;td&gt;Nearly free (1x weekly + 1x daily)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP Server&lt;/td&gt;
&lt;td&gt;Nearly free (Cloud Run min-instances=0)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;A few dollars per month&lt;/strong&gt; to maintain a 10,000-node, 71,000-edge graph.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison With Typical Knowledge Graphs
&lt;/h2&gt;

&lt;p&gt;Let's take a step back and see how this design differs from conventional approaches.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Typical Knowledge Graph&lt;/th&gt;
&lt;th&gt;Biz Graph&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Node design&lt;/td&gt;
&lt;td&gt;Entities mapped directly to nodes&lt;/td&gt;
&lt;td&gt;Deliberately designed temporal anchors ("Week")&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edge semantics&lt;/td&gt;
&lt;td&gt;Relationships described as-is&lt;/td&gt;
&lt;td&gt;Edge types encode data source classification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intermediate nodes&lt;/td&gt;
&lt;td&gt;Taxonomies for classification&lt;/td&gt;
&lt;td&gt;MetricDomain as semantic bridge (structuralized tacit knowledge)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graph construction&lt;/td&gt;
&lt;td&gt;Relationships extracted from existing data&lt;/td&gt;
&lt;td&gt;Deliberately designed graph from data with no inherent relationships&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use case&lt;/td&gt;
&lt;td&gt;Primarily search and navigation&lt;/td&gt;
&lt;td&gt;Goes further into causal candidate exploration for initiative impact&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Similarity search&lt;/td&gt;
&lt;td&gt;Text-based search&lt;/td&gt;
&lt;td&gt;Pre-computed SIMILAR_TO edges via Embedding&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;In one sentence:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our DB Graph "made existing relationships discoverable." Biz Graph "designed and created relationships that didn't exist."&lt;/p&gt;

&lt;p&gt;The former is an analysis problem. The latter is a &lt;strong&gt;design problem&lt;/strong&gt; — designing the graph structure from scratch and integrating heterogeneous data sources (meeting slides, spreadsheets, BQ tables) into a single explorable structure. That's the essence of Biz Graph.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Graph RAG Over Flat RAG
&lt;/h2&gt;

&lt;p&gt;Let's revisit the "why Graph RAG?" question from the introduction.&lt;/p&gt;

&lt;p&gt;For initiative effectiveness analysis, consider what happens with standard vector search (flat RAG). Ask "What was the SNS campaign's impact?" — flat RAG returns text chunks similar to the initiative description. You get info about the initiative itself.&lt;/p&gt;

&lt;p&gt;But it won't return &lt;strong&gt;concurrent KPI changes&lt;/strong&gt;. It won't return &lt;strong&gt;results from past similar initiatives&lt;/strong&gt;. It won't return &lt;strong&gt;related domain metrics&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;These are information connected "through the graph," not by "text similarity." You can only reach them by traversing Week nodes. This "need to follow relationships" use case is exactly where Graph RAG has a clear advantage over flat RAG.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design Honesty: Not Asserting Causation
&lt;/h2&gt;

&lt;p&gt;One thing I was conscious of in this design: &lt;strong&gt;not asserting causation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Many BI tools and AI analyses want to declare "this initiative impacted this KPI." But in reality, there's no such certainty. Multiple initiatives may have been running simultaneously, it could be seasonal, it could be external market changes.&lt;/p&gt;

&lt;p&gt;Week node indirect connections simply "lay out what happened in the same period." Causal judgment is left to human or AI reasoning. I believe this is a statistically honest approach.&lt;/p&gt;

&lt;p&gt;"A structure for discovering causal candidates" — not "a structure for asserting causation." This distinction matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations: The Designer's Tacit Knowledge Is the Bottleneck
&lt;/h2&gt;

&lt;p&gt;Let me be honest about the weaknesses of this approach.&lt;/p&gt;

&lt;p&gt;MetricDomain mappings ("Marketing Advertising → acquisition domain") are hardcoded by humans. If this design is wrong, the entire graph's exploration results are skewed.&lt;/p&gt;

&lt;p&gt;This is simultaneously the answer to "why build it yourself." Off-the-shelf graph tools can't reflect your business structure — which initiative categories relate to which metric groups. Structuralizing this tacit knowledge requires someone who knows the business.&lt;/p&gt;

&lt;p&gt;Going forward, we're considering having AI propose these mappings with humans reviewing them. Full automation is hard, but an "AI suggests, humans approve" workflow could reduce the maintenance cost of domain knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Turning business data into a graph is more of a &lt;strong&gt;design challenge&lt;/strong&gt; than a technical one.&lt;/p&gt;

&lt;p&gt;There's no FK between "initiatives" and "KPIs." No join key. But by deliberately designing two structures — &lt;strong&gt;temporal axis (Week nodes)&lt;/strong&gt; and &lt;strong&gt;semantic domains (MetricDomain)&lt;/strong&gt; — it becomes an explorable graph.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Week nodes&lt;/strong&gt;: Indirect connections via "same week" instead of direct initiative-metric edges. A structure for discovering causal candidates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MetricDomain&lt;/strong&gt;: Semantic bridge between initiative categories and metric groups. Structuralized tacit knowledge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SIMILAR_TO&lt;/strong&gt;: Pre-computed similar initiatives via AI Embedding. Instant answers to "have we done this before?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a result, questions like "Did that initiative work?", "Find initiatives that drove acquisition", "Show metrics YoY with initiative overlay" — AI can now autonomously explore the graph to answer these.&lt;/p&gt;

&lt;p&gt;Graphs aren't something you "find" — they're something you &lt;strong&gt;design&lt;/strong&gt;. Especially for business data.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>bigquery</category>
      <category>typescript</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How We Built an Automated Meeting Intelligence System with Google Meet, Slack, and RAG</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Sat, 11 Apr 2026 09:11:59 +0000</pubDate>
      <link>https://forem.com/ryantsuji/how-we-built-an-automated-meeting-intelligence-system-with-google-meet-slack-and-rag-42ln</link>
      <guid>https://forem.com/ryantsuji/how-we-built-an-automated-meeting-intelligence-system-with-google-meet-slack-and-rag-42ln</guid>
      <description>&lt;p&gt;Hi, I'm &lt;a href="https://x.com/ryantsuji" rel="noopener noreferrer"&gt;Ryan&lt;/a&gt;, CTO at airCloset — a fashion subscription service based in Japan.&lt;/p&gt;

&lt;p&gt;In previous posts, I wrote about building a &lt;a href="https://dev.to/ryantsuji/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;DB Graph MCP server&lt;/a&gt; that lets you query 991 database tables across 15 schemas with natural language, and a &lt;a href="https://dev.to/ryantsuji/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;suite of 17 MCP servers&lt;/a&gt; that opened our internal operations to AI.&lt;/p&gt;

&lt;p&gt;This time, it's not about MCP. It's about something more fundamental — &lt;strong&gt;turning meetings into a searchable knowledge base&lt;/strong&gt;. This is the system I've wanted to build first when thinking about digitizing our company's information assets.&lt;/p&gt;

&lt;p&gt;We built a system that &lt;strong&gt;automatically shares&lt;/strong&gt; Google Meet &lt;strong&gt;recordings and transcripts&lt;/strong&gt; to Slack channels, and makes past meeting content &lt;strong&gt;searchable with natural language&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Context Disappears the Moment a Meeting Ends
&lt;/h2&gt;

&lt;p&gt;Face-to-face communication is fast and dense. A decision that takes 30 minutes over text can happen in 5 minutes in a meeting. That's the biggest advantage of meetings.&lt;/p&gt;

&lt;p&gt;But the problem is that &lt;strong&gt;context starts disappearing the moment the meeting ends&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What did we decide in that meeting again?"&lt;/li&gt;
&lt;li&gt;"There's a recording but I don't have the energy to rewatch an hour-long video"&lt;/li&gt;
&lt;li&gt;"Where did I write those meeting notes?"&lt;/li&gt;
&lt;li&gt;"We keep having the same discussion over and over"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Building a habit of writing meeting notes is one solution, but honestly, getting everyone to consistently write good notes is hard. Even when they do, the nuance of the conversation is lost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meetings are a treasure trove of information, yet they're not being utilized.&lt;/strong&gt; That's a huge waste.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Built
&lt;/h2&gt;

&lt;p&gt;We built a system that automates four things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;One-click Meet creation from Google Calendar&lt;/strong&gt; — A Chrome extension creates a Meet with recording, transcription, and notes all enabled by default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic Slack notification when a meeting ends&lt;/strong&gt; — Instant notification, followed by recording and transcript links minutes later&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic permission granting&lt;/strong&gt; — Access is automatically given to Slack channel members, meeting participants, and Calendar invitees&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG search over transcripts and screen shares&lt;/strong&gt; — Ask a Slack Bot "What was the release date we discussed last week?" and get an answer&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  User Flow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Create a Meeting (~10 seconds)
&lt;/h3&gt;

&lt;p&gt;In Google Calendar's event editor, click the "AI Fassy Meet" button added by our Chrome extension.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs02kapo2k8irukwxkv05.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs02kapo2k8irukwxkv05.png" alt="Chrome extension button in Google Calendar" width="800" height="696"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The "AI Fassy Meet" button appears next to Google Meet's native video conferencing option&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Select the Slack channel where notifications should be sent. Previously selected channels appear at the top, followed by your most active channels.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2mh1zmyjc1dcj833phe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2mh1zmyjc1dcj833phe.png" alt="Slack channel selection dialog" width="800" height="895"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Channel search and selection dialog, sorted by selection history and activity&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Click "Create Meet" and the Meet URL is automatically set on the Calendar event.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchgt5xfkus0memkj03nz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchgt5xfkus0memkj03nz.png" alt="Setting Meet URL" width="800" height="824"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The Meet URL is set on the event with recording, transcription, and notes all enabled by default. The "Use Gemini to create meeting notes" shown on screen is Google Meet's native feature — our system additionally integrates Gemini 3 Flash for independent transcription and screen share analysis&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recording, transcription, and meeting notes are all ON by default.&lt;/strong&gt; Users don't need to think about settings at all.&lt;/p&gt;

&lt;p&gt;The channel dropdown shows &lt;strong&gt;previously selected channels first&lt;/strong&gt;, then &lt;strong&gt;channels you're a member of, sorted by message activity&lt;/strong&gt;. For recurring meetings, last week's channel is always one click away.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2: Hold the Meeting
&lt;/h3&gt;

&lt;p&gt;Just have your meeting normally. Recording and transcription run automatically in the background.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 3: Automatic Notification When the Meeting Ends
&lt;/h3&gt;

&lt;p&gt;When the meeting ends, an instant notification appears in the designated Slack channel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2t6g66pfh8tmb9npmad7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2t6g66pfh8tmb9npmad7.png" alt="Slack meeting ended notification" width="800" height="224"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few minutes later, a follow-up notification arrives in the thread with links to the recording and transcript. Channel members can view them immediately.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 4: Search Past Meetings with Natural Language
&lt;/h3&gt;

&lt;p&gt;In the same thread, mention the Bot to ask about the meeting content.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2no06nuqulc4xedo1c5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2no06nuqulc4xedo1c5.png" alt="Full thread flow — end notification → artifact notification → RAG search → answer" width="800" height="837"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Full thread flow: ①Meeting ended notification → ②Recording and transcript links → ③User asks "Give me a summary of this meeting" → ④Bot responds with a structured summary&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The Bot searches past meeting transcripts, summarizes the relevant parts, and responds with source links. Screen-shared slides and code are also searchable.&lt;/p&gt;



&lt;p&gt;Now let's dive into the technical implementation.&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fnmwmuv1ovv0pc859yo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fnmwmuv1ovv0pc859yo.png" alt="System Overview" width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The system consists of four components:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Deployment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chrome Extension + meet-calendar API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Meet creation UI + backend API&lt;/td&gt;
&lt;td&gt;Chrome / Cloud Run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;workspace-pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Workspace Events API subscription management&lt;/td&gt;
&lt;td&gt;Shared package&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;meet-pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Core event processing: artifact storage, permissions, embedding generation&lt;/td&gt;
&lt;td&gt;Cloud Run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Slack Bot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Meet creation + RAG search&lt;/td&gt;
&lt;td&gt;Cloud Run&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Shared domain logic (Space creation, Firestore operations, Drive access, caching) is extracted into a common package, reused by both the Chrome Extension API and the Slack Bot.&lt;/p&gt;
&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Chrome Extension (Manifest V3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Cloud Run (Hono)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event Processing&lt;/td&gt;
&lt;td&gt;Cloud Pub/Sub → Cloud Run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Workspace Integration&lt;/td&gt;
&lt;td&gt;Meet REST API, Drive API, Workspace Events API, Calendar API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI/ML&lt;/td&gt;
&lt;td&gt;Vertex AI Embeddings (gemini-embedding-001), Gemini 3 Flash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Stores&lt;/td&gt;
&lt;td&gt;Firestore, BigQuery, Cloud Storage, Upstash Redis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notifications&lt;/td&gt;
&lt;td&gt;Slack Block Kit API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;Pulumi (TypeScript)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Deep Dive 1: Pre-Pooling Meet Spaces — LIFO Cache
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Problem: Meet Creation Is Slow
&lt;/h3&gt;

&lt;p&gt;Creating a new Google Meet Space via API takes 1–2 seconds for a response. Making users wait several seconds after clicking a button is an unacceptable UX.&lt;/p&gt;
&lt;h3&gt;
  
  
  Solution: Pre-Create and Pool
&lt;/h3&gt;

&lt;p&gt;The idea is simple: &lt;strong&gt;pre-create Meet Spaces via API and return them instantly on request&lt;/strong&gt;. Replenish in the background when consumed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kvsuralizku6heptnei.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kvsuralizku6heptnei.png" alt="LIFO Cache" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MeetSpaceCache&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;cachePool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CachedMeetSpace&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;targetSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;maxSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;ttlMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// 24 hours&lt;/span&gt;

  &lt;span class="nf"&gt;getMeetSpaceFromCache&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;CachedMeetSpace&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Filter expired entries, then pop the newest&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cachePool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cachePool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isExpired&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;space&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cachePool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// LIFO&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;space&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;emitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;emit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;spaceConsumed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Trigger background replenishment&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;space&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why LIFO?&lt;/strong&gt; By always returning the newest Space, we minimize the risk of serving an expired one. Older Spaces naturally expire and get filtered out on the next &lt;code&gt;pop()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Replenishment is event-driven via &lt;code&gt;EventEmitter&lt;/code&gt;. When a Space is consumed, &lt;code&gt;replenish()&lt;/code&gt; runs in the background after a 100ms delay. A mutex (&lt;code&gt;isReplenishing&lt;/code&gt; flag) prevents concurrent API requests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;initializeMeetCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;createSpace&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;emitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;spaceConsumed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replenish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;createSpace&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="c1"&gt;// Build initial pool on startup&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replenish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;createSpace&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This brings most requests down to &lt;strong&gt;under 100ms latency&lt;/strong&gt; for returning a Meet URL. The cache lives in a shared domain package, reused by both the Chrome Extension API and the Slack Bot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dive 2: Designing for Adoption — Chrome Extension
&lt;/h2&gt;

&lt;h3&gt;
  
  
  We Started with a Slack Command
&lt;/h3&gt;

&lt;p&gt;The first thing we built was a &lt;strong&gt;&lt;code&gt;/meet&lt;/code&gt; command in Slack&lt;/strong&gt;. Mention the bot and it returns a Meet link. Technically, it worked perfectly.&lt;/p&gt;

&lt;p&gt;But &lt;strong&gt;nobody used it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why? The meeting creation flow is "create a Calendar event → invite participants → set the Meet URL." The Slack command is &lt;strong&gt;outside&lt;/strong&gt; this flow. Switching to Slack, typing a command, copying the URL, pasting it into Calendar — that's too much friction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Meet Users Where They Already Are
&lt;/h3&gt;

&lt;p&gt;The insight was that &lt;strong&gt;features must be placed on the user's existing path to get adopted&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Google Calendar's event editor is a place &lt;strong&gt;everyone passes through&lt;/strong&gt; when scheduling a meeting. Put a button there and it's one click. That's why we built a Chrome Extension.&lt;/p&gt;

&lt;p&gt;The Slack command still exists and some people use it. But adoption skyrocketed after shipping the Chrome Extension.&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimizing Channel Selection
&lt;/h3&gt;

&lt;p&gt;We also put effort into the channel selection UX. The dropdown order is determined by the following logic:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 1: Personal Selection History (Redis ZSET)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Store in Redis ZSET with score=timestamp&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;saveChannelSelection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Remove duplicate of same channel&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zrem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;existingMember&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// Add with latest timestamp&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zadd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="na"&gt;member&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="c1"&gt;// Cap at 50 entries&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zremrangebyrank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;MAX_RECENT&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Previously selected channels appear at the top. For recurring meetings, last week's channel is always first. Using Redis ZSET with timestamps as scores gives O(log N) insertion and natural chronological ordering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 2: Channel Activity (Firestore &lt;code&gt;sortPriority&lt;/code&gt;)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Channels without selection history are sorted by a pre-computed &lt;code&gt;sortPriority&lt;/code&gt; (based on message volume) in Firestore. Frequently used channels rank higher.&lt;/p&gt;

&lt;p&gt;Both sources are fetched in parallel, with Redis results taking priority in the merge, ensuring a useful list even on first load.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dive 3: Domain-Wide Delegation — Why a "Proxy Account" Is Needed
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The File Ownership Problem
&lt;/h3&gt;

&lt;p&gt;When you enable recording in Google Meet, the recording and transcript files are created in &lt;strong&gt;the organizer's personal Drive&lt;/strong&gt;. This is a Google Workspace behavior that cannot be changed.&lt;/p&gt;

&lt;p&gt;This is a major problem.&lt;/p&gt;

&lt;p&gt;When files are scattered across different organizers' Drives, &lt;strong&gt;the system cannot uniformly access them&lt;/strong&gt;. Copying recordings to GCS, loading transcripts into BQ, granting permissions to channel members — all these automated operations require reliable file access. If the organizer differs each time, you'd have to track which Drive the file is in and manage each person's OAuth tokens. This is operationally untenable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution: Impersonation via a Shared Service Account
&lt;/h3&gt;

&lt;p&gt;We use Domain-Wide Delegation (DWD) to have a &lt;strong&gt;service account act as a Workspace admin&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;JWT&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;serviceAccountEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Service account&lt;/span&gt;
  &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;privateKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;scopes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.googleapis.com/auth/meetings.space.created&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.googleapis.com/auth/drive&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;workspaceAdminEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Act as this admin&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since APIs execute as the Workspace admin specified in &lt;code&gt;subject&lt;/code&gt;, both Meet Space creation and Drive file ownership are consolidated under this shared account.&lt;/p&gt;

&lt;p&gt;When creating a Space, we set recording and transcription to &lt;strong&gt;ON by default&lt;/strong&gt; via &lt;code&gt;artifactConfig&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;accessType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TRUSTED&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;entryPointAccess&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ALL&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;artifactConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;recordingConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;autoRecordingGeneration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ON&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Recording: ON by default&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;transcriptionConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;autoTranscriptionGeneration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ON&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Transcription: ON by default&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}),&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Users never "forget to turn on recording." Every Meet created through this system is guaranteed to be recorded and transcribed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Files are always consolidated in the same account's Drive → uniform system access&lt;/li&gt;
&lt;li&gt;No individual OAuth token management needed&lt;/li&gt;
&lt;li&gt;Same credentials work regardless of who organizes the meeting&lt;/li&gt;
&lt;li&gt;One-time setup in Workspace Admin Console, then it just works with the service account key&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Workspace Admin privileges are required&lt;/strong&gt; for the initial setup, but it's a one-time task.&lt;/p&gt;

&lt;h3&gt;
  
  
  Calendar Search via DWD
&lt;/h3&gt;

&lt;p&gt;When notifying Slack on meeting end, we need the &lt;strong&gt;meeting title&lt;/strong&gt;. But the Meet API doesn't provide it — the title only exists on the Calendar side.&lt;/p&gt;

&lt;p&gt;DWD helps here too. We first search the organizer's Calendar, then iterate through participants' Calendars.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;searchCalendarEventTitle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetCode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;creatorEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;participants&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// 1. Search the organizer's calendar first&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;creatorEvent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;searchCalendar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;creatorEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;meetCode&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;creatorEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;creatorEvent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// 2. Fall back to participants&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;participant&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;participants&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;searchCalendar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;participant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;meetCode&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// 3. Fall back to Firestore cache&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;calendarTitle&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With DWD, you can search any user's Calendar by simply swapping the &lt;code&gt;subject&lt;/code&gt;. No Calendar sharing settings needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dive 4: Workspace Events API — Real-Time Event-Driven Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  No Polling
&lt;/h3&gt;

&lt;p&gt;"How do we detect when a Meet ends?" — this was the first challenge.&lt;/p&gt;

&lt;p&gt;Polling the API for status checks lacks real-time responsiveness and increases API call volume.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google Workspace Events API&lt;/strong&gt; lets you receive Meet lifecycle events in real-time via Pub/Sub.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;subscription&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;workspaceEvents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subscriptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;requestBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;targetResource&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`//meet.googleapis.com/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;spaceName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;eventTypes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google.workspace.meet.conference.v2.ended&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// Meeting ended&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google.workspace.meet.recording.v2.fileGenerated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Recording ready&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google.workspace.meet.transcript.v2.fileGenerated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Transcript ready&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;notificationEndpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;pubsubTopic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`projects/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;projectId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/topics/meet-events`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;payloadOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;includeResource&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We create a Subscription when the Meet Space is created, delivering three event types to the &lt;code&gt;meet-events&lt;/code&gt; Pub/Sub topic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fighting the 7-Day Expiration
&lt;/h3&gt;

&lt;p&gt;However, these Subscriptions have a &lt;strong&gt;7-day maximum TTL&lt;/strong&gt; (604,800 seconds). This is a Google API constraint that cannot be changed. Left unattended, subscriptions expire and events stop arriving.&lt;/p&gt;

&lt;p&gt;This becomes a problem in cases like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Recurring meetings&lt;/strong&gt; — A weekly Monday standup reuses the same Meet Space. The subscription expires before next Monday&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Future meetings&lt;/strong&gt; — Creating a Meet in advance for next week's 1:1. If more than 7 days pass from creation, events won't arrive on the meeting day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, &lt;strong&gt;without automatic subscription renewal, recurring and future meetings won't work&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Daily Batch Auto-Renewal
&lt;/h3&gt;

&lt;p&gt;We run a daily batch via Cloud Scheduler at 5:00 AM JST, processing in two phases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;renewSubscriptions&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;RenewalResult&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Phase 1: Invalidate old Spaces (run before renewal)&lt;/span&gt;
  &lt;span class="c1"&gt;// → Processing invalidations first excludes them from Phase 2&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;spacesToInvalidate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getMeetSpacesNeedingInvalidation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;thirtyDaysAgo&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;space&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;spacesToInvalidate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;invalidateMeetSpace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;spaceName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// isValid = false&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Phase 2: Renew Subscriptions&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;spacesToRenew&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getMeetSpacesNeedingRenewal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sixDaysAgo&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;space&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;spacesToRenew&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Create new Subscription (old one auto-expires)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newSubscriptionName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createMeetSubscription&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;spaceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;subscriptionConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;updateMeetSpaceSubscription&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;spaceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newSubscriptionName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Phase 1: Invalidation&lt;/strong&gt; — Spaces where &lt;code&gt;meetingEndAt&lt;/code&gt; is over 30 days ago are set to &lt;code&gt;isValid: false&lt;/code&gt;. After 30 days since a meeting ended, no recording or transcript events will arrive. Invalidation excludes them from Phase 2, reducing unnecessary API calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2: Renewal&lt;/strong&gt; — Spaces where &lt;code&gt;subscribedAt&lt;/code&gt; is 6+ days ago (one day before expiration) get a new Subscription. Old subscriptions auto-expire, so explicit deletion is unnecessary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subscription Lifecycle
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Day 0: Meet created → Subscription created (TTL: 7 days)
Day 6: Daily batch → Subscription renewed (new TTL: 7 days)
Day 12: Daily batch → Subscription renewed (new TTL: 7 days)
  ...repeats...
Day 30+: Daily batch → isValid=false → renewal stops
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this mechanism, &lt;strong&gt;even if you create a Meet today for a meeting next month, the subscription is auto-renewed daily so events are guaranteed to arrive on the meeting day&lt;/strong&gt;. Recurring meetings similarly work across multiple weeks with the same Meet Space.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dive 5: Event Processing Pipeline
&lt;/h2&gt;

&lt;p&gt;From meeting end to Slack notification to vector data generation for RAG search — everything starts from receiving a Pub/Sub message.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffw947ms5n8ckswu4lrsn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffw947ms5n8ckswu4lrsn.png" alt="Event Pipeline" width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Event Router: Dispatching to Three Handlers
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleMeetEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pubsubMessage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;eventType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pubsubMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ce-type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;spaceName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;normalizeSpaceName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pubsubMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ce-subject&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

  &lt;span class="c1"&gt;// Fetch space info from Firestore&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;meetInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getMeetSpaceInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;spaceName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;eventType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google.workspace.meet.conference.v2.ended&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handleMeetEnded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pubsubMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google.workspace.meet.recording.v2.fileGenerated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handleRecordingGenerated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pubsubMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google.workspace.meet.transcript.v2.fileGenerated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handleTranscriptGenerated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pubsubMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One caveat: the Pub/Sub event's &lt;code&gt;targetResource&lt;/code&gt; may contain a &lt;code&gt;conferenceRecordId&lt;/code&gt; instead of a &lt;code&gt;spaceName&lt;/code&gt;. Google Meet creates a new conference record for each session in the same Space. In that case, we resolve &lt;code&gt;conferenceRecordId → spaceName&lt;/code&gt; via the Meet API.&lt;/p&gt;

&lt;h3&gt;
  
  
  ① handleMeetEnded — On Meeting End
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Update Firestore status to &lt;code&gt;ended&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Fetch participant list from Meet API&lt;/li&gt;
&lt;li&gt;Search Calendar API for the meeting title (DWD to search participants' calendars)&lt;/li&gt;
&lt;li&gt;Save participant info to BQ (making "who attended" searchable via RAG)&lt;/li&gt;
&lt;li&gt;Send "meeting ended" notification to Slack&lt;/li&gt;
&lt;li&gt;Save notification &lt;code&gt;ts&lt;/code&gt; (timestamp) to Firestore → subsequent notifications thread under it&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  ② handleRecordingGenerated — On Recording Completion
&lt;/h3&gt;

&lt;p&gt;The recording handler is the most complex:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Drive → GCS copy → Grant permissions → Update Firestore
                 → Gemini transcription (async)
                 → Screen share analysis (async)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Idempotency is critical.&lt;/strong&gt; Pub/Sub guarantees at-least-once delivery, so duplicate messages are possible. We strictly maintain this order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleRecordingGenerated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Idempotency check: skip if already processed&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recordingReady&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;recording&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;gcsUri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// 1. Get file info from Drive&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getFileInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;driveFileId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 2. Stream copy to GCS (with existence check)&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;gcsFileExists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gcsPath&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;copyDriveFileToGCS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;gcsPath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// 3. Grant permissions to channel members ← BEFORE setting the flag&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;shareFileWithChannelMembers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channelId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 4. Save artifact info to Firestore&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;updateMeetSpaceArtifact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;spaceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;recording&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;driveFileId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;gcsUri&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// 5. AI processing is async fire-and-forget&lt;/span&gt;
  &lt;span class="nf"&gt;processGeminiTranscription&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gcsUri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logError&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;processScreenShareAnalysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gcsUri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logError&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 6. Check if both are ready → send Slack notification if so&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;checkAndNotifyArtifacts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;spaceName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why grant permissions before setting the flag?&lt;/strong&gt; If the flag is set first, a retry would skip via the idempotency check, and permissions would never be granted. Drive permission granting is idempotent (HTTP 400 means permission already exists), so it's safe to execute multiple times.&lt;/p&gt;

&lt;h3&gt;
  
  
  ③ handleTranscriptGenerated — On Transcript Completion
&lt;/h3&gt;

&lt;p&gt;Structurally mirrors the recording handler. Extracts the Google Docs transcript as text, saves to GCS, then feeds into the embedding pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  When Both Are Ready: Final Notification + Calendar Attachment
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;checkAndNotifyArtifacts()&lt;/code&gt; executes when both recording and transcript are Ready:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send artifact notification to Slack&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Attach recording and transcript files to the Calendar event&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Grant permissions to Calendar invitees&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Point 2 is key. Normally, Google Meet automatically attaches files to the Calendar event when recording and transcription complete. In our system, DWD creates the Meet under a different account, so that auto-attachment doesn't work. We &lt;strong&gt;explicitly attach files via the Calendar API to preserve the same experience as default Meet&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;attachFilesToCalendarEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;attachments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recording&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;attachments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;fileUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recording&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webViewLink&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Recording&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;attachments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;fileUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webViewLink&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Transcript&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Deduplicate by fileUrl to be idempotent&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attachments&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newAttachments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;attachments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fileUrl&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fileUrl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;calendar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;calendarId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;organizerEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;requestBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;attachments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;newAttachments&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;supportsAttachments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets users access recordings and transcripts directly from the Calendar event detail view — whether they come via Slack or Calendar.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dive 6: Three-Layer Permission Model
&lt;/h2&gt;

&lt;p&gt;"Who gets access?" is the most delicate design point. Too narrow and it's useless; too broad and it's a security risk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffm0onv0fgowbpl3tcvpg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffm0onv0fgowbpl3tcvpg.png" alt="Permission Model" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Slack Channel Members
&lt;/h3&gt;

&lt;p&gt;When each artifact is generated, all members of the linked Slack channel get Drive viewer access.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;shareFileWithChannelMembers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;channelId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Enumerate channel members via Slack API&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;members&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getChannelMembers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;channelId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;member&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;members&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Slack ID → Firestore → email&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getUserInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;member&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;userInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@air-closet.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Domain filter&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;member&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;organizerSlackId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;writer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;reader&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;shareFileWithUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;role&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Importantly, &lt;strong&gt;members who join the channel later also get access&lt;/strong&gt;. Since permissions are granted using the latest member list on each Pub/Sub retry, people who joined after the meeting naturally receive access.&lt;/p&gt;

&lt;p&gt;The organizer gets &lt;code&gt;writer&lt;/code&gt; permissions, allowing them to manage the recording file (rename, change sharing settings, etc.).&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Meeting Participants
&lt;/h3&gt;

&lt;p&gt;On meeting end, participant info from the Meet API is saved to BQ. Participants may be guests not in the Slack channel, requiring a separate permission axis from Layer 1.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: Calendar Invitees
&lt;/h3&gt;

&lt;p&gt;When both artifacts are ready, permissions are also granted to Calendar event invitees.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;attachToCalendarAndShareWithAttendees&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getCalendarEventByMeetCode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;meetingCode&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Attach files to the Calendar event&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;attachFilesToCalendarEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Grant permissions to all invitees (organizer = writer, others = reader)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;emails&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attendees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;shareFilesWithEmails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;emails&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;organizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;People not in the Slack channel but on the Calendar invite (e.g., a manager who only wants to review meeting notes) also get access.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Guarantees
&lt;/h3&gt;

&lt;p&gt;Common security rules apply across all three layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Domain filter&lt;/strong&gt;: Only &lt;code&gt;@air-closet.com&lt;/code&gt; email addresses are eligible. Prevents sharing with external users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotent permission grants&lt;/strong&gt;: HTTP 400 (permission already exists) is not treated as an error&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notification suppression&lt;/strong&gt;: &lt;code&gt;sendNotificationEmail: false&lt;/code&gt; prevents a flood of "X shared a file with you" emails&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deep Dive 7: Embedding Generation &amp;amp; RAG Search Pipeline
&lt;/h2&gt;

&lt;p&gt;This was the most exciting part to build.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4kqnzeiz4o09lf06dfn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4kqnzeiz4o09lf06dfn.png" alt="RAG Pipeline" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Three Content Sources
&lt;/h3&gt;

&lt;p&gt;Up to three types of text are extracted from each meeting and vectorized separately:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Content Type&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;transcript&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Google Meet's native transcript (Google Docs)&lt;/td&gt;
&lt;td&gt;Spoken word text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemini_transcript&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gemini-generated transcript from the recording&lt;/td&gt;
&lt;td&gt;Higher quality than native&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;screen_share&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gemini Vision-extracted screen share content&lt;/td&gt;
&lt;td&gt;Slides, code, documents&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Text Chunking: Bilingual Sentence Boundary Detection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;chunkSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;overlap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;chunkSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Find a sentence boundary to avoid cutting mid-sentence&lt;/span&gt;
      &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;findSentenceBreak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;overlap&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Overlap preserves context across chunks&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;findSentenceBreak()&lt;/code&gt; searches backward from the chunk boundary for sentence-ending punctuation. It supports both Japanese (&lt;code&gt;。&lt;/code&gt;, &lt;code&gt;！&lt;/code&gt;, &lt;code&gt;？&lt;/code&gt;) and English (&lt;code&gt;.&lt;/code&gt;, &lt;code&gt;!&lt;/code&gt;, &lt;code&gt;?&lt;/code&gt;), with fallback to spaces and fullwidth spaces. A minimum of 100 characters per chunk is enforced.&lt;/p&gt;

&lt;p&gt;Meeting transcripts frequently mix Japanese and English, making bilingual boundary detection essential.&lt;/p&gt;

&lt;h3&gt;
  
  
  Screen Share Content Extraction with Gemini
&lt;/h3&gt;

&lt;p&gt;Transcripts alone miss &lt;strong&gt;content shown via screen sharing&lt;/strong&gt; — slides, code, documents. When you need to find "that thing on the slide," it's not searchable.&lt;/p&gt;

&lt;p&gt;We use Gemini 3 Flash (&lt;code&gt;gemini-3-flash-preview&lt;/code&gt;) multimodal input to extract screen share content directly from the recording video.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;analyzeScreenShareFromVideo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gcsUri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gemini&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateContent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;GEMINI_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// gemini-3-flash-preview&lt;/span&gt;
    &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="na"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="na"&gt;fileData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;mimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;video/mp4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;fileUri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;gcsUri&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="c1"&gt;// Unlike transcription, video frames matter here — higher fps&lt;/span&gt;
        &lt;span class="na"&gt;videoMetadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;fps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Extract the content shown via screen sharing in this video.
               Transcribe any slide text, document content,
               or code that appears.`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="na"&gt;generationConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fps differentiation is key.&lt;/strong&gt; For transcription, only audio matters, so &lt;code&gt;fps: 0.1&lt;/code&gt; (1 frame per 10 seconds) minimizes video tokens. For screen share analysis, visual content matters, so &lt;code&gt;fps: 0.2&lt;/code&gt; (1 frame per 5 seconds).&lt;/p&gt;

&lt;p&gt;For long meetings that hit the input token limit, an automatic fallback splits the video into 30-minute chunks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;transcribeFromVideo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gcsUri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Try processing the full video first&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callGemini&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gcsUri&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;isTokenLimitError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Token limit hit → split into 30-minute chunks&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;transcribeVideoInChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gcsUri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  BigQuery Vector Search
&lt;/h3&gt;

&lt;p&gt;Vector data is stored in per-channel BQ tables (&lt;code&gt;meet_{channelId}&lt;/code&gt;). Splitting tables by channel enables filter-free Vector Search for within-channel queries. A separate aggregated table with &lt;code&gt;channel_id&lt;/code&gt; clustering handles cross-channel search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;insertMeetChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;channelTableId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`meet_&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channelId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Auto-create table if it doesn't exist (day-partitioned)&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;ensureMeetChannelTable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;channelTableId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;insertRow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;channelTableId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Access Control at Search Time
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;meetingId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;channelId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ML&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'COSINE'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;`meet_chunks`&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;channelId&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;UNNEST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;accessible_channels&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;-- Access control&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;@accessible_channels&lt;/code&gt; is &lt;strong&gt;the list of Slack channel IDs the user is a member of&lt;/strong&gt;. Meeting content from channels you're not in will never appear in results, even if it exists in BQ.&lt;/p&gt;

&lt;p&gt;COSINE distance is converted to a 0–1 relevance score via &lt;code&gt;1 - distance / 2&lt;/code&gt;. Only chunks above the threshold are fed into Gemini's context to generate the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dive 8: GCS Operations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Streaming Copy from Drive to GCS
&lt;/h3&gt;

&lt;p&gt;Recording files can be hundreds of MBs. Loading everything into memory would exhaust Cloud Run's memory, so we stream downloads directly into uploads.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;copyDriveFileToGCS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;driveFileId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;gcsPath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Stream download from Drive API&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`https://www.googleapis.com/drive/v3/files/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;driveFileId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;?alt=media`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Stream upload to GCS JSON API&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`https://storage.googleapis.com/upload/storage/v1/b/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/o?name=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;gcsPath&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;uploadType=media`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;mimeType&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Pass ReadableStream directly&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; We use the GCS JSON API directly instead of &lt;code&gt;@google-cloud/storage&lt;/code&gt;'s &lt;code&gt;file.save()&lt;/code&gt; because the latter has a bug where multipart boundary strings get mixed into binary data during upload, corrupting recording files.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  GCS File Structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gs://bucket/
└── meet/
    └── {channelId}/
        └── {spaceId}/
            ├── recording.mp4              # Recording file
            ├── transcript_original.txt    # Google Docs transcript
            ├── gemini_transcript.txt      # Gemini transcript
            └── screen_share.txt           # Screen share analysis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The channelId → spaceId hierarchy makes per-channel data management and lifecycle policy application straightforward. GCS lifecycle auto-deletes after 90 days (originals remain on Drive).&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dive 9: Slack Notification Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Two-Phase Notification
&lt;/h3&gt;

&lt;p&gt;To avoid making users wait, we split notifications into two phases:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 (immediately after meeting end):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🎬 Meeting ended

"Weekly Standup" has ended.
We'll notify you when the recording and transcript are ready.

Created by: @tanaka
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, the recording and transcript are still processing. But users can confirm that the meeting was successfully recorded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2 (after artifacts are ready — thread reply):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;📹 Recording and transcript are ready!

🎥 Recording
   https://drive.google.com/file/d/xxx

📝 Transcript
   https://docs.google.com/document/d/xxx

ℹ️ Channel members have viewing access
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Phase 2 is sent as a &lt;strong&gt;thread reply&lt;/strong&gt; to Phase 1. The Phase 1 message's &lt;code&gt;ts&lt;/code&gt; (timestamp) is saved to Firestore and used as the thread parent for Phase 2.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability: OpenTelemetry + Grafana + Prometheus
&lt;/h2&gt;

&lt;p&gt;All processing in this system is instrumented with &lt;strong&gt;OpenTelemetry&lt;/strong&gt; and aggregated in &lt;strong&gt;Grafana&lt;/strong&gt;. Meet Space creation, Pub/Sub event processing, Drive→GCS copy, embedding generation, Slack notifications — latency and error rates for each step are visible on a single dashboard.&lt;/p&gt;

&lt;p&gt;Through the Grafana MCP introduced in the &lt;a href="https://dev.to/ryantsuji/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;previous article&lt;/a&gt;, these logs and metrics are also accessible via MCP. Investigations like "Show me error logs from yesterday's Meet pipeline" can be done directly from Claude Code.&lt;/p&gt;

&lt;p&gt;For Gemini API costs, we track actual usage and costs via &lt;strong&gt;Prometheus&lt;/strong&gt;. Token consumption for transcription and screen share analysis is visualized in real-time, so cost anomalies are caught immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond: Meeting Data as a Project Knowledge Base
&lt;/h2&gt;

&lt;p&gt;The system described so far is about "sharing and searching meeting recordings and transcripts." But this data is already being leveraged in a broader context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project-Level Meeting Data Integration
&lt;/h3&gt;

&lt;p&gt;At airCloset, Slack channels are created per project. The mapping between channels and projects is managed in Firestore, and through our Project Management MCP (described in the &lt;a href="https://dev.to/ryantsuji/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;previous article&lt;/a&gt;), &lt;strong&gt;meeting data linked to a project is searchable via MCP&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example, "Tell me what was discussed about this spec in Project X's past meetings" searches all meeting transcripts from that project's Slack channel and returns relevant excerpts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unified Search with Slack Messages
&lt;/h3&gt;

&lt;p&gt;Beyond meeting transcripts, &lt;strong&gt;Slack messages themselves are also stored and vectorized in BigQuery&lt;/strong&gt; using the same approach. The same MCP can search across both meeting content and Slack discussions.&lt;/p&gt;

&lt;p&gt;What was decided in a meeting and how it was implemented in Slack afterward. Conversely, what was debated in Slack and which meeting made the final call. &lt;strong&gt;Being able to search across meetings and chat as two unified communication channels&lt;/strong&gt; is remarkably powerful in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Exploring Code Review Integration
&lt;/h3&gt;

&lt;p&gt;We're currently exploring whether &lt;strong&gt;business context from meeting and Slack data could be used for specification checks during code reviews&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If we could automatically surface meeting decisions and Slack spec discussions related to code changes in a PR, and verify "Is this change consistent with the spec decided in the meeting on date X?" during review, we might be able to prevent bugs caused by misunderstood requirements. It's still in the conceptual stage, but the potential for meeting data utilization continues to expand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary: Maximizing Meeting Value
&lt;/h2&gt;

&lt;p&gt;Here's what this system achieves:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Effort of writing meeting notes&lt;/td&gt;
&lt;td&gt;Auto-transcribed and auto-shared&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effort of rewatching recordings&lt;/td&gt;
&lt;td&gt;Ask in natural language, get a summary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effort of managing permissions&lt;/td&gt;
&lt;td&gt;Auto-granted to channel members, participants, and invitees&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effort of creating Meets&lt;/td&gt;
&lt;td&gt;One click from the Chrome extension&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"What was that thing we discussed?"&lt;/td&gt;
&lt;td&gt;Instantly found via RAG search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Screen-shared content not preserved&lt;/td&gt;
&lt;td&gt;Auto-extracted by Gemini Vision&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Technical highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LIFO cache&lt;/strong&gt; bringing Meet Space creation to under 100ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chrome Extension&lt;/strong&gt; placing features on users' existing workflow, dramatically boosting adoption&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain-Wide Delegation&lt;/strong&gt; solving the file ownership problem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workspace Events API&lt;/strong&gt; + daily batch covering the 7-day TTL constraint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotent event processing&lt;/strong&gt; handling Pub/Sub's at-least-once delivery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three-layer permission model&lt;/strong&gt; ensuring access for all stakeholders&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-channel table strategy&lt;/strong&gt; enabling both scoped and cross-channel search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini Vision fps differentiation&lt;/strong&gt; optimizing transcription and screen share analysis costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Meetings are a treasure trove of information. Letting that information sleep is a waste.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google Workspace × GCP × Slack&lt;/strong&gt; — maximizing the value of every meeting. I hope this helps anyone facing similar challenges.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developers.google.com/workspace/events" rel="noopener noreferrer"&gt;Google Workspace Events API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.google.com/meet/api/reference/rest" rel="noopener noreferrer"&gt;Google Meet REST API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.google.com/identity/protocols/oauth2/service-account#delegatingauthority" rel="noopener noreferrer"&gt;Domain-Wide Delegation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings" rel="noopener noreferrer"&gt;Vertex AI Embeddings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/bigquery/docs/vector-search" rel="noopener noreferrer"&gt;BigQuery Vector Search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developer.chrome.com/docs/extensions/develop" rel="noopener noreferrer"&gt;Chrome Extension Manifest V3&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>gcp</category>
      <category>typescript</category>
      <category>webdev</category>
    </item>
    <item>
      <title>We Built 17 MCP Servers to Let AI Run Our Internal Operations</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Tue, 07 Apr 2026 16:22:59 +0000</pubDate>
      <link>https://forem.com/ryantsuji/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2</link>
      <guid>https://forem.com/ryantsuji/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In a previous article, I introduced "DB Graph MCP" — a system that enables safe, cross-schema search and query execution across our entire database estate of 17 DBs and 994 tables.&lt;/p&gt;

&lt;p&gt;/posts/db-graph-mcp&lt;/p&gt;

&lt;p&gt;Thanks to the positive response, this time I'd like to introduce &lt;strong&gt;the rest of our MCP server fleet&lt;/strong&gt; beyond DB Graph.&lt;/p&gt;

&lt;p&gt;These were all built in roughly 3 months starting January 2026. We now have &lt;strong&gt;17 MCP servers&lt;/strong&gt; in production, covering databases, infrastructure, documentation, project management, observability, CI/CD, and even code editing and deployment by non-engineers — making virtually every aspect of our operations accessible to AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;Here's the full lineup:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Server&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DB Graph&lt;/td&gt;
&lt;td&gt;Company-wide DB dictionary + query execution (&lt;a href="https://dev.to/ryantsuji/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;previous article&lt;/a&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GCloud&lt;/td&gt;
&lt;td&gt;GCP resources, read-only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;AWS&lt;/td&gt;
&lt;td&gt;AWS resources, read-only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Docs &amp;amp; Knowledge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GWS&lt;/td&gt;
&lt;td&gt;Full Google Workspace access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Git Server&lt;/td&gt;
&lt;td&gt;All Git repos, read-only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Graph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Code Graph&lt;/td&gt;
&lt;td&gt;Codebase analysis (function → API → DB → event dependency tracking)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Product Graph&lt;/td&gt;
&lt;td&gt;Unified knowledge graph: code + DB + docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Biz Graph&lt;/td&gt;
&lt;td&gt;Business initiative × KPI relationship graph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Grafana&lt;/td&gt;
&lt;td&gt;Logs, metrics, and alert inspection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI/CD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CircleCI&lt;/td&gt;
&lt;td&gt;Pipeline execution, build logs, test results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Project Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Project Management&lt;/td&gt;
&lt;td&gt;BQ/Firestore/Sheets-integrated PM support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Domain-Specific&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stylist Insights&lt;/td&gt;
&lt;td&gt;Stylist performance &amp;amp; KPI data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;UX Insights&lt;/td&gt;
&lt;td&gt;UX analytics from BQ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;freee&lt;/td&gt;
&lt;td&gt;Accounting API integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dev Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Workspace&lt;/td&gt;
&lt;td&gt;ACL-gated monorepo editing &amp;amp; deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Sandbox&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/ryantsuji/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a"&gt;App deployment for non-engineers&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All servers are implemented in &lt;strong&gt;TypeScript&lt;/strong&gt;, deployed to &lt;strong&gt;GCP via Pulumi&lt;/strong&gt;, and authenticated with &lt;strong&gt;Google OAuth&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design Philosophy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why So Many Servers?
&lt;/h3&gt;

&lt;p&gt;We could have built one monolithic MCP server, but we deliberately split them. Here's why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auth scope isolation&lt;/strong&gt; — GWS needs Workspace API scopes; the DB query server doesn't. Minimizing scopes prevents privilege escalation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy independence&lt;/strong&gt; — A Grafana server change doesn't affect DB queries. Blast radius stays small.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-user selection&lt;/strong&gt; — Engineers add everything; marketing adds only GWS. Just put what you need in &lt;code&gt;.mcp.json&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Shared Foundation
&lt;/h3&gt;

&lt;p&gt;Every server shares common patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auth&lt;/strong&gt;: A shared package implements Google OAuth 2.0 + PKCE with RFC 8414 auto-discovery. Just add the URL to &lt;code&gt;.mcp.json&lt;/code&gt; and Claude Code handles the auth flow automatically. For business users, we simply register them as custom connectors in the Claude organization settings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"server-name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://mcp-xxx.your-domain.example/mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No &lt;code&gt;auth&lt;/code&gt; block needed. Same format for every server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session management&lt;/strong&gt;: Upstash Redis as a shared session store across all servers. SSO cookies mean one login grants access to everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool usage logging&lt;/strong&gt;: Every tool invocation is recorded in BigQuery. Who used what, when — fully auditable. We monitor usage rates, error rates, and usage patterns to drive improvements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure: GCloud / AWS
&lt;/h2&gt;

&lt;p&gt;Have you ever wanted to let AI investigate your cloud environment? And simultaneously thought: &lt;strong&gt;"Is it safe to let it do that?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In my case, I have admin-level privileges, which makes it even scarier. So I built &lt;strong&gt;MCP servers that are physically incapable of writing anything&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Two key design decisions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;OIDC / STS / Impersonate for secure auth&lt;/strong&gt; — Zero persistent credentials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-account audit logging&lt;/strong&gt; — Individual email addresses recorded in GCP Audit Log / CloudTrail&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  GCloud MCP
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code → MCP Server → gcloud CLI subprocess → GCP APIs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Runs &lt;code&gt;gcloud&lt;/code&gt; CLI on Cloud Run. The key point: &lt;strong&gt;writes are made impossible at the OAuth scope level&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OAuth scope: &lt;code&gt;cloud-platform.read-only&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;GCP APIs check &lt;strong&gt;both&lt;/strong&gt; scope and IAM — even admin users cannot write&lt;/li&gt;
&lt;li&gt;GCP Audit Log records the user's email address&lt;/li&gt;
&lt;li&gt;Account revocation on departure: just disable the Google Workspace account
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# What you can do&lt;/span&gt;
"Show me the Cloud Run services in prod"
"Check the env vars for this service"
"List the Secret Manager secrets"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AWS MCP
&lt;/h3&gt;

&lt;p&gt;Same philosophy, but AWS can't accept Google OAuth directly, so we use STS as a bridge.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code → MCP Server → GCP metadata → ID Token
                         → AWS STS AssumeRoleWithWebIdentity → temp credentials
                         → aws CLI subprocess → AWS APIs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Two layers of safety&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;IAM Role with &lt;code&gt;ReadOnlyAccess&lt;/code&gt; policy only&lt;/li&gt;
&lt;li&gt;Temporary credentials with 1-hour expiry&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Supports multiple AWS accounts via &lt;code&gt;profile&lt;/code&gt; parameter. CloudTrail records &lt;code&gt;assumed-role/mcp-aws-readonly/user@example.com&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Docs &amp;amp; Knowledge: GWS / Git Server
&lt;/h2&gt;

&lt;h3&gt;
  
  
  GWS (Google Workspace) MCP
&lt;/h3&gt;

&lt;p&gt;Operate &lt;strong&gt;all Google Workspace services&lt;/strong&gt; from Claude Code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code → MCP Server → gws CLI subprocess → Google Workspace APIs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Runs &lt;a href="https://github.com/nicholasgasior/gws" rel="noopener noreferrer"&gt;gws CLI&lt;/a&gt; remotely, passing the user's OAuth access token directly. &lt;strong&gt;Each user accesses resources with their own permissions&lt;/strong&gt; — you can see your Drive but not someone else's.&lt;/p&gt;

&lt;p&gt;Since OAuth authentication and Google Workspace authorization happen simultaneously, &lt;strong&gt;the moment you connect to the MCP you have immediate access to your Workspace resources&lt;/strong&gt;. No additional login or token setup required — the experience is seamless.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# What you can do&lt;/span&gt;
"Summarize the sales data in this spreadsheet"
"Extract meeting notes from last week's calendar"
"Summarize this document"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Git Server MCP
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;read-only&lt;/strong&gt; server for all company Git repositories.&lt;/p&gt;

&lt;p&gt;The motivation: &lt;strong&gt;bypassing GitHub MCP rate limits&lt;/strong&gt;. GitHub's official MCP server hits the GitHub API under the hood, and the rate limit kicks in surprisingly fast when AI is investigating a codebase.&lt;/p&gt;

&lt;p&gt;Git Server MCP keeps main-branch clones of all repos on a GCE VM, operating via &lt;strong&gt;local git commands with zero rate limiting&lt;/strong&gt;. Query as much as you want.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git_blame&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Last change commit per line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git_log&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Commit history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git_grep&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cross-repo text search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git_show&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Commit details&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git_diff&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Diff between commits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;read_file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Read file contents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list_files&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List directory contents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_repos&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search repositories&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No GitHub account needed — OAuth authentication is sufficient.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability: Grafana MCP
&lt;/h2&gt;

&lt;p&gt;The official &lt;code&gt;mcp/grafana&lt;/code&gt; Docker image deployed on Cloud Run, with an OAuth proxy in front.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code → OAuth Proxy → mcp-grafana → Grafana Cloud
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supports PromQL/LogQL queries, dashboard inspection, and alert rule review.&lt;/p&gt;

&lt;p&gt;What's important is that Grafana dashboards and alert rules are also defined in the same repository as &lt;strong&gt;Pulumi (TypeScript)&lt;/strong&gt;. This means:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write application code&lt;/li&gt;
&lt;li&gt;Define alert rules in the same repo&lt;/li&gt;
&lt;li&gt;Alert fires in production&lt;/li&gt;
&lt;li&gt;Claude Code reads logs via Grafana MCP&lt;/li&gt;
&lt;li&gt;Fix the code in the same repo&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;strong&gt;code → infra → observability → investigation → fix&lt;/strong&gt; loop is completely closed.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI/CD: CircleCI MCP
&lt;/h2&gt;

&lt;p&gt;Integrates with CircleCI API v2. A shared CircleCI token sits behind Google SSO, so the whole team uses it without managing tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code → OAuth Proxy → CircleCI MCP (sidecar) → CircleCI API v2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cloud Run multi-container setup: the official &lt;code&gt;@circleci/mcp-server-circleci&lt;/code&gt; runs as a sidecar, with our OAuth proxy in front.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# What you can do&lt;/span&gt;
"What's the status of the latest pipeline on main?"
"Show me the failure logs for this build"
"Find flaky tests"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Project Management MCP
&lt;/h2&gt;

&lt;p&gt;A server for managing issues in Firestore and semantically searching Slack/Meet conversations.&lt;/p&gt;

&lt;p&gt;Key capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Issue management&lt;/strong&gt;: Create, update status, and list Issues in Firestore (with spreadsheet dual-write)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context search&lt;/strong&gt;: &lt;strong&gt;Vector search + Gemini summarization&lt;/strong&gt; across Meet notes and Slack conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project overview&lt;/strong&gt;: View milestones, members, design docs, and test cases for your projects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backlog integration&lt;/strong&gt;: Retrieve ticket parent-child relationships via BQ&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Domain-Specific
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stylist Insights / UX Insights MCP
&lt;/h3&gt;

&lt;p&gt;Servers providing access to stylist performance/KPI data and UX analytics, respectively. Query interfaces over BQ aggregate tables.&lt;/p&gt;

&lt;h3&gt;
  
  
  freee MCP
&lt;/h3&gt;

&lt;p&gt;An OAuth-authenticated proxy to the freee API for accounting data access.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dev Platform: Workspace / Sandbox
&lt;/h2&gt;

&lt;p&gt;This might be the most unique part.&lt;/p&gt;

&lt;h3&gt;
  
  
  Workspace MCP — Code Editing Without a GitHub Account
&lt;/h3&gt;

&lt;p&gt;Provides &lt;strong&gt;ACL-gated file editing, commits, PR creation, and deployment&lt;/strong&gt; for our internal monorepo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No GitHub account required&lt;/strong&gt;. Only a Google Workspace account (OAuth) is needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. workspace_init          → Create worktree, initialize branch
2. workspace_write_file    → Edit code
3. workspace_diff          → Review changes
4. workspace_commit        → Commit
5. workspace_push          → Push to GitHub
6. workspace_deploy        → Deploy from feature branch (test)
7. Verify it works
8. workspace_create_pr     → Request review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Access control is managed in Firestore. Admins configure &lt;strong&gt;which stacks (directories) each user can edit and deploy&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"allowedPaths"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"apps/web/xxx/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"apps/api/xxx/"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"allowedStacks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"api-xxx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pages-xxx"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"developer"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Non-engineers can &lt;strong&gt;safely edit and deploy only the stacks they're authorized for&lt;/strong&gt;. In practice, a non-engineer team member is already using AI + Workspace MCP to improve a full-scratch KPI dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sandbox MCP — App Deployment for Non-Engineers
&lt;/h3&gt;

&lt;p&gt;Going even further: &lt;strong&gt;non-engineers can deploy their own apps for internal use&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. sandbox_init_repo(app_name: "my-tool")    → Initialize repo
2. sandbox_write_file(...)                    → Write files
3. sandbox_publish(app_name: "my-tool")       → Deploy to Cloud Run
   → https://sbx-{nickname}--my-tool.example.com/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No gcloud, no Docker. Just tell Claude "I want a tool that does X" and it's published on an internal URL.&lt;/p&gt;

&lt;p&gt;Deployed apps are protected by &lt;strong&gt;Cloudflare Access with Google Workspace authentication&lt;/strong&gt;, so only internal members can access them. Even though they're on the public internet, access from outside the organization is impossible.&lt;/p&gt;

&lt;p&gt;I wrote &lt;a href="https://dev.to/ryantsuji/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a"&gt;detail article&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Graph Servers: Code Graph / Product Graph / Biz Graph
&lt;/h2&gt;

&lt;p&gt;A family of servers that analyze codebases and business logic as graph structures.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Server&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Key Feature&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DB Graph&lt;/td&gt;
&lt;td&gt;Company-wide DBs (&lt;a href="https://dev.to/ryantsuji/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;previous article&lt;/a&gt;)&lt;/td&gt;
&lt;td&gt;Table dictionary + semantic search + live DB queries + PII anonymization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code Graph&lt;/td&gt;
&lt;td&gt;All source code (cross-repository)&lt;/td&gt;
&lt;td&gt;Static analysis tracking function → API → DB → event dependencies across repos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Product Graph&lt;/td&gt;
&lt;td&gt;Internal monorepo&lt;/td&gt;
&lt;td&gt;Unified knowledge graph of code + DB + docs. Every node has business context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Biz Graph&lt;/td&gt;
&lt;td&gt;Business initiatives &amp;amp; metrics&lt;/td&gt;
&lt;td&gt;Initiative × metric relationship graph&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each has a different design philosophy and solves different problems. See the previous article for DB Graph; details on the others are coming in future posts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Model
&lt;/h2&gt;

&lt;p&gt;Here's the security approach shared across all servers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Defense in Depth
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Layer 1: Google Workspace OAuth + domain restriction
  → Organization domain only. External users cannot log in.

Layer 2: SSO + session management
  → Upstash Redis, 7-day TTL, sliding window

Layer 3: Per-server scope restrictions
  → GCloud: cloud-platform.read-only
  → AWS: ReadOnlyAccess policy
  → DB Graph: SELECT only + PII anonymization

Layer 4: Data-level protection
  → Automatic PII anonymization (40+ column patterns)
  → Confidential datasets controlled by BQ IAM
  → Production DBs via read replicas only

Layer 5: Audit logging
  → All tool invocations recorded in BQ
  → Individual email in GCP Audit Log / CloudTrail
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Automatic Revocation on Departure
&lt;/h3&gt;

&lt;p&gt;Since every server depends on Google OAuth, &lt;strong&gt;disabling a Google Workspace account instantly revokes access to all MCP servers&lt;/strong&gt;. No individual token revocation or account cleanup needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;Lessons learned from building and operating our MCP server fleet:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Centralize authentication&lt;/strong&gt;&lt;br&gt;
Building OAuth as a shared package made adding new servers dramatically easier. Auth code per server is about 10 lines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Start read-only&lt;/strong&gt;&lt;br&gt;
GCloud, AWS, and Git Server are all read-only. Allow reads first; add writes only when truly needed. This keeps security discussions simple.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Wrap existing tools&lt;/strong&gt;&lt;br&gt;
gcloud CLI, aws CLI, gws CLI, CircleCI MCP — put existing CLIs and MCP servers behind an OAuth proxy and the whole team can use them safely. No need to build from scratch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Non-engineer access is the most exciting frontier&lt;/strong&gt;&lt;br&gt;
Workspace MCP and Sandbox MCP provide the foundation for non-engineers to edit code and deploy without a GitHub account. It's still early and the big wins are ahead, but this is where the most potential lies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Keep everything in one repository&lt;/strong&gt;&lt;br&gt;
Application code, infrastructure (Pulumi), observability (Grafana alert rules), MCP servers — all in a single monorepo. This closes the loop: write code → deploy → monitor → find issues → fix.&lt;/p&gt;




&lt;p&gt;In the DB Graph article, I described the problem of "how tables relate to each other existing only in specific people's heads." Looking at the full MCP server fleet, it's clear this isn't limited to databases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure state, code dependencies, document contents, project progress, user behavior logs&lt;/strong&gt; — all of these were trapped in people's heads. Eliminating that is the essential role of our MCP server fleet.&lt;/p&gt;

&lt;p&gt;Externalizing knowledge into a form that AI can access. That's the common theme across all our MCP servers.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gcp</category>
      <category>typescript</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Democratizing Internal Data — Building an MCP Server That Lets You Search 991 Tables in Natural Language</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Wed, 25 Mar 2026 18:15:40 +0000</pubDate>
      <link>https://forem.com/ryantsuji/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5</link>
      <guid>https://forem.com/ryantsuji/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5</guid>
      <description>&lt;p&gt;Hi, I'm &lt;a href="https://x.com/ryantsuji" rel="noopener noreferrer"&gt;Ryan&lt;/a&gt;, CTO at &lt;a href="https://www.air-closet.com/" rel="noopener noreferrer"&gt;airCloset&lt;/a&gt; — Japan's leading fashion rental subscription service.&lt;/p&gt;

&lt;p&gt;Today I want to share something I'm genuinely proud of: &lt;strong&gt;DB Graph&lt;/strong&gt; and &lt;strong&gt;DB Graph MCP&lt;/strong&gt; — a Model Context Protocol (MCP) server that lets anyone in our company search and query &lt;strong&gt;15 schemas, 991 tables, 11 SQL databases, and 6 MongoDB instances&lt;/strong&gt; using natural language through Claude Code.&lt;/p&gt;

&lt;p&gt;You don't need to know a single table name. Ask "find tables related to returns" and it gives you the answer — across schemas, across database engines. And yes, it can query production data safely.&lt;/p&gt;

&lt;p&gt;In this post, I'll walk through everything: what it does, how it works, the tool design, actual response formats, how we built the graph, how we operate it, and how we handle permissions and security.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Nobody Knows All 991 Tables
&lt;/h2&gt;

&lt;p&gt;airCloset has been running since 2015 — that's 10 years of accumulated database schema.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQL Databases&lt;/td&gt;
&lt;td&gt;11 (MySQL 8 + PostgreSQL 3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MongoDB Databases&lt;/td&gt;
&lt;td&gt;6 (DocumentDB 5 + Atlas 1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schemas&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tables/Collections&lt;/td&gt;
&lt;td&gt;991&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ORMs&lt;/td&gt;
&lt;td&gt;4 (TypeORM, Sequelize, Drizzle, Mongoose)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Repositories&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Nobody in the company knows all of them. Not even close.&lt;/p&gt;

&lt;p&gt;Here's a real scenario. Customer support asks: "This customer's app shows the return as completed, but has the warehouse actually confirmed receiving it?"&lt;/p&gt;

&lt;p&gt;Think about what you need to investigate this.&lt;/p&gt;

&lt;p&gt;The app-side return status lives in the &lt;code&gt;aircloset&lt;/code&gt; schema's delivery order table. If the delivery status is "RETURNED", the app considers it done. Some people might know this much.&lt;/p&gt;

&lt;p&gt;But the &lt;strong&gt;warehouse-side confirmation&lt;/strong&gt; lives in the &lt;code&gt;bridge&lt;/code&gt; schema. A receive record table's status being "COMPLETE" means the warehouse has physically processed the returned package.&lt;/p&gt;

&lt;p&gt;The problem? These two live in &lt;strong&gt;completely separate databases&lt;/strong&gt;. No foreign key connects them. To bridge the gap, there's an intermediate mapping table in &lt;code&gt;aircloset&lt;/code&gt; that holds a warehouse order code (varchar) — which corresponds to a shipping order code in &lt;code&gt;bridge&lt;/code&gt;. No FK, just a varchar match across schemas.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aircloset delivery order table (status = RETURNED)
  ↓ order_id
aircloset warehouse mapping table
  ↓ warehouse_order_code (varchar)
bridge shipping order table (matched by code — no FK!)
  ↓ shipping_order_id
bridge receive record table (status = COMPLETE = warehouse confirmed)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Table names are generalized for this article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Four tables, two schemas, a foreign-key-less varchar join. &lt;strong&gt;How many people in the company know this path?&lt;/strong&gt; You could count them on one hand. And if they're on vacation, the investigation stalls.&lt;/p&gt;

&lt;p&gt;This is daily life in a 991-table × 15-schema world. It's not just "I don't know the table name." It's that &lt;strong&gt;the connections between schemas exist only in specific people's heads&lt;/strong&gt;. That was the real problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  DB Graph MCP — The Big Picture
&lt;/h2&gt;

&lt;p&gt;This is what we built to solve it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6qg0z178jr1k1f0uu584.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6qg0z178jr1k1f0uu584.png" alt="System Overview" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Four components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;DB Dictionary Graph Builder&lt;/strong&gt; — A daily batch job that parses ORM definitions from 28 repositories and stores table/column/relationship info as a graph in BigQuery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DB Dictionary Review UI&lt;/strong&gt; — A web app where humans verify AI-generated descriptions, mark deprecated columns, and add annotations. Review data survives daily rebuilds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DB Graph MCP Server&lt;/strong&gt; — An MCP server (Cloud Run) that combines graph search with live DB querying&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DB Account Pipeline&lt;/strong&gt; — Fully automated DB access provisioning: application → approval → account creation → notification&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Seeing It in Action
&lt;/h2&gt;

&lt;p&gt;Let's solve the return investigation from above using DB Graph MCP.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Tool response examples below use generalized table/column names. The response format reflects actual output.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 1: Natural Language Table Search
&lt;/h3&gt;

&lt;p&gt;Ask Claude Code: "Find tables related to return processing confirmation." Under the hood, &lt;code&gt;search_tables&lt;/code&gt; runs a semantic search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; search_tables(query: "return processing confirmation", search_type: "semantic")

5 tables found (by vector similarity):

bridge.return_packages (postgresql) (distance: 0.2557)
bridge.receive_records (postgresql) (distance: 0.2720)
cella.receive_confirmation_results (mysql) (distance: 0.2921)
bridge.receive_record_details (postgresql) (distance: 0.2951)
aircloset.return_status_change_histories (mysql) (distance: 0.3170)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A single search returns tables across &lt;strong&gt;three schemas (bridge, cella, aircloset)&lt;/strong&gt;. The table name "receive_records" doesn't contain the word "return" — but the AI-generated description includes "rental return processing" and "warehouse receiving", so it matches semantically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Table Detail
&lt;/h3&gt;

&lt;p&gt;The second hit in &lt;code&gt;bridge&lt;/code&gt; looks promising. Let's get the details.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gt"&gt;&amp;gt; get_table_detail(table_name: "bridge.receive_records")&lt;/span&gt;

&lt;span class="gh"&gt;# bridge.receive_records&lt;/span&gt;
DB: POSTGRESQL / ORM: typeorm / Repository: bridge-api

&lt;span class="gu"&gt;## Columns (9)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; id: int [PK, AI, NOT NULL]
&lt;span class="p"&gt;-&lt;/span&gt; code: varchar [NOT NULL]
&lt;span class="p"&gt;-&lt;/span&gt; shipping_order_id: varchar [NOT NULL]
&lt;span class="p"&gt;-&lt;/span&gt; status: enum [NOT NULL, default=IN_PROGRESS]
&lt;span class="p"&gt;-&lt;/span&gt; type: enum [NOT NULL]
&lt;span class="p"&gt;-&lt;/span&gt; receive_datetime: varchar [NOT NULL]
&lt;span class="p"&gt;-&lt;/span&gt; operated_by: varchar [NOT NULL]
&lt;span class="p"&gt;-&lt;/span&gt; created_at / updated_at: datetime

&lt;span class="gu"&gt;## References (2)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; shipping_order_id → bridge.shipping_orders.id (explicit)
&lt;span class="p"&gt;-&lt;/span&gt; operated_by → bridge.users.id (explicit)

&lt;span class="gu"&gt;## Referenced By (1)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; bridge.receive_record_details.record_id → id (explicit)

&lt;span class="gu"&gt;## Enum Definitions (2)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Status: COMPLETE=Received, IN_PROGRESS=Processing
&lt;span class="p"&gt;-&lt;/span&gt; Type: RENTAL_RETURN=Rental return, BUSINESS_RETURN=Business return,
        RENTAL_RETURN_LACK=Rental return (missing items), BUSINESS_RETURN_LACK=Business return (missing items)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;status = COMPLETE&lt;/code&gt; means "the warehouse has finished receiving."&lt;/strong&gt; Exactly what we needed. Plus &lt;code&gt;type = RENTAL_RETURN&lt;/code&gt; distinguishes rental returns from business returns. Enum definitions with human-readable labels — visible at a glance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Discovering the Cross-Schema Path
&lt;/h3&gt;

&lt;p&gt;Now the question: how do we connect the &lt;code&gt;aircloset&lt;/code&gt; delivery order (app side) to the &lt;code&gt;bridge&lt;/code&gt; receive record (warehouse side)? Let's use &lt;code&gt;trace_relationships&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; trace_relationships(table_name: "bridge.shipping_orders", direction: "both", max_depth: 1)

# Relationship trace: bridge.shipping_orders
Nodes: 23, Edges: 22

## Relationships (excerpt)
- shipping_orders.shop_id → shops.id (explicit)
- shipping_orders.warehouse_id → warehouses.id (explicit)
- receive_records.shipping_order_id → shipping_orders.id (explicit)     ← warehouse confirmation!
- return_packages.shipping_order_id → shipping_orders.id (explicit)     ← return shipment
- shipping_packages.shipping_order_id → shipping_orders.id (explicit)   ← outbound shipment
- shipping_inspections.shipping_order_id → shipping_orders.id (explicit) ← inspection
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Found the path from &lt;code&gt;bridge.shipping_orders&lt;/code&gt; to &lt;code&gt;receive_records&lt;/code&gt;. Next, we find the mapping table connecting &lt;code&gt;aircloset&lt;/code&gt; and &lt;code&gt;bridge&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; search_tables(query: "warehouse_mapping", search_type: "table", adjacent_depth: 1)

aircloset.warehouse_shipping_relations (mysql)

### Related Tables
  → aircloset.delivery_orders (order_id → id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; get_table_detail(table_name: "aircloset.warehouse_shipping_relations")

## Columns (4)
- order_id: int [PK, NOT NULL]              ← aircloset delivery order ID
- warehouse_order_code: varchar [NOT NULL]   ← bridge shipping order code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Found it.&lt;/strong&gt; &lt;code&gt;order_id&lt;/code&gt; links to the aircloset side, &lt;code&gt;warehouse_order_code&lt;/code&gt; links to the bridge side. No FK, but this varchar is the only key connecting two schemas.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Querying Real Data
&lt;/h3&gt;

&lt;p&gt;Now we build cross-schema queries. First, get the delivery order and warehouse code from &lt;code&gt;aircloset&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gt"&gt;&amp;gt; sql_query_database(database: "aircloset", sql: "SELECT ... WHERE user_id = 12345 AND status = 'RETURNED'")&lt;/span&gt;

&lt;span class="gs"&gt;**aircloset**&lt;/span&gt; (staging) — 1 row

| id     | status   | returned_date       | warehouse_order_code |
|--------|----------|---------------------|----------------------|
| 98765  | RETURNED | 2026-03-20 10:30:00 | SO-2026-00012345     |
&lt;span class="gt"&gt;
&amp;gt; **Table**: Manages the full lifecycle of delivery orders — styling → shipping → return status tracking&lt;/span&gt;

&lt;span class="gu"&gt;### Column Descriptions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**status**&lt;/span&gt;: Delivery status (1=Awaiting shipment, 2=Ready, 3=Delivered, 4=Returned, 5=Cancelled)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**returned_date**&lt;/span&gt;: Date/time the warehouse received the customer's return
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**warehouse_order_code**&lt;/span&gt;: Mapping code to bridge shipping order

&lt;span class="gu"&gt;### Related Tables&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; → &lt;span class="gs"&gt;**aircloset.users**&lt;/span&gt; (user_id → id): Customer profile...
&lt;span class="p"&gt;-&lt;/span&gt; → &lt;span class="gs"&gt;**aircloset.plans**&lt;/span&gt; (plan_id → id): Subscription plan definitions...
&lt;span class="p"&gt;-&lt;/span&gt; ← &lt;span class="gs"&gt;**aircloset.styling_feedbacks**&lt;/span&gt; (delivery_id → id): Customer feedback on styling...
&lt;span class="p"&gt;-&lt;/span&gt; ← &lt;span class="gs"&gt;**aircloset.rental_items**&lt;/span&gt; (delivery_id → id): Items in this order...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that &lt;strong&gt;column descriptions and related tables are automatically appended below the query result&lt;/strong&gt;. This metadata is pulled from the graph data cached in Redis (cache-invalidated on graph updates). AI can read this enrichment to determine its next step — like "use the warehouse code to query &lt;code&gt;bridge&lt;/code&gt;."&lt;/p&gt;

&lt;p&gt;Now check the warehouse side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gt"&gt;&amp;gt; sql_query_database(database: "bridge", sql: "SELECT ... WHERE code = 'SO-2026-00012345'")&lt;/span&gt;

&lt;span class="gs"&gt;**bridge**&lt;/span&gt; (staging) — 1 row

| code             | status  | receive_status | type          | receive_datetime    |
|------------------|---------|---------------|---------------|---------------------|
| SO-2026-00012345 | SHIPPED | COMPLETE      | RENTAL_RETURN | 2026-03-21 14:22:00 |
&lt;span class="gt"&gt;
&amp;gt; **Table**: Records warehouse receiving operations — arrival confirmation and inspection status&lt;/span&gt;

&lt;span class="gu"&gt;### Column Descriptions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**status**&lt;/span&gt;: Shipping order status (ORDERED→ALLOCATED→PICKED→INSPECTED→SHIPPED→CANCELED)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**receive_status**&lt;/span&gt;: Receive status (IN_PROGRESS=Processing, COMPLETE=Received)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**type**&lt;/span&gt;: Receive type (RENTAL_RETURN=Rental return, BUSINESS_RETURN=Business return)

&lt;span class="gu"&gt;### Related Tables&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; → &lt;span class="gs"&gt;**bridge.warehouses**&lt;/span&gt; (warehouse_id → id): Source warehouse...
&lt;span class="p"&gt;-&lt;/span&gt; → &lt;span class="gs"&gt;**bridge.shops**&lt;/span&gt; (shop_id → id): Source shop...
&lt;span class="p"&gt;-&lt;/span&gt; ← &lt;span class="gs"&gt;**bridge.receive_record_details**&lt;/span&gt; (record_id → id): Individual item details...
&lt;span class="p"&gt;-&lt;/span&gt; ← &lt;span class="gs"&gt;**bridge.shipping_packages**&lt;/span&gt; (order_id → id): Outbound package info...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;receive_status = COMPLETE&lt;/code&gt; — the warehouse has confirmed receipt.&lt;/strong&gt; Both the app-side return status and the warehouse-side physical confirmation are verified.&lt;/p&gt;

&lt;p&gt;This enrichment is the key to AI-powered investigation. Claude Code reads the column descriptions and related tables to autonomously decide "what to query next" and "how to interpret these values." No human guidance needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Beyond Operations: Cross-Service Analytics
&lt;/h3&gt;

&lt;p&gt;This isn't limited to operational investigations. &lt;strong&gt;It works for business analytics too.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Try asking Claude Code:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How many customers used our spot rental service last week, what percentage of them are airCloset monthly subscribers, and how frequently do those subscribers use the main service?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Answering this requires crossing the spot rental order table (&lt;code&gt;spot_rental&lt;/code&gt; schema) with the main service's member and usage tables (&lt;code&gt;aircloset&lt;/code&gt; schema).&lt;/p&gt;

&lt;p&gt;Claude Code uses DB Graph MCP to identify the relevant tables via &lt;code&gt;search_tables&lt;/code&gt;, discover join keys via &lt;code&gt;trace_relationships&lt;/code&gt;, and run queries against both databases to produce the aggregated result. &lt;strong&gt;Cross-service analytics from a single natural language question&lt;/strong&gt; — that's the core value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Without DB Graph MCP
&lt;/h3&gt;

&lt;p&gt;Imagine doing these investigations without any tooling:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Return confirmation:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You need to know the delivery order table exists in &lt;code&gt;aircloset&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;You need to know about the warehouse mapping table that bridges schemas&lt;/li&gt;
&lt;li&gt;You need to know that a varchar warehouse code maps to &lt;code&gt;bridge&lt;/code&gt;'s shipping code&lt;/li&gt;
&lt;li&gt;You need to know that &lt;code&gt;bridge&lt;/code&gt;'s receive record table is the warehouse confirmation&lt;/li&gt;
&lt;li&gt;You need to know what enum values like COMPLETE and RENTAL_RETURN mean&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Cross-service analytics:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You need to know the spot rental DB schema name and table structure&lt;/li&gt;
&lt;li&gt;You need to know the join key to the main service's member table&lt;/li&gt;
&lt;li&gt;You need connection credentials for both databases&lt;/li&gt;
&lt;li&gt;You need to correctly interpret member statuses and usage counts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In both cases, the required knowledge spans multiple services and schemas. Probably fewer than five people hold all of it in their heads. With DB Graph MCP, &lt;strong&gt;anyone can get there&lt;/strong&gt; through natural language search → table detail → relationship tracing → live queries.&lt;/p&gt;

&lt;p&gt;Now let's dive into &lt;em&gt;how&lt;/em&gt; this works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Design: 7 Tools in 3 Categories
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Dictionary Tools (no DB credentials required)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_tables&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Name search + vector similarity search across tables/columns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get_table_detail&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Full table info: columns, FKs, enums, DEAD annotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;trace_relationships&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;BFS traversal of table relationships&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Dictionary tools read pre-built graph data from BigQuery — &lt;strong&gt;no individual DB credentials needed&lt;/strong&gt;. Anyone with a Google OAuth login can use them immediately, with no access request.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query Tools (DB credentials required)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list_databases&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List databases you have access to&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sql_query_database&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Execute SELECT queries against MySQL/PostgreSQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;describe_database_table&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Get live schema from actual DB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mongo_query_database&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Execute find/aggregate against DocumentDB/Atlas&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Query tools use per-user credentials stored in Firestore. You only see databases you've been granted access to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This separation is intentional.&lt;/strong&gt; The dictionary is open to everyone; data access is permission-controlled. "Everyone should know what tables exist, but accessing the data requires authorization."&lt;/p&gt;

&lt;h2&gt;
  
  
  Why BigQuery? — Technology Choices
&lt;/h2&gt;

&lt;p&gt;We use BigQuery as the graph store. "Shouldn't a graph DB use Neo4j?" you might ask.&lt;/p&gt;

&lt;p&gt;We chose BigQuery because &lt;strong&gt;one store handles graph + vector search + analytics&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VECTOR_SEARCH&lt;/strong&gt;: Store 768-dimensional embeddings and run cosine similarity search natively. No separate vector DB needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph traversal&lt;/strong&gt;: Node + edge table design enables BFS traversal through simple recursive JOINs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON type&lt;/strong&gt;: &lt;code&gt;JSON_SET&lt;/code&gt; on a properties column lets us flexibly append review data without schema changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serverless&lt;/strong&gt;: No instance management. Pay only for queries, not idle time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vertex AI integration&lt;/strong&gt;: Gemini 3 Flash for description generation and embedding models connect seamlessly within GCP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Workspace integration&lt;/strong&gt;: OAuth uses Google Accounts directly. Domain restriction, nickname resolution, and permission management all flow through the same identity — no separate IdP needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A dedicated graph DB like Neo4j has superior traversal performance, but at 991 tables, BigQuery is more than sufficient. The operational simplicity of "vector search, JSON, analytics, and graph all in one place" far outweighs the performance difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Natural Language Search Works
&lt;/h2&gt;

&lt;p&gt;How does "return processing confirmation" find a receive records table?&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Generate Table Descriptions
&lt;/h3&gt;

&lt;p&gt;The DB Dictionary Graph Builder runs daily at 6:00 AM JST, generating AI descriptions for each table using Gemini 3 Flash:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example: bridge.receive_records
→ "Records warehouse receiving operations. Tracks rental returns
   and business returns with completion/in-progress status.
   Links to shipping orders to trace which order a return belongs to."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Generate Embeddings
&lt;/h3&gt;

&lt;p&gt;Each description is converted to a 768-dimensional vector using Vertex AI's embedding model and stored in BigQuery.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: VECTOR_SEARCH
&lt;/h3&gt;

&lt;p&gt;The user's query is also converted to a 768-dimensional vector, then matched via BigQuery's &lt;code&gt;VECTOR_SEARCH&lt;/code&gt; using cosine distance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;qualifiedName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;VECTOR_SEARCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;`project.db_graph_nodes`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'embedding'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;top_k&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;distance_type&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'COSINE'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Table'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even if "return" doesn't appear in the table name, the AI description's mention of "rental return processing" places it close in vector space. That's the core of natural language search.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the Graph
&lt;/h2&gt;

&lt;h3&gt;
  
  
  6-Phase Pipeline
&lt;/h3&gt;

&lt;p&gt;The builder runs six phases daily:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6qg0z178jr1k1f0uu584.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6qg0z178jr1k1f0uu584.png" alt="System Overview" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;(See the Builder section of the diagram)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;① ORM Parsing&lt;/strong&gt; — Parse 4 ORM types (TypeORM, Sequelize, Drizzle, Mongoose) across 28 repositories to extract table definitions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;② Live DB Validation&lt;/strong&gt; — Query actual staging DBs via Lambda to compare code definitions against real schemas. Auto-exclude tables that exist in code but not in the database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;③ AI Description&lt;/strong&gt; — Generate table/column descriptions with Gemini 3 Flash. Incremental detection regenerates only changed tables to minimize AI cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;④ Graph Construction&lt;/strong&gt; — Generate 4 node types (Schema/Table/Column/Enum) and 5 edge types (HAS_TABLE/HAS_COLUMN/REFERENCES/USES_ENUM/SAME_ENTITY).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⑤ Embedding Generation&lt;/strong&gt; — Generate 768-dimensional vectors per table via Vertex AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⑥ BQ MERGE&lt;/strong&gt; — Load into BigQuery using MERGE, &lt;strong&gt;preserving human-written descriptions and DEAD flags&lt;/strong&gt;. Auto-generated data never overwrites manual annotations.&lt;/p&gt;
&lt;h3&gt;
  
  
  Relationship Confidence Levels
&lt;/h3&gt;

&lt;p&gt;Foreign key detection has varying confidence:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Confidence&lt;/th&gt;
&lt;th&gt;Detection Method&lt;/th&gt;
&lt;th&gt;Reliability&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;explicit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Directly from ORM &lt;code&gt;@JoinColumn()&lt;/code&gt; or &lt;code&gt;belongsTo()&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Certain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;inferred&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Naming convention: &lt;code&gt;xxx_id&lt;/code&gt; → &lt;code&gt;xxx&lt;/code&gt; table&lt;/td&gt;
&lt;td&gt;High probability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;manual&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Added by human reviewers&lt;/td&gt;
&lt;td&gt;Certain&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This lets AI judge the reliability of suggested JOIN conditions before using them.&lt;/p&gt;
&lt;h3&gt;
  
  
  SAME_ENTITY Edges
&lt;/h3&gt;

&lt;p&gt;The same logical entity sometimes exists in both SQL and MongoDB — for example, a MySQL users table and a MongoDB user statistics collection both represent the same user. &lt;code&gt;SAME_ENTITY&lt;/code&gt; edges express these cross-engine correspondences, enabling seamless cross-database discovery.&lt;/p&gt;
&lt;h2&gt;
  
  
  Human Review: AI Alone Isn't Enough
&lt;/h2&gt;

&lt;p&gt;"Are AI-generated descriptions actually accurate?" Honestly — not always.&lt;/p&gt;

&lt;p&gt;Gemini 3 Flash produces decent high-level descriptions, but 10 years of business context — "this column was migrated 3 years ago but never dropped from the schema", "enum value 5 is actually never used" — that kind of tacit knowledge can't be filled by AI alone.&lt;/p&gt;

&lt;p&gt;That's why we built &lt;strong&gt;human review into the system from day one&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Review Web UI
&lt;/h3&gt;

&lt;p&gt;We have a dedicated review web app for the DB Dictionary.&lt;/p&gt;

&lt;p&gt;The schema list shows review progress bars. The table list supports filtering by "unchecked", "checked", and "has deprecated items."&lt;/p&gt;

&lt;p&gt;The table detail screen displays columns with type badges, FK targets, and enum definitions — with inline editing for descriptions and deprecation flags.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv096pqpvndnm06yk53sv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv096pqpvndnm06yk53sv.png" alt="Review UI — Table Detail" width="800" height="642"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Review UI: FK targets and enum definitions shown as badges. Descriptions can be edited inline.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Available review actions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edit table description&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supplement or rewrite the AI-generated description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edit column description&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-column annotations ("deprecated", "use XX instead", etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mark as DEAD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deprecation flag + reason + empty percentage, at table or column level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mark as Checked&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Review completion flag — records who checked and when&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bulk DEAD marking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mark up to 500 tables/columns as deprecated at once&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  DEAD Flags: Surfacing 10 Years of Tacit Knowledge
&lt;/h3&gt;

&lt;p&gt;After 10 years, deprecated columns accumulate. A flag that once represented member type — migrated years ago, now NULL in every row — still sits in the schema.&lt;/p&gt;

&lt;p&gt;When a reviewer marks a column as deprecated, the MCP table detail shows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- old_member_flag: int [NOT NULL, default=0, DEAD] ⚠ Deprecated. Use membership_status instead
- cancel_date: datetime [DEAD] ⚠ All rows NULL
- legacy_import_id: varchar [DEAD] ⚠ Legacy CSV import field. No longer used
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters because &lt;strong&gt;it prevents AI from writing code that references the wrong column&lt;/strong&gt;. When Claude Code loads table details into context and sees a DEAD flag, it knows to avoid that column.&lt;/p&gt;

&lt;h3&gt;
  
  
  Change Detection and Diff Review
&lt;/h3&gt;

&lt;p&gt;When the daily build detects changes in table structure or AI descriptions, they're recorded as "pending changes." Reviewers can view before/after diffs in the web UI and mark them as reviewed.&lt;/p&gt;

&lt;p&gt;This ensures nothing slips through — if yesterday's build changed something, someone will see it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Review Data Persistence
&lt;/h3&gt;

&lt;p&gt;Review data is stored in Firestore and &lt;strong&gt;never overwritten by daily builds&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The daily build follows this sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;ORM parsing → graph construction&lt;/strong&gt; — Re-extract table definitions from latest code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BQ MERGE&lt;/strong&gt; — Merge while preserving human-written &lt;code&gt;textForEmbedding&lt;/code&gt; and &lt;code&gt;embedding&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-apply Firestore reviews&lt;/strong&gt; — Write &lt;code&gt;humanDescription&lt;/code&gt;, &lt;code&gt;isDead&lt;/code&gt;, &lt;code&gt;deadNote&lt;/code&gt;, &lt;code&gt;checkedAt&lt;/code&gt; back to BQ properties&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Reviews survive unlimited daily rebuild cycles.&lt;/strong&gt; Firestore is the source of truth; BQ is its reflection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Crossing the VPC Wall: Cross-Cloud Architecture
&lt;/h2&gt;

&lt;p&gt;Now for the security design I'm most proud of.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; The MCP server runs on Google Cloud (Cloud Run). The databases are inside AWS VPCs. Cloud Run can't directly reach VPC-internal RDS/DocumentDB instances.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; A three-stage authentication chain — GCP OIDC → AWS STS → VPC Lambda — enables secure cross-cloud connectivity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmow2vgd95105kzmrxglc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmow2vgd95105kzmrxglc.png" alt="Query Dataflow" width="800" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Authentication Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Cloud Run (GCP) → Get OIDC token from GCP metadata server
2. OIDC token → AWS STS AssumeRoleWithWebIdentity
3. STS → Return temporary AWS credentials (1-hour TTL)
4. Temporary credentials → Invoke VPC-internal Lambda
5. Lambda → Execute query against VPC-internal RDS/DocumentDB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero static AWS credentials.&lt;/strong&gt; Dynamically obtained from GCP service account.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporary credentials cached for 5 minutes.&lt;/strong&gt; Avoids per-request STS overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda executes inside VPC.&lt;/strong&gt; DB connections never leave the VPC.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production queries use Read Replicas only.&lt;/strong&gt; Never connects to the master.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  SQL Validation (Defense in Depth)
&lt;/h3&gt;

&lt;p&gt;Query safety is enforced at two layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP layer (1st):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Allowed: SELECT, SHOW, DESCRIBE, DESC, EXPLAIN, WITH...SELECT
Blocked: INSERT, UPDATE, DELETE, DROP, CREATE, ALTER, TRUNCATE, multi-statement via semicolons
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lambda layer (2nd):&lt;/strong&gt;&lt;br&gt;
The same validation runs inside Lambda. Even if the MCP layer is somehow bypassed, Lambda blocks it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Protecting Production Data — PII Anonymization
&lt;/h2&gt;

&lt;p&gt;Querying production data is powerful, but handling personally identifiable information (PII) requires the most care.&lt;/p&gt;
&lt;h3&gt;
  
  
  Automatic Anonymization Rules
&lt;/h3&gt;

&lt;p&gt;For production + view permission queries, PII column values are &lt;strong&gt;automatically anonymized&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Column Pattern&lt;/th&gt;
&lt;th&gt;Replacement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Email fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;***@***.com&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Name fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;***&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phone fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;***-****-****&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Postal code fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;***-****&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Address fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;***&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Password fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;[REDACTED]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Date of birth fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;****-**-**&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Card number fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;[REDACTED]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Table-specific rules handle ambiguous columns. For example, a generic &lt;code&gt;name&lt;/code&gt; column isn't PII globally, but &lt;code&gt;users.name&lt;/code&gt; or &lt;code&gt;orders.buyer_name&lt;/code&gt; clearly is. These are configured per-table.&lt;/p&gt;
&lt;h3&gt;
  
  
  Staging vs Production
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Environment&lt;/th&gt;
&lt;th&gt;PII Anonymization&lt;/th&gt;
&lt;th&gt;Connection Target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Staging&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Master DB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production (view)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Auto-applied&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Read Replica&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production (edit)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Read Replica&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Staging uses test data, so no anonymization needed. Only production view queries get automatic PII protection.&lt;/p&gt;
&lt;h2&gt;
  
  
  Fully Automated Access Management — DB Account Pipeline
&lt;/h2&gt;

&lt;p&gt;"Who do I talk to about getting database access?"&lt;/p&gt;

&lt;p&gt;This question doesn't get asked anymore. The DB Account Pipeline automates everything.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkk5uj51hmtbddp2jofw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkk5uj51hmtbddp2jofw.png" alt="Credential Flow" width="800" height="297"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Flow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User submits a workflow request&lt;/strong&gt; — nickname, email, desired databases (multiple allowed)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Manager approves&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Run Job processes automatically&lt;/strong&gt; — reads approved requests, generates CREATE USER statements per DB, executes via Lambda&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credentials saved to Firestore + Secret Manager&lt;/strong&gt; — passwords never stored in plaintext&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slack DM with connection info&lt;/strong&gt; — includes bastion server guide&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Zero Plaintext Passwords
&lt;/h3&gt;

&lt;p&gt;Passwords are stored &lt;strong&gt;only in Secret Manager&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Firestore db_credentials&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xxx.rds.amazonaws.com"&lt;/span&gt;
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3306&lt;/span&gt;
  &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ryan_view_user"&lt;/span&gt;
  &lt;span class="na"&gt;passwordSecretId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db-cred-xxxxx"&lt;/span&gt;  &lt;span class="s"&gt;← Reference to Secret Manager only&lt;/span&gt;
  &lt;span class="na"&gt;permLevel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;view"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the MCP Server executes a query, it decrypts the password from Secret Manager via &lt;code&gt;passwordSecretId&lt;/code&gt; and caches it in memory for 5 minutes. Cloud Run restarts clear the cache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No plaintext password exists anywhere&lt;/strong&gt; — this was a deliberate design decision we're particularly proud of.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Daily Cron
&lt;/h3&gt;

&lt;p&gt;A cron job fires at 6:00 AM JST daily, triggering a Cloud Run Job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;6:00 AM JST — Cron fires
├── ORM parsing (28 repos × 4 ORMs)
├── Live DB validation (11 staging DBs)
├── Gemini description generation (incremental only)
├── Graph construction + Embedding
├── BQ MERGE (preserving annotations)
└── Slack notification
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cost
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3 Flash (daily, incremental)&lt;/td&gt;
&lt;td&gt;~$0.10-0.20/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vertex AI Embedding&lt;/td&gt;
&lt;td&gt;~$0.01/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Run Job&lt;/td&gt;
&lt;td&gt;Near-free (once daily)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BQ Storage&lt;/td&gt;
&lt;td&gt;A few GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda&lt;/td&gt;
&lt;td&gt;Shared with DB Account Pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Thanks to incremental detection, we maintain an AI-powered dictionary for 991 tables at &lt;strong&gt;under $10/month&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Incremental Detection
&lt;/h3&gt;

&lt;p&gt;Regenerating all table descriptions daily would spike Gemini costs. So we introduced &lt;strong&gt;change detection&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Compare previous property hashes
2. Detect column structure changes (additions/removals/type changes)
3. Identify affected tables via enum dependency graph
→ Regenerate only changed tables
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a status enum changes, all tables using that enum are regenerated. No changes? Skip. This cuts AI costs by roughly 90%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Protection&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OAuth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google Account + corporate domain restriction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Credential Resolution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;email → nickname → per-user DB credentials&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Permission Filter&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-user × database × environment × permission level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQL Validation (MCP)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SELECT-only enforcement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQL Validation (Lambda)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Same validation (defense in depth)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PII Anonymization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Production + view queries only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Production Connection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Read Replicas only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Passwords&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Secret Manager only, 5-min TTL memory cache&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-Cloud Auth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GCP OIDC → AWS STS (zero static credentials)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Logging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Passwords and query results never logged&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;DB Graph MCP goes beyond solving the fundamental database problem of "you can't use what you don't know exists." It &lt;strong&gt;enables anyone to search real data without knowing SQL at all&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;As a dictionary&lt;/strong&gt; — Search 991 tables' structure, relationships, and enum definitions in natural language&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;As a query tool&lt;/strong&gt; — Securely query staging and production data with automatic PII protection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;As a knowledge base&lt;/strong&gt; — DEAD flags and column annotations surface 10 years of tacit knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The biggest lesson from building this: &lt;strong&gt;the real value of MCP is giving AI context&lt;/strong&gt;. Table structure, relationships, enum definitions, column warnings — when these enter AI's context window, the SQL and code Claude Code writes become dramatically more accurate.&lt;/p&gt;

&lt;p&gt;Making that happen required building the graph, securing cross-cloud access, automating permission management, and protecting PII — unglamorous but essential infrastructure, built with care.&lt;/p&gt;

&lt;p&gt;I hope this helps anyone wrestling with internal database management at scale.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>security</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
