<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Richard Baxter</title>
    <description>The latest articles on Forem by Richard Baxter (@richardbaxter).</description>
    <link>https://forem.com/richardbaxter</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3760451%2Fc496ed81-9047-4edc-b07b-2715c9407d05.jpeg</url>
      <title>Forem: Richard Baxter</title>
      <link>https://forem.com/richardbaxter</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/richardbaxter"/>
    <language>en</language>
    <item>
      <title>Implementing WebMCP on a Recruitment Website</title>
      <dc:creator>Richard Baxter</dc:creator>
      <pubDate>Mon, 20 Apr 2026 18:57:29 +0000</pubDate>
      <link>https://forem.com/richardbaxter/implementing-webmcp-on-a-recruitment-website-19a</link>
      <guid>https://forem.com/richardbaxter/implementing-webmcp-on-a-recruitment-website-19a</guid>
      <description>&lt;p&gt;&lt;strong&gt;Thinking about what, exactly, the future of a website “looks” like in the agentic era is a challenging proposition. It might be that in most cases, our future viewers/readers/customers can do everything, from their chatbot of preference while never visiting your site.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WebMCP is a part of this puzzle. The protocol directs an agent to stop guessing what a button does and start calling tools with typed inputs, via a fundamentally simple tool registration protocol.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So, what’s WebMCP? WebMCP is “an emerging W3C standard developed by Google and Microsoft that acts as a browser API to turn websites into interactive tools for AI agents”.&lt;/p&gt;

&lt;p&gt;This is the future and I think it’s very exciting. For fun, I added WebMCP support to &lt;a href="https://yubhub.co" rel="noopener noreferrer"&gt;&lt;strong&gt;YubHub&lt;/strong&gt;&lt;/a&gt;, my recruitment site.&lt;/p&gt;

&lt;p&gt;Today, I’m going to talk about what I learned building it, the design choices that worked, an agent trace that caught a hallucination in a production system (since fixed), and why (in my opinion!) a recruitment site turned out to be an unusually good fit for this protocol.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4hhj8cqvfgqxveqnbed.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4hhj8cqvfgqxveqnbed.png" alt="YubHub - our jobs site ready for an AI future" width="800" height="423"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;YubHub in Chrome 146 with the Model Context Tool Inspector open.&lt;/em&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Why a Recruitment Site?
&lt;/h2&gt;

&lt;p&gt;Because websites new and old are going to have to change rather quickly (and I really enjoy starting an online business from scratch – the test of a true martech marketer!).&lt;/p&gt;

&lt;p&gt;Most of the WebMCP explainers I read while researching this use a checkout flow, a to-do list, or a colour picker. Toys. They work for illustrating the API surface (what can this new thing do?), but without implementing and playing with the protocol yourself I fear the penny simply does not drop.&lt;/p&gt;

&lt;p&gt;The real value of WebMCP emerges when an agent needs to reason about a lot of structured data that’s already behind a nice URL structure – and recruitment sites are built around exactly that shape: job details, salary ranges, skills required, employment locations.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/srC-1pEXdmY"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;YubHub already served clean data at predictable URLs (&lt;code&gt;/jobs/skill/figma&lt;/code&gt;, &lt;code&gt;/jobs/at/anthropic&lt;/code&gt;) and emits schema.org JobPosting markup. What it didn’t have was a “contract”. Any agent browsing the site could see the HTML and JSON-LD, I’m sure, but getting an agent to browse your website for an answer via Chrome is the craziest waste of time when ideas as fundamental as the API have been around for decades.&lt;/p&gt;

&lt;p&gt;WebMCP fixes that by publishing the interface explicitly, with typed inputs and structured responses. Prompt in – “find me an X for a Y in Z” – structured answer out. So, for a site whose whole job is “be discoverable”, WebMCP is a big deal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6uq5n234qmpek3v1g2h.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6uq5n234qmpek3v1g2h.jpg" alt="Architecture diagram showing a YubHub page exposing tools through navigator.modelContext to a browser AI agent, with a response flowing back" width="800" height="447"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The basic flow. The page registers tools on &lt;code&gt;navigator.modelContext&lt;/code&gt;, the agent discovers and invokes them, the page replies with structured data. No screen-scraping in between.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What’s WebMCP?
&lt;/h2&gt;

&lt;p&gt;WebMCP is a proposed web standard, co-authored by engineers at Google and Microsoft under the &lt;a href="https://www.w3.org/groups/cg/webmachinelearning/" rel="noopener noreferrer"&gt;W3C Web Machine Learning community group&lt;/a&gt;, that lets a site expose a set of callable tools to an AI agent running in the browser. There are two APIs: imperative (&lt;code&gt;navigator.modelContext.registerTool()&lt;/code&gt; called from JavaScript), and declarative.&lt;/p&gt;

&lt;p&gt;Think of it as a contract the site publishes to any agent that lands on it. Instead of the agent guessing that a div with &lt;code&gt;class="btn-primary"&lt;/code&gt; means “checkout”, the page says: here’s a &lt;code&gt;checkout&lt;/code&gt; tool, here’s what it needs, here’s what you’ll get back.&lt;/p&gt;

&lt;p&gt;That’s a welcome leap from screen-scraping and MCP-based browser control. The Chrome DevTools MCP quickstart benchmarked a simple “set counter to 42” task at 3,801 tokens using screenshots and 433 tokens using WebMCP. That’s an 89% reduction in token use!&lt;/p&gt;

&lt;p&gt;The current spec lives on the W3C community group site (&lt;a href="https://webmachinelearning.github.io/webmcp" rel="noopener noreferrer"&gt;webmachinelearning.github.io/webmcp&lt;/a&gt;), with Chrome’s early-preview announcement and detailed posts on &lt;a href="https://developer.chrome.com/blog/webmcp-epp" rel="noopener noreferrer"&gt;developer.chrome.com/blog/webmcp-epp&lt;/a&gt;. It ships behind a flag in Chrome Canary 146. Google Chrome Labs maintain a reference extension – the Model Context Tool Inspector – that lists the tools any page has registered and lets you call them manually. If you’re planning to build anything with this, you’ll want both installed.&lt;/p&gt;
&lt;h2&gt;
  
  
  How WebMCP Differs From MCP
&lt;/h2&gt;

&lt;p&gt;The names don’t help much, do they? Model Context Protocol (MCP) is &lt;strong&gt;server-side&lt;/strong&gt; – you deploy an MCP server that an agent connects to; it runs in its own process and exposes tools to your AI assistant over JSON-RPC. We have a growing library of our own MCP connectors, like my favourites &lt;a href="https://github.com/houtini-ai/gemini-mcp" rel="noopener noreferrer"&gt;Gemini MCP&lt;/a&gt; and &lt;a href="https://github.com/houtini-ai/lm" rel="noopener noreferrer"&gt;Houtini-LM&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WebMCP is the browser-side sibling&lt;/strong&gt;. Your tool code runs in the page’s JavaScript context, so it has the user’s cookies, their session, their permissions – everything the user already has access to. There’s no deployment, no auth bridge, no server to pay for. Well, mostly – you still have to maintain the page, obviously, but there’s nothing extra. If your site already authenticates the user, your WebMCP tools inherit that authentication for free.&lt;/p&gt;

&lt;p&gt;The practical consequence: MCP is the right fit for agents that need third-party data (GitHub, Gmail, a database). WebMCP is the right fit for agents that need to interact with a specific site the user has requested.&lt;/p&gt;
&lt;h2&gt;
  
  
  Getting Set Up
&lt;/h2&gt;

&lt;p&gt;If you have Chrome Canary 146 or higher, you are already set up. The Stable, Beta, and Dev channels of Chrome do not ship with the WebMCP flag, so you have to enable it:&lt;/p&gt;

&lt;p&gt;Open a new tab to &lt;code&gt;chrome://flags/#enable-webmcp-testing&lt;/code&gt; and set “WebMCP for testing” to Enabled. Click Relaunch.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcetju0xu174w8mbp1hft.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcetju0xu174w8mbp1hft.png" alt="Chrome flags page with the WebMCP for testing flag highlighted and set to Enabled" width="800" height="326"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Enable WebMCP for testing&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Next, install the &lt;a href="https://chromewebstore.google.com/detail/webmcp-model-context-tool/gbpdfapgefenggkahomfgkhfehlcenpd" rel="noopener noreferrer"&gt;Model Context Tool Inspector extension&lt;/a&gt; from the Chrome Web Store. This is currently the only way to test anything before you’ve wired an agent in. It lists every tool the current page has registered, shows you their JSON schemas, and lets you execute them manually with whatever input you want. The extension requires a Gemini API key, as it uses Gemini as a natural-language test harness – useful for checking whether your tool descriptions are clear enough for a real LLM to pick the right tool, much like the context carried in tool descriptions in the MCP protocol itself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmin2qu97qqxfi5681c2u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmin2qu97qqxfi5681c2u.png" alt="Chrome Web Store listing for the WebMCP Model Context Tool Inspector extension" width="800" height="597"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The WebMCP Inspector extension.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To verify the whole thing is working, open DevTools and run &lt;code&gt;console.log(navigator.modelContext)&lt;/code&gt;. If you see an object come back rather than &lt;code&gt;undefined&lt;/code&gt;, you’re in business. If it’s &lt;code&gt;undefined&lt;/code&gt;, either your flag isn’t set or you’re on the wrong Chrome channel.&lt;/p&gt;
&lt;h2&gt;
  
  
  Setting Up the Imperative API
&lt;/h2&gt;

&lt;p&gt;Here’s the minimum viable tool registration from YubHub’s live scaffold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;modelContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;pageController&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AbortController&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pagehide&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;pageController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;once&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;modelContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;registerTool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;yubhub_fetch_job&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Fetch a single YubHub job as clean markdown (company, location, salary, work arrangement, full description, apply URL, skills). More token-efficient than the HTML page.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;^job_[a-z0-9_-]+$&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/jobs/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;Accept&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text/markdown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pageController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;signal&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are four things to look at here. First, the feature-detection guard: &lt;code&gt;navigator.modelContext&lt;/code&gt; only exists in Chrome 146 with the flag on, so every public-facing scaffold needs to short-circuit gracefully or you’ll throw on every other browser your users visit from.&lt;/p&gt;

&lt;p&gt;Second, the &lt;code&gt;AbortController&lt;/code&gt; signal, which ties the registration to the page lifecycle (more on that below). Third, the pattern constraint on &lt;code&gt;id&lt;/code&gt; – the spec treats JSON Schema loosely in the browser, so validating in code catches the cases the schema doesn’t. And fourth, the return value is a structured object, not a string – the agent reads it as data rather than trying to parse text/html.&lt;/p&gt;
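&lt;p&gt;As a rough sketch of that defensive check – the helper name and error wording here are my own illustration, not YubHub’s actual code:&lt;/p&gt;

```javascript
// Illustrative sketch: validate tool input in code, because browser-side
// JSON Schema enforcement is loose. Throwing surfaces a structured error
// to the agent instead of firing a fetch at a URL that cannot exist.
const JOB_ID_PATTERN = /^job_[a-z0-9_-]+$/;

function validateJobId(id) {
  if (typeof id !== 'string' || !JOB_ID_PATTERN.test(id)) {
    throw new Error('Invalid job id: expected a job_ slug, got ' + JSON.stringify(id));
  }
  return id;
}
```

You would call this at the top of `execute`, before the fetch, so a hallucinated or malformed id fails fast with a message the agent can act on.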

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53rxdkffyounh5q94fkp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53rxdkffyounh5q94fkp.png" alt="Model Context Tool Inspector panel showing the tool definition highlighted in a green box and the agent prompt highlighted in a red box" width="800" height="531"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The Inspector showing the tool’s schema (green box) and the agent’s prompt (red box). You can drive this entirely by hand during development, which is the fastest way to catch bad tool descriptions.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The Declarative API Is Free Engineering
&lt;/h2&gt;

&lt;p&gt;I think this is the cleverest part of the WebMCP spec. If you’ve got an HTML form – and of course you do – you can annotate it and get a tool registration for free.&lt;/p&gt;

&lt;p&gt;YubHub’s nav already had a GET form submitting to &lt;code&gt;/jobs/search&lt;/code&gt;. All I did was add three attributes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;form&lt;/span&gt; &lt;span class="na"&gt;action=&lt;/span&gt;&lt;span class="s"&gt;"/jobs/search"&lt;/span&gt; &lt;span class="na"&gt;method=&lt;/span&gt;&lt;span class="s"&gt;"GET"&lt;/span&gt;
      &lt;span class="na"&gt;toolname=&lt;/span&gt;&lt;span class="s"&gt;"yubhub_search"&lt;/span&gt;
      &lt;span class="na"&gt;tooldescription=&lt;/span&gt;&lt;span class="s"&gt;"Search YubHub's enriched job board by free-text query - title, company, or keyword. Submits a GET to /jobs/search and renders the results page."&lt;/span&gt;
      &lt;span class="na"&gt;toolautosubmit&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;input&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"search"&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"q"&lt;/span&gt;
         &lt;span class="na"&gt;toolparamdescription=&lt;/span&gt;&lt;span class="s"&gt;"Job title, company name, or any keyword (2-100 chars)"&lt;/span&gt;
         &lt;span class="na"&gt;minlength=&lt;/span&gt;&lt;span class="s"&gt;"2"&lt;/span&gt; &lt;span class="na"&gt;maxlength=&lt;/span&gt;&lt;span class="s"&gt;"100"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/form&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The browser reads the form, builds a JSON Schema from the input fields, and registers it as a tool. When an agent calls it with &lt;code&gt;{q: "python"}&lt;/code&gt;, the browser focuses the form, populates the input, and – because of &lt;code&gt;toolautosubmit&lt;/code&gt; – submits it automatically. Zero JavaScript. Your form already works for humans; now it works for agents.&lt;/p&gt;
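&lt;p&gt;For intuition, the schema the browser derives for that form would look roughly like this. This is my reconstruction – the spec leaves the exact derived shape to the browser:&lt;/p&gt;

```javascript
// Rough reconstruction of the JSON Schema an agent might see for the
// annotated search form above. The constraints come straight from the
// input's HTML attributes; nothing else is hand-written.
const derivedSchema = {
  type: 'object',
  properties: {
    q: {
      type: 'string',
      description: 'Job title, company name, or any keyword (2-100 chars)',
      minLength: 2,   // from minlength="2"
      maxLength: 100, // from maxlength="100"
    },
  },
};
```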

&lt;p&gt;One thing worth flagging: &lt;code&gt;toolautosubmit&lt;/code&gt; is only really safe on read-only operations. Search queries, availability lookups, status checks – fine, let the agent submit. Anything that creates, modifies, or deletes data should leave the flag off, so the human has to click Submit after the agent fills the form.&lt;/p&gt;

&lt;p&gt;And that, I reckon, is how WebMCP adoption will probably start – with sites adding it to their internal search forms and results pages. But I hope people will agree that this undersells what WebMCP can do in the longer term.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context-Scoped Tool Registration
&lt;/h2&gt;

&lt;p&gt;I’ve got a tool called &lt;code&gt;yubhub_current_job&lt;/code&gt;. It takes no arguments, and it just returns the current job’s markdown. Obviously, that only makes sense on my job listing pages, so anywhere else on the site it would be meaningless (worse, it’d be confusing). The scaffold pattern-matches the URL and registers conditionally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jobMatch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;jobs&lt;/span&gt;&lt;span class="se"&gt;\/(&lt;/span&gt;&lt;span class="sr"&gt;job_&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;a-z0-9_-&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;$/i&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobMatch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;currentJobId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jobMatch&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;modelContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;registerTool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;yubhub_current_job&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Return clean markdown for the job the user is currently viewing (&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
      &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;currentJobId&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;). Zero-arg shortcut.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/jobs/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;currentJobId&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;currentJobId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pageController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;signal&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AbortController ties it to the page lifecycle – when the user navigates away, &lt;code&gt;pagehide&lt;/code&gt; fires, the controller aborts, and all tools registered with its signal drop off the agent’s radar. Importantly, this leaves no leftover closures holding references to the previous page’s state.&lt;/p&gt;
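&lt;p&gt;The lifecycle is easier to see with a toy mock of a signal-scoped registry – this is not the real &lt;code&gt;navigator.modelContext&lt;/code&gt; API, just an illustration of the abort-driven cleanup it gives you:&lt;/p&gt;

```javascript
// Toy mock of signal-scoped tool registration. When the signal aborts
// (as on pagehide), the tool is dropped from the registry automatically.
function makeRegistry() {
  const tools = new Map();
  return {
    registerTool(def, opts = {}) {
      tools.set(def.name, def);
      opts.signal?.addEventListener('abort', () => tools.delete(def.name), { once: true });
    },
    list: () => [...tools.keys()],
  };
}

const registry = makeRegistry();
const pageController = new AbortController();
registry.registerTool({ name: 'yubhub_current_job' }, { signal: pageController.signal });
console.log(registry.list()); // ['yubhub_current_job']
pageController.abort();       // simulate pagehide
console.log(registry.list()); // []
```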

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcfy55b9ehyssqbyz03ve.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcfy55b9ehyssqbyz03ve.jpg" alt="Diagram showing three YubHub pages (homepage, job detail, facet archive) with different sets of registered WebMCP tools on each" width="800" height="597"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Same site, three pages, three different tool menus. The agent sees only the tools that make contextual sense.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Navigation Tools vs Data-Retrieval Tools
&lt;/h2&gt;

&lt;p&gt;There are two kinds of tools you can give an agent, and they create completely different user experiences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Navigation tools&lt;/strong&gt; change the URL on the user’s behalf. They’re the right call when the user actually wants to end up somewhere else – looking at a specific job, browsing a facet, running a search. So the agent calls &lt;code&gt;yubhub_browse_jobs({type: 'skill', slug: 'figma'})&lt;/code&gt;, the tab navigates, the user sees the archive page. The return value is basically a receipt (“I took you here, here’s what’s on the page”).&lt;/p&gt;

&lt;p&gt;The real power, IMO, is this: &lt;strong&gt;Data-retrieval tools&lt;/strong&gt; return parsed data and leave the URL alone. The agent calls them to reason, compare, filter, summarise – and the user stays on whatever page they were already on, quite unaware that the agent just read a hundred jobs to answer a question. &lt;code&gt;yubhub_fetch_facet&lt;/code&gt; returns up to 100 schema.org JobPosting entries; &lt;code&gt;yubhub_shortlist&lt;/code&gt; filters them by salary floor, remote status, title keyword, and hands back a ranked list.&lt;/p&gt;
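&lt;p&gt;To make the shape of a shortlist-style tool concrete, here’s a hedged sketch of the filter-and-rank step over fetched postings. The field names (&lt;code&gt;salaryMin&lt;/code&gt;, &lt;code&gt;remote&lt;/code&gt;) are my assumptions, not YubHub’s actual API:&lt;/p&gt;

```javascript
// Illustrative filter/rank over job entries a data-retrieval tool has
// already fetched. Filters by salary floor, remote status, and title
// keyword, then ranks by salary descending.
function shortlist(jobs, { minSalary = 0, remoteOnly = false, keyword = '' } = {}) {
  const kw = keyword.toLowerCase();
  return jobs
    .filter(j => (j.salaryMin ?? 0) >= minSalary)
    .filter(j => !remoteOnly || j.remote)
    .filter(j => !kw || j.title.toLowerCase().includes(kw))
    .sort((a, b) => (b.salaryMin ?? 0) - (a.salaryMin ?? 0));
}

const jobs = [
  { title: 'Product Manager', salaryMin: 70000, remote: true },
  { title: 'Senior Product Manager', salaryMin: 95000, remote: false },
  { title: 'Designer', salaryMin: 80000, remote: true },
];
const picks = shortlist(jobs, { minSalary: 75000, keyword: 'product' });
// picks contains only the Senior Product Manager role
```

The user never leaves the page; the agent hands back the ranked list as structured data.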

&lt;p&gt;Data-retrieval is what I’m bullish on – it’s a mind-bendingly powerful alternative to the current state of play, where AI-driven crawling and browser actions do the work on the user’s behalf.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7qlv54tspmxfsb5ms9ad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7qlv54tspmxfsb5ms9ad.png" alt="Inspector panel showing a ranked list of product manager job results returned by a WebMCP data-retrieval tool" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;code&gt;yubhub_shortlist&lt;/code&gt; returning a ranked list of product manager roles.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Only shipping navigation tools is the mistake I’d warn you off. It just makes the agent a fancy URL-rewriter – it can answer “take me to X”, but it can’t answer “summarise the options and recommend one” without hauling the whole user along.&lt;/p&gt;

&lt;h2&gt;
  
  
  What WebMCP Can’t Do (Yet)
&lt;/h2&gt;

&lt;p&gt;A browsing context is required. There’s no headless mode – if a tab isn’t open, the tools don’t exist. That rules out batch or scheduled agentic work against a WebMCP site, for now.&lt;/p&gt;

&lt;p&gt;Discoverability is thin. An agent has to visit your page to see your tools. There’s no global registry, no “hey agent, here’s a list of sites with useful WebMCP tools” index. A &lt;code&gt;.well-known/webmcp&lt;/code&gt; manifest has been proposed – and I think it’ll be necessary eventually – but nothing’s shipped yet. Someone is going to come up with an authority directory of WebMCP-supporting brands any minute now.&lt;/p&gt;

&lt;p&gt;The spec moves weekly. Between my first discovery of the docs and shipping my implementation, &lt;code&gt;provideContext&lt;/code&gt; and &lt;code&gt;clearContext&lt;/code&gt; got removed entirely, &lt;code&gt;unregisterTool&lt;/code&gt; came back after being briefly deprecated, and the &lt;code&gt;AbortSignal&lt;/code&gt; handling in &lt;code&gt;registerTool&lt;/code&gt;’s second argument settled into the form above. Track the spec repo and expect to rewrite things. I’m in the preview group, which is very easy to register for.&lt;/p&gt;

&lt;p&gt;If you run a site that has structured data behind predictable URLs – a product catalogue, a job board, a docs site, a media library – WebMCP is worth shipping now, even though browser adoption is near zero. It’s cheap to add, it forces you to think about your tool contracts before agents try to guess them, and it positions the site for when Chrome lifts the flag and Edge follows. When that happens I think there will be quite a fuss about it.&lt;/p&gt;




&lt;p&gt;The post &lt;a href="https://houtini.com/webmcp-on-a-recruitment-website/" rel="noopener noreferrer"&gt;Implementing WebMCP on a Recruitment Website&lt;/a&gt; appeared first on &lt;a href="https://houtini.com" rel="noopener noreferrer"&gt;Houtini&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>webmcp</category>
      <category>mcp</category>
      <category>webdev</category>
      <category>schema</category>
    </item>
    <item>
      <title>Making a Local LLM MCP Server Deterministic: Model Routing, Think-Block Stripping, and the Problems Nobody Warns You About</title>
      <dc:creator>Richard Baxter</dc:creator>
      <pubDate>Tue, 17 Mar 2026 12:48:05 +0000</pubDate>
      <link>https://forem.com/richardbaxter/making-a-local-llm-mcp-server-deterministic-model-routing-think-block-stripping-and-the-problems-5bmj</link>
      <guid>https://forem.com/richardbaxter/making-a-local-llm-mcp-server-deterministic-model-routing-think-block-stripping-and-the-problems-5bmj</guid>
      <description>&lt;p&gt;For some time, I've been experimenting with the idea that by using an MCP server, we can delegate bounded tasks from Claude Code to cheaper local or cloud models (models I run on a local server in LM Studio). It makes sense, why chew through long, repetitive regression testing tasks when this could be directed by claude, but executed by a simpler, arguably more efficient for the task model instead?&lt;/p&gt;

&lt;p&gt;The other worry I have - what if Anthropic added a few zeros to their subscription and half of us had to rethink how we use the flagship models? This is my ongoing experiment. There's none of the "this is how you have to work from now on" pressure I feel every time I read about a new release - I'm just curious to see whether we can get to a point where Claude orchestrates and delegates to whatever local model(s) you have available, for the sake of token efficiency. It might matter one day!&lt;/p&gt;

&lt;p&gt;My v1 was simple - one model, one endpoint, and instructions for Claude to think about handover for specific tasks. After not very long I'd rewritten most of it - as it turns out, Claude doesn't want to share much work.&lt;/p&gt;

&lt;p&gt;Today, I'm going to give you a tour of my work so far (this is an experimental project; I welcome honest feedback, forks and pull requests). The post-mortem covers what broke, what wasn't cutting the mustard, and how that influenced where I've got to: model routing, think-block stripping, SQLite model caching, and per-model prompt tuning. All TypeScript, all open source.&lt;/p&gt;

&lt;p&gt;I wrote about &lt;a href="https://dev.to/richardbaxter/sqlite-as-an-mcp-context-saver-stop-cramming-raw-api-data-into-your-llm-2oj4"&gt;using SQLite as a context saver in MCP servers&lt;/a&gt; a couple of weeks ago and the core argument there was: don't cram raw API data into your LLM's context window. It doesn't scale. As the dataset grows, the token cost balloons, the signal-to-noise ratio collapses, and the model starts forgetting because of incomplete, compacted data. "Memory" - just stuffing everything in context or a SQLite dependency and hoping the model sorts it out - is not architecture. It's a different flavour of context bloat, delivered through tool descriptions and database entries of what the model did.&lt;/p&gt;

&lt;p&gt;This is the same problem showing up in a completely different place. When your MCP server needs to know what twelve different local models are good at - their strengths, weaknesses, best task types, context lengths, quantisation levels - you can either dump all of that into every conversation, or you can cache it locally and query what you need. One approach costs hundreds of tokens per call and gets worse as you add models. The other costs far less.&lt;/p&gt;

&lt;p&gt;The MCP server is &lt;a href="https://github.com/houtini-ai/lm" rel="noopener noreferrer"&gt;houtini-lm&lt;/a&gt;. It sits between Claude Code and whatever OpenAI-compatible endpoint you've got running - LM Studio, Ollama, vLLM, cloud APIs, whatever speaks &lt;code&gt;/v1/chat/completions&lt;/code&gt;. Claude keeps on top of the reasoning. The cheap(er) model handles the outputs.&lt;/p&gt;

&lt;p&gt;The hurdles... some of which I haven't quite overcome.&lt;/p&gt;

&lt;h2&gt;
  
  
  The routing problem
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvku6p3d2kf7gwqzgbc22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvku6p3d2kf7gwqzgbc22.png" alt="Comparison diagram showing v1 with one model versus v2.7 with task-based routing" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;v1 assumed you had one model loaded. You'd set &lt;code&gt;LM_STUDIO_URL&lt;/code&gt;, maybe override &lt;code&gt;LM_STUDIO_MODEL&lt;/code&gt;, and every delegation call went to the same place. Fine if you're running Qwen Coder and only delegating code tasks.&lt;/p&gt;

&lt;p&gt;Then I loaded GLM-4 alongside Qwen Coder because I wanted a general-purpose model for chat-style delegation - code explanations, content rewrites, commit messages. And immediately hit the problem: houtini-lm had no concept of "this is a code task, use the coder model" versus "this is a chat task, use the general model." Everything went to whatever model ID was in the config.&lt;/p&gt;

&lt;p&gt;So I wrote a router. Here's the core of &lt;code&gt;routeToModel&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;TaskType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;code&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;analysis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;embedding&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;routeToModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;taskType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TaskType&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;RoutingDecision&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;listModelsRaw&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;loaded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;loaded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;bestModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;loaded&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;bestScore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;loaded&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getPromptHints&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hints&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bestTaskTypes&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[]).&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;taskType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Bonus: code-specialised models for code tasks&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getModelProfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;taskType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;code&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;family&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;coder&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
      &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Bonus: larger context for analysis tasks&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;taskType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;analysis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getContextLength&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;bestScore&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;bestScore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;bestModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;modelId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;bestModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;hints&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;getPromptHints&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bestModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things worth noting about this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. It queries the LM Studio &lt;code&gt;/v1/models&lt;/code&gt; endpoint at routing time.&lt;/strong&gt;  This sounds expensive but the endpoint returns in under 5ms locally, and it means model hot-swaps in LM Studio are picked up immediately (even if the models can take their own sweet time to load...). I tried caching this and it caused more problems than it solved - we don't want a stale model list when you unload something.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. It can't JIT-load models.&lt;/strong&gt; The MCP SDK has a hard ~60-second timeout on tool calls. Loading a model in LM Studio takes minutes. So if the best model for a task isn't loaded, the router uses the best available one and returns a suggestion string: "💡 qwen3-coder-next is downloaded and better suited for code tasks - ask the user to load it in LM Studio." Claude surfaces this to the user. Not ideal, but the alternative (silent timeout) is worse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The scoring is deliberately simple.&lt;/strong&gt; Current version: does the model's &lt;code&gt;bestTaskTypes&lt;/code&gt; include this task type? 10 points. Is it a coder model and this is a code task? 5 bonus. Large context and analysis task? 2 bonus. The highest score wins.&lt;/p&gt;
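&lt;p&gt;The no-JIT fallback from point 2 can be sketched as a pure function - the &lt;code&gt;ModelInfo&lt;/code&gt; shape and the exact suggestion string here are simplified assumptions rather than the houtini-lm internals:&lt;/p&gt;

```typescript
// Sketch: pick the best *loaded* model, but surface a hint when a
// better downloaded-but-unloaded one exists. Shapes are assumptions.
interface ModelInfo {
  id: string;
  state: 'loaded' | 'not-loaded';
}

function pickWithSuggestion(
  models: ModelInfo[],
  score: (m: ModelInfo) => number
): { modelId: string; suggestion?: string } {
  const byScore = (a: ModelInfo, b: ModelInfo) => score(b) - score(a);
  const loaded = models.filter((m) => m.state === 'loaded').sort(byScore);
  const unloaded = models.filter((m) => m.state !== 'loaded').sort(byScore);

  const best = loaded[0]; // assumes at least one model is loaded
  const better = unloaded[0];
  // Never JIT-load: the MCP tool-call timeout would expire long before
  // a multi-minute model load completes. Suggest a manual load instead.
  if (better && score(better) > score(best)) {
    return {
      modelId: best.id,
      suggestion: `💡 ${better.id} is downloaded and better suited for this task - ask the user to load it in LM Studio.`,
    };
  }
  return { modelId: best.id };
}
```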

&lt;h2&gt;
  
  
  Think-block stripping
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcx80sy7qnj2a6u0ubc60.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcx80sy7qnj2a6u0ubc60.png" alt="Diagram showing think-block stripping - raw model output with think tags being cleaned to just the useful response" width="768" height="1376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GLM-4, Qwen3, Nemotron - these models always emit internal chain-of-thought reasoning wrapped in &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags before producing their actual response. When I first loaded GLM-4, every delegation call came back with 400+ tokens of the model arguing with itself before the 50-token answer I actually needed. Watching GLM have a discussion with itself tells you a lot about the model - it doesn't seem very confident and genuinely second-guesses its own answers.&lt;/p&gt;

&lt;p&gt;The fix is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Strip &amp;lt;think&amp;gt;...&amp;lt;/think&amp;gt; reasoning blocks&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;cleanContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&amp;lt;think&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;[\s\S]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;?&lt;/span&gt;&lt;span class="sr"&gt;&amp;lt;&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;think&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// closed blocks&lt;/span&gt;
&lt;span class="nx"&gt;cleanContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cleanContent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/^&amp;lt;think&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                   &lt;span class="c1"&gt;// orphaned opening tag&lt;/span&gt;
&lt;span class="nx"&gt;cleanContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cleanContent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's 2 lines of regex. But that second line took me a while to pin down. Sometimes the model runs out of generation tokens mid-think-block. You get &lt;code&gt;&amp;lt;think&amp;gt;The user wants test stubs for...&lt;/code&gt; and then the actual output, with no closing &lt;code&gt;&amp;lt;/think&amp;gt;&lt;/code&gt;. The first regex doesn't match because there's no closing tag. So I was getting leaked reasoning mixed into the response, which caused obvious problems.&lt;/p&gt;

&lt;p&gt;My orphaned tag regex catches that - it's not elegant, but it works, and it was a hard-won breakthrough.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;emitsThinkBlocks&lt;/code&gt; flag in the prompt hints system means this only runs for models that produce think blocks. There's no unnecessary processing for LLaMA or other instruct models that don't use this pattern.&lt;/p&gt;
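&lt;p&gt;Wrapped up with that flag, the whole post-processing step looks something like this (the function name is mine; the behaviour matches the regexes above):&lt;/p&gt;

```typescript
// Strip <think>...</think> reasoning, including a truncated opening tag,
// but only for models flagged as emitting think blocks.
function stripThinkBlocks(content: string, emitsThinkBlocks: boolean): string {
  if (!emitsThinkBlocks) return content; // no-op for plain instruct models
  let clean = content.replace(/<think>[\s\S]*?<\/think>\s*/g, ''); // closed blocks
  clean = clean.replace(/^<think>\s*/, ''); // orphaned opening tag (ran out of tokens)
  return clean.trim();
}
```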

&lt;h2&gt;
  
  
  SQLite model cache (sql.js WASM, again)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xjucdh38dcswtuj0vj2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xjucdh38dcswtuj0vj2.png" alt="Architecture diagram showing how the SQLite model cache works - checking static profiles first, falling back to HuggingFace API, caching in sql.js" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is where the argument from my &lt;a href="https://dev.to/richardbaxter/sqlite-as-an-mcp-context-saver-stop-cramming-raw-api-data-into-your-llm-2oj4"&gt;previous SQLite post&lt;/a&gt; loads back into reader context... The router needs to know what each model is good at. You could stuff model profiles into the system prompt - strengths, weaknesses, best task types for every loaded model. With two models that's maybe 300 tokens. With twelve it's 2,000. And it's the same 2,000 tokens on every single tool call, burning context on metadata the model has already seen. That's the "memory as architecture" trap again: it works at small scale and falls apart the moment your data grows.&lt;/p&gt;

&lt;p&gt;So I did what I did with the Search Console data example: cache it locally, query what you need, return only what's relevant to this specific routing decision.&lt;/p&gt;

&lt;p&gt;For Qwen Coder or GLM-4, I've got hand-written (copy and pasted) profiles - curated descriptions, strengths, weaknesses, and which task types best suit the model. But what about when someone loads a random GGUF they downloaded from HuggingFace? Let's query that and store it in the db.&lt;/p&gt;

&lt;p&gt;The cache works in two tiers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 1: Static profiles&lt;/strong&gt; - regex-matched against the model ID or architecture field. I maintain these by hand for model families I've used recently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MODEL_PROFILES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;RegExp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ModelProfile&lt;/span&gt; &lt;span class="p"&gt;}[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/qwen3-coder|qwen3.*coder/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;family&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Qwen3 Coder&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Code-specialised model with agentic capabilities.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;strengths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;code generation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;code review&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;debugging&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;test writing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;weaknesses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;non-code creative tasks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;bestFor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;code generation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;code review&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;test stubs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;refactoring&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="c1"&gt;// ... 12 more families&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tier 2: SQLite cache with HuggingFace auto-profiling&lt;/strong&gt; - if no static profile matches, the server queries HuggingFace's free API, parses the model card, and generates a profile. This gets cached in a SQLite database with a 7-day TTL.&lt;/p&gt;
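&lt;p&gt;The auto-profiling is mostly a mapping exercise: HuggingFace's model API returns a &lt;code&gt;pipeline_tag&lt;/code&gt; (&lt;code&gt;text-generation&lt;/code&gt;, &lt;code&gt;feature-extraction&lt;/code&gt; and so on), and the server turns that plus the model ID into a rough best-task guess. A simplified sketch - the real table is richer than this:&lt;/p&gt;

```typescript
type TaskType = 'code' | 'chat' | 'analysis' | 'embedding';

// Map a HuggingFace pipeline_tag plus the model id to likely task types.
// The heuristics here are a simplified sketch, not the full houtini-lm table.
function inferTaskTypes(pipelineTag: string | null, modelId: string): TaskType[] {
  if (pipelineTag === 'feature-extraction' || pipelineTag === 'sentence-similarity') {
    return ['embedding'];
  }
  const id = modelId.toLowerCase();
  // Name-based hint: coder-branded models get routed to code first
  if (id.includes('coder') || id.includes('codellama')) {
    return ['code', 'chat'];
  }
  // Default for text-generation models: general chat and analysis
  return ['chat', 'analysis'];
}
```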

&lt;p&gt;I chose &lt;code&gt;sql.js&lt;/code&gt; (pure WASM) instead of &lt;code&gt;better-sqlite3&lt;/code&gt;. For Better Search Console I used &lt;code&gt;better-sqlite3&lt;/code&gt; because it was the data layer - hundreds of thousands of rows, complex queries, WAL mode, the lot. For houtini-lm, the cache holds maybe 20 rows. The priority is zero native dependencies. &lt;code&gt;sql.js&lt;/code&gt; compiles to WASM, which means &lt;code&gt;npx -y @houtini/lm&lt;/code&gt; works on any machine without needing a C++ toolchain. No &lt;code&gt;node-gyp&lt;/code&gt; and no build failures on Windows. I do wince at whether this would work without a fix on a Mac, because I don't own one and perhaps never will. Still, I've had &lt;code&gt;better-sqlite3&lt;/code&gt; fail on three separate machines because of &lt;code&gt;node-gyp&lt;/code&gt; version mismatches - none of that hassle is worth it for such a small 20-100 row resource.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sql.js&lt;/code&gt; is slower for heavy workloads. For a 20-row lookup table, though, the speed difference is not noticeable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Schema&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SCHEMA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
  CREATE TABLE IF NOT EXISTS model_profiles (
    model_id TEXT PRIMARY KEY,
    hf_id TEXT,
    pipeline_tag TEXT,
    architectures TEXT,
    license TEXT,
    downloads INTEGER,
    likes INTEGER,
    library_name TEXT,
    family TEXT,
    description TEXT,
    strengths TEXT,
    weaknesses TEXT,
    best_for TEXT,
    fetched_at INTEGER,
    source TEXT DEFAULT 'huggingface'
  )
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;fetched_at&lt;/code&gt; timestamp drives the 7-day TTL. After a week, the cache re-fetches from HuggingFace in case the model card has been updated. In practice this almost never matters, but I've had at least one case where a model's pipeline tag changed after a major update and the stale cache was routing it incorrectly.&lt;/p&gt;
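&lt;p&gt;The staleness check itself is one comparison against &lt;code&gt;fetched_at&lt;/code&gt; (a sketch; the seven-day window matches the TTL above):&lt;/p&gt;

```typescript
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// A cached profile is stale once fetched_at (ms since epoch) is over a week old,
// at which point the server re-fetches the HuggingFace model card.
function isStale(fetchedAt: number, now: number = Date.now()): boolean {
  return now - fetchedAt > SEVEN_DAYS_MS;
}
```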

&lt;h2&gt;
  
  
  Per-model prompt hints
&lt;/h2&gt;

&lt;p&gt;This is the bit that makes the difference to output quality.&lt;/p&gt;

&lt;p&gt;Each model family has its own set of annoying quirks. GLM-4 writes a paragraph of introduction before every response unless you tell it not to. Even if you do, there's a chance it'll ignore you - and Llama is incredibly hard to coax into emitting only what you asked for. Qwen Coder is best at temperature 0.1 for code, but that's far too low for chat tasks. Nemotron handles structured output well but needs explicit "no preamble" instructions.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;PromptHints&lt;/code&gt; interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;PromptHints&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;codeTemp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;        &lt;span class="c1"&gt;// temperature for code generation&lt;/span&gt;
  &lt;span class="nl"&gt;chatTemp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;        &lt;span class="c1"&gt;// temperature for chat/analysis&lt;/span&gt;
  &lt;span class="nl"&gt;outputConstraint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// injected into system prompt&lt;/span&gt;
  &lt;span class="nl"&gt;emitsThinkBlocks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// flag for think-block stripping&lt;/span&gt;
  &lt;span class="nl"&gt;bestTaskTypes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;code&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;analysis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;embedding&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the per-model configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// GLM-4: great general model, but verbose without constraints&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/glm&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;- &lt;/span&gt;&lt;span class="se"&gt;]?&lt;/span&gt;&lt;span class="sr"&gt;4/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;hints&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;codeTemp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;chatTemp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;outputConstraint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Respond with ONLY the requested output. No step-by-step reasoning. No numbered analysis. No preamble. Go straight to the answer.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;emitsThinkBlocks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;bestTaskTypes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;analysis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt;

&lt;span class="c1"&gt;// Qwen Coder: focused, needs minimal constraint&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/qwen3.*coder|qwen.*coder/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;hints&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;codeTemp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;chatTemp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;outputConstraint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Be direct. Output only what was asked for.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;emitsThinkBlocks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;bestTaskTypes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;code&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;outputConstraint&lt;/code&gt; string gets injected into the system prompt before every delegation call. Without it, GLM-4 would generate something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Let me analyze this code step by step.

Step 1: First, I'll examine the function signature...
Step 2: Next, I'll consider the edge cases...
Step 3: Now I'll write the tests...

Here are the tests:
// actual tests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the constraint, you get just the tests. That's 200+ tokens of preamble saved on every single call. Over a day of heavy delegation, that adds up.&lt;/p&gt;
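&lt;p&gt;The lookup-and-inject step can be sketched like this. This is illustrative TypeScript only - &lt;code&gt;resolveHints&lt;/code&gt; and &lt;code&gt;buildSystemPrompt&lt;/code&gt; are hypothetical names, not the package's real API - but it shows the shape of matching a model ID against the profile patterns and prepending the constraint:&lt;/p&gt;

```typescript
// Hypothetical sketch of the pattern-to-prompt lookup; the real
// implementation in @houtini/lm may differ in naming and detail.
interface ModelHints {
  codeTemp: number;
  chatTemp: number;
  outputConstraint: string;
  emitsThinkBlocks: boolean;
}

const PROFILES: { pattern: RegExp; hints: ModelHints }[] = [
  {
    pattern: /glm[- ]?4/i,
    hints: {
      codeTemp: 0.1,
      chatTemp: 0.3,
      outputConstraint: "Respond with ONLY the requested output.",
      emitsThinkBlocks: true,
    },
  },
];

function resolveHints(modelId: string): ModelHints | undefined {
  const match = PROFILES.find((p) => p.pattern.test(modelId));
  return match ? match.hints : undefined;
}

function buildSystemPrompt(basePrompt: string, modelId: string): string {
  const hints = resolveHints(modelId);
  // Inject the constraint ahead of the task prompt on every delegation call
  return hints ? hints.outputConstraint + "\n\n" + basePrompt : basePrompt;
}
```

&lt;p&gt;Unknown model IDs simply fall through with the base prompt untouched, so an unprofiled model still works - it just doesn't get the constraint.&lt;/p&gt;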

&lt;p&gt;I tested this properly - ran the same twenty delegation calls with generic settings versus model-specific hints. The tuned version produced usable output on eighteen of them first try (although hallucination remained a massive problem with GLM 4.7); the generic setup managed twelve. Eight out of twenty calls needing a retry doesn't sound terrible, but each retry is another round-trip to the model and another chunk of context in the Claude conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance measurement: TTFT and tok/s
&lt;/h2&gt;

&lt;p&gt;Every response now includes timing data in the footer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Model: qwen/qwen3-coder-next | 145→248 tokens (38 tok/s, 340ms TTFT) | Session: 12,450 offloaded across 23 calls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TTFT (time to first token) and tokens per second are measured from the SSE stream. A slower model was running at 12 tok/s on certain prompts - painfully slow for something that normally hits 100-150. The TTFT metric made the problem obvious: the model was spending 8 seconds thinking before it started generating, which meant it was doing extended reasoning (the think blocks again) on prompts that shouldn't have needed it.&lt;/p&gt;

&lt;p&gt;Turned out it was temperature. At 0.3, certain prompts triggered the model's reasoning mode, so dropping to 0.1 for code tasks fixed it. Without the TTFT data, I'd probably still be wondering why some calls take 15 seconds and others take 2.&lt;/p&gt;
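&lt;p&gt;The two metrics fall out of three timestamps taken while consuming the stream. A minimal sketch (hypothetical helper names, not the package's actual code):&lt;/p&gt;

```typescript
// Timestamps recorded while consuming the SSE stream (all ms since epoch)
interface StreamTiming {
  requestSentAt: number;    // when the request went out
  firstTokenAt: number;     // when the first SSE chunk arrived
  lastTokenAt: number;      // when the stream closed
  completionTokens: number; // tokens generated by the model
}

// Time to first token: how long the model "thought" before emitting anything
function ttftMs(t: StreamTiming): number {
  return t.firstTokenAt - t.requestSentAt;
}

// Throughput over the generation window only, excluding the TTFT wait
function tokensPerSecond(t: StreamTiming): number {
  const generationMs = t.lastTokenAt - t.firstTokenAt;
  if (generationMs > 0) {
    return Math.round((t.completionTokens / generationMs) * 1000);
  }
  return 0; // guard for single-chunk responses
}
```

&lt;p&gt;Separating the two is the whole point: a long TTFT with normal tok/s means hidden reasoning before output, while a low tok/s with normal TTFT points at generation speed itself.&lt;/p&gt;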

&lt;p&gt;The session totals (tokens offloaded, call count) serve a different purpose. Watching the counter climb to 40,000 offloaded tokens in a heavy coding day made the savings seem tangible rather than theoretical.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stack
&lt;/h2&gt;

&lt;p&gt;TypeScript, distributed via npm. Uses &lt;code&gt;sql.js&lt;/code&gt; (WASM) for the model cache, SSE streaming for the 55-second soft timeout, and the MCP SDK for the protocol layer. Apache-2.0 licence.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;npm:&lt;/strong&gt; &lt;a href="https://www.npmjs.com/package/@houtini/lm" rel="noopener noreferrer"&gt;@houtini/lm&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/houtini-ai/lm" rel="noopener noreferrer"&gt;houtini-ai/lm&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building MCP servers and running into similar problems with model determinism, I'd genuinely like to hear what patterns you've landed on. The routing and prompt hints system works for my setup but I'm under no illusion it's the only approach. PRs welcome - especially model profiles for families I haven't tested.&lt;/p&gt;

&lt;p&gt;PS: in case I didn't mention it, you can use this to add any external cloud model with an OpenAI-compatible API endpoint. Comments welcome - it'd be nice to see this through to being fully useful and a genuine sidekick for Claude.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>discuss</category>
      <category>llm</category>
    </item>
    <item>
      <title>SQLite as an MCP context saver: stop cramming raw API data into your LLM</title>
      <dc:creator>Richard Baxter</dc:creator>
      <pubDate>Wed, 04 Mar 2026 14:40:33 +0000</pubDate>
      <link>https://forem.com/richardbaxter/sqlite-as-an-mcp-context-saver-stop-cramming-raw-api-data-into-your-llm-2oj4</link>
      <guid>https://forem.com/richardbaxter/sqlite-as-an-mcp-context-saver-stop-cramming-raw-api-data-into-your-llm-2oj4</guid>
      <description>&lt;p&gt;&lt;strong&gt;Most MCP servers dump raw API responses into the conversation. I've been using SQLite as a dependency in my MCP to sync data from my Google Search Console account locally and query it with SQL - here's the pattern and a working implementation for Google Search Console.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I'm fascinated by the utility and insight an LLM can provide when the data is presented correctly. You're not going to get great results from consumer-grade AI with 100,000 lines of JSON. A SQL query against a populated local database, however, is a different matter. Today's post covers why and how I use SQLite to give Claude a fighting chance of doing analysis, accurately, without the context bloat or hallucination.&lt;/p&gt;

&lt;p&gt;I've been shipping &lt;a href="https://github.com/houtini-ai" rel="noopener noreferrer"&gt;MCP servers since late 2025&lt;/a&gt;. Every time I hook one up to an API with any volume of data, the same problem shows up unless I manage paging and chunking. You call the API, and if you're not limiting the request you get &lt;em&gt;all the rows&lt;/em&gt; back. Claude, or your LLM of choice, dutifully analyses the data as if it's the whole story. But a decent site generates tens of thousands of query/page combinations per month - so that 1,000-row response is actually a small sample, and even that might be too much to deal with efficiently.&lt;/p&gt;

&lt;p&gt;MCP gives AI models access to APIs - we know this. But most APIs spit back volumes of data that don't fit in a conversation, or much of the data is irrelevant to answering the question. I found that after a couple of GSC API calls, about 80% of the token use was wasted on raw data, leaving maybe 20% for the model to actually think with. The analysis that comes back is built on incomplete data, but it reads with full confidence - and that's a tricky combination to work with.&lt;/p&gt;

&lt;h2&gt;
  
  
  My approach
&lt;/h2&gt;

&lt;p&gt;The fix is less about prompting and more about architecture.&lt;/p&gt;

&lt;p&gt;Two phases:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fntnb4xu86akw6ttc917u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fntnb4xu86akw6ttc917u.jpg" alt="Comparison of the usual API-to-LLM approach versus the SQLite pattern" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Sync&lt;/strong&gt; - pull your complete dataset from the API into a local SQLite database. No pagination caps; rate limits, retries, and deduplication are handled in the background. This becomes your local data warehouse. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query&lt;/strong&gt; - your MCP tools translate prompts into SQL queries defined in the tool description. The MCP returns aggregated results, not raw dumps.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So in my example, when you ask "which pages are losing traffic?", Claude runs a targeted query that returns maybe 20 rows instead of trying to work through 50,000 raw data points crammed into the conversation. It's like handing someone the analyst's summary rather than the raw spreadsheet.&lt;/p&gt;

&lt;p&gt;You get accuracy because your SQL hits the complete dataset, not a 1,000-row sample. And you get efficiency because you're sending 20 aggregated rows into the context window instead of 50,000 raw ones.&lt;/p&gt;

&lt;p&gt;I reckon this approach has legs beyond just Search Console. Sync locally, query with SQL, return aggregated results. It addresses the context window problem at an architectural level rather than hoping your API response fits.&lt;/p&gt;

&lt;p&gt;The thing that clicked for me building these is that your tools don't have to be thin API wrappers. You can build something that syncs an entire dataset in the background, builds indexes, prunes stale data, and then exposes simple query endpoints. Your model couldn't care less about pagination or rate limits or auth tokens - it asks a question and gets something useful back. Once you reach that point, it's not a party trick - it's just infrastructure.&lt;/p&gt;

&lt;p&gt;MCP is a lot more powerful than some of us might realise! &lt;/p&gt;

&lt;h2&gt;
  
  
  Under the hood: Better Search Console
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/houtini-ai/better-search-console" rel="noopener noreferrer"&gt;Better Search Console&lt;/a&gt; as a working implementation of this pattern, specifically for Google Search Console data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8uii1h5hmhuyylrsy9cc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8uii1h5hmhuyylrsy9cc.jpg" alt="Architecture diagram showing how Better Search Console syncs GSC data through SQLite" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The sync engine
&lt;/h3&gt;

&lt;p&gt;The Google Search Console API caps you at 1,000 rows per request and 25,000 rows per date range. For anything beyond a hobby blog, that's nowhere near enough. So the sync engine breaks your date range into 90-day chunks and fetches them in parallel - three concurrent streams, each paginating through its chunk until the API returns fewer rows than requested.&lt;/p&gt;

&lt;p&gt;As an aside, you can trigger this as a CLI command if it's a particularly massive job. &lt;/p&gt;

&lt;p&gt;Writes are serialised though. SQLite doesn't love concurrent writes, so all three streams funnel into a single write queue. Every row hits the database via &lt;code&gt;INSERT OR REPLACE&lt;/code&gt; keyed on date + query + page + device + country.&lt;/p&gt;
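&lt;p&gt;The write queue can be sketched as a simple promise chain - illustrative TypeScript only, with &lt;code&gt;writeBatch&lt;/code&gt; standing in for the real &lt;code&gt;INSERT OR REPLACE&lt;/code&gt; step:&lt;/p&gt;

```typescript
const written: string[] = []; // stand-in for the SQLite table

// The real version does INSERT OR REPLACE keyed on
// date + query + page + device + country; here we just append.
async function writeBatch(rows: string[]) {
  for (const row of rows) {
    written.push(row);
  }
}

let tail = Promise.resolve(); // the single write queue

function enqueueWrite(rows: string[]) {
  // Chain every batch onto the previous one, so three parallel fetch
  // streams still produce exactly one writer and writes never interleave
  tail = tail.then(() => writeBatch(rows));
  return tail;
}
```

&lt;p&gt;Each fetch stream calls &lt;code&gt;enqueueWrite&lt;/code&gt; as batches arrive; SQLite only ever sees one write in flight at a time.&lt;/p&gt;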

&lt;p&gt;I threw 800K rows at it (six months from a client's property) and the initial sync took about four minutes. After that, syncs are incremental - it picks up from where it left off.&lt;/p&gt;

&lt;h3&gt;
  
  
  SQLite config
&lt;/h3&gt;

&lt;p&gt;Default SQLite settings are fairly conservative, so for this workload I've adjusted a few things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;WAL mode&lt;/strong&gt; - write-ahead logging so reads don't block writes during sync&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;64MB cache&lt;/strong&gt; - keeps hot pages in memory instead of hitting disk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4GB memory-mapped I/O&lt;/strong&gt; - lets SQLite memory-map the database file directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10 covering indexes&lt;/strong&gt; - one for each common query pattern (query lookups, page lookups, date ranges, country breakdowns, the lot)&lt;/li&gt;
&lt;/ul&gt;
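&lt;p&gt;Those settings map onto standard SQLite pragmas - roughly like this (a sketch; the exact values the server sets may differ slightly):&lt;/p&gt;

```sql
PRAGMA journal_mode = WAL;     -- reads don't block writes during sync
PRAGMA cache_size = -64000;    -- negative value is KiB, so roughly 64MB
PRAGMA mmap_size = 4294967296; -- 4GB memory-mapped I/O
```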

&lt;p&gt;The result is that queries against 800K rows come back in under a second. SQLite with proper indexes is surprisingly fast for this kind of analytical work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data retention
&lt;/h3&gt;

&lt;p&gt;Left unchecked, six months of GSC data for a busy site can balloon to millions of rows. The server auto-prunes after every sync: keeps the last 90 days in full, then drops zero-click rows from older data. The thinking is - if a query/page combination got zero clicks three months ago, it's probably noise, but anything with actual clicks stays around indefinitely.&lt;/p&gt;
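&lt;p&gt;An illustrative version of that prune (the server's actual statement may differ):&lt;/p&gt;

```sql
-- Keep the last 90 days in full, then drop zero-click rows from older data
DELETE FROM search_analytics
WHERE clicks = 0
  AND julianday('now') - julianday(date) > 90;
```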

&lt;h3&gt;
  
  
  Query sandboxing
&lt;/h3&gt;

&lt;p&gt;I spent longer on this than anything else. The &lt;code&gt;query_gsc_data&lt;/code&gt; tool lets Claude write freeform SQL against your data, which is powerful - but you obviously don't want it running &lt;code&gt;DROP TABLE&lt;/code&gt; or &lt;code&gt;DELETE FROM&lt;/code&gt;. So every query gets regex-checked before execution, and anything that matches INSERT, UPDATE, DELETE, DROP, ALTER, CREATE, or ATTACH gets blocked. There's also a 10,000-row automatic LIMIT on queries that don't specify one, so a careless &lt;code&gt;SELECT *&lt;/code&gt; can't eat your entire context window.&lt;/p&gt;
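&lt;p&gt;A minimal sketch of that check, assuming the same blocklist described above (the real &lt;code&gt;query_gsc_data&lt;/code&gt; implementation may differ in detail):&lt;/p&gt;

```typescript
// Regex-based sandbox: block write/DDL statements, cap unbounded reads
const FORBIDDEN = /\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|ATTACH)\b/i;
const HAS_LIMIT = /\bLIMIT\s+\d+/i;

function sandbox(sql: string): string {
  if (FORBIDDEN.test(sql)) {
    throw new Error("Write operations are blocked");
  }
  // Append a defensive cap when the query forgot its own LIMIT,
  // so a careless SELECT * can't flood the context window
  return HAS_LIMIT.test(sql) ? sql : sql + " LIMIT 10000";
}
```

&lt;p&gt;Regex filtering isn't bulletproof as a general SQL firewall, but combined with a read-only use case and the automatic LIMIT it covers the realistic failure modes here.&lt;/p&gt;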

&lt;h3&gt;
  
  
  The tools
&lt;/h3&gt;

&lt;p&gt;Twelve tools in total. Four get about 90% of my use. The first two will trigger an MCP app to render charts and the like with Vite (really cool if you're using Claude Desktop). &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;get_overview&lt;/code&gt;&lt;/strong&gt; - shows all your properties at a glance with clicks, impressions, CTR, average position, and sparkline trends.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;get_dashboard&lt;/code&gt;&lt;/strong&gt; - deep dive on one property. Hero metrics, comparison periods, top queries, country breakdown, ranking distribution, new/lost queries, branded vs non-branded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;get_insights&lt;/code&gt;&lt;/strong&gt; - sixteen pre-built analytical queries. The one I keep reaching for is &lt;code&gt;opportunities&lt;/code&gt;: it surfaces queries where you're ranking 5-20 with decent impression counts. You're close enough to page one that a title tweak or a content refresh could push you over. These tend to be the lowest-effort wins in SEO.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;query_gsc_data&lt;/code&gt;&lt;/strong&gt; - freeform SQL against the &lt;code&gt;search_analytics&lt;/code&gt; table with the sandboxing I described above.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one kicked the whole project off for me - being able to ask anything of the data and get a real answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQL queries I've been running for SEO analysis
&lt;/h2&gt;

&lt;p&gt;The schema is nice and simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;search_analytics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;clicks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;impressions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;position&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I ran every one of these on a client's ecommerce site last Tuesday - 340K rows in the database - so they're well tested.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Find long-tail referring keywords (5+ words):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is probably my favourite query in the set. Longer phrases tend to be conversational - they read like something you'd type into ChatGPT or ask Google's AI Overview. Finding which of these already send you traffic tells you exactly what questions your content is answering, and more importantly, which questions it isn't.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clicks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;clicks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;impressions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;impressions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;position&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;avg_position&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;search_analytics&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'-90 days'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;LENGTH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="k"&gt;LENGTH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;impressions&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;LENGTH&lt;/code&gt; trick counts spaces to filter for five-or-more-word phrases. Not pretty SQL, but it works in SQLite, which doesn't have a native &lt;code&gt;WORD_COUNT&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spot keyword cannibalisation&lt;/strong&gt; (multiple pages ranking for the same query):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clicks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;clicks&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;search_analytics&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2025-01-01'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="n"&gt;pages&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;clicks&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I find cannibalisation on almost every client audit I do. Two pages scrapping over the same query, neither ranking properly, and nobody on the team has noticed. This surfaces the worst offenders so you can decide whether to consolidate or differentiate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content decay detection&lt;/strong&gt; (pages losing traffic month-over-month):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'-28 days'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="n"&gt;clicks&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;recent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'-56 days'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'-29 days'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="n"&gt;clicks&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;prior&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;search_analytics&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;prior&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;prior&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Questions people are asking about your content:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clicks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;clicks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;impressions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;impressions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;position&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;avg_position&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;search_analytics&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'-90 days'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'how %'&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'what %'&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'why %'&lt;/span&gt;
       &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'can %'&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'does %'&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'is %'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;impressions&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I've used this one on sites when building FAQ sections. If you're getting impressions for "how to configure X" but no clicks, your existing content probably doesn't answer that question clearly enough - or at all. Once you can see them, they're straightforward fixes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Country breakdown for international targeting:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clicks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;clicks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;impressions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;impressions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;position&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;avg_position&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;search_analytics&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'-90 days'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;clicks&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The nice thing about all of these is that you're querying the complete dataset. Everything sits in SQLite and every query hits every row - no 1,000-row cap, no pagination limits.&lt;/p&gt;
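&lt;p&gt;That full-dataset property is easy to demonstrate with nothing but Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module - here with an in-memory stand-in for the synced database (the table name, columns, and rows are illustrative assumptions):&lt;/p&gt;

```python
import sqlite3

# Stand-in for the locally synced database; the table and column names
# mirror the SQL examples above but are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_analytics (country TEXT, clicks INT, impressions INT)")
conn.executemany(
    "INSERT INTO search_analytics VALUES (?, ?, ?)",
    [("gbr", 40, 800), ("usa", 90, 2600), ("usa", 10, 400)],
)

# The whole dataset lives in one SQLite file, so aggregates cover every
# row - there is no API-side row cap or pagination to work around.
rows = conn.execute(
    """
    SELECT country, SUM(clicks), SUM(impressions)
    FROM search_analytics
    GROUP BY country
    ORDER BY SUM(clicks) DESC
    """
).fetchall()
print(rows)  # [('usa', 100, 3000), ('gbr', 40, 800)]
```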

&lt;h2&gt;
  
  
  Interactive dashboards with ext-apps
&lt;/h2&gt;

&lt;p&gt;Something else I got to explore on this project: MCP's &lt;a href="https://modelcontextprotocol.io/docs/extensions/apps" rel="noopener noreferrer"&gt;ext-apps framework&lt;/a&gt;, which lets you render interactive HTML directly inside Claude Desktop. Actual clickable dashboards with real data, living right there in the conversation as embedded iframes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17z4ooc9lpr58pwevsrd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17z4ooc9lpr58pwevsrd.jpg" alt=" " width="800" height="881"&gt;&lt;/a&gt;&lt;br&gt;
The overview, the property drilldowns, the sparkline trends - they're all interactive. It feels like the beginning of something interesting to me: MCP-native SaaS, where the backend is an MCP server, the frontend is ext-apps, and distribution is an &lt;code&gt;npx&lt;/code&gt; command. No hosting to maintain, no auth flows to build, no deployment pipeline to manage. And your data stays on your machine, which for SEO data is a nice bonus.&lt;/p&gt;
&lt;h2&gt;
  
  
  Setting it up
&lt;/h2&gt;

&lt;p&gt;Three steps. The Google credentials bit takes the longest but you only do it once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Google service account&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Head to &lt;a href="https://console.cloud.google.com/" rel="noopener noreferrer"&gt;Google Cloud Console&lt;/a&gt;, create a new project and enable the Search Console API&lt;/li&gt;
&lt;li&gt;Credentials → Create service account → Keys tab → Add Key → JSON&lt;/li&gt;
&lt;li&gt;Grab the &lt;code&gt;client_email&lt;/code&gt; value from that JSON file&lt;/li&gt;
&lt;li&gt;In &lt;a href="https://search.google.com/search-console/" rel="noopener noreferrer"&gt;Search Console&lt;/a&gt; → Settings → Users and permissions → Add the email with &lt;strong&gt;Full&lt;/strong&gt; permission&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last step is the one everyone misses (I did too, first time). The service account is basically a separate Google identity - it can't see your properties until you explicitly share them. Skip it and you'll get "No properties found" errors and be left wondering what went wrong.&lt;/p&gt;
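&lt;p&gt;If you're unsure which email to share, it's the &lt;code&gt;client_email&lt;/code&gt; field in the downloaded key file. A quick sketch of pulling it out with stdlib Python - the key contents here are placeholders, and a real file also carries &lt;code&gt;private_key&lt;/code&gt; and token fields:&lt;/p&gt;

```python
import json
import tempfile

# Minimal stand-in for a downloaded service-account key file.
# The values here are placeholders, not real credentials.
key = {
    "type": "service_account",
    "client_email": "gsc-reader@my-project.iam.gserviceaccount.com",
}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(key, f)
    path = f.name

# The one field you need for the last step: the identity to add under
# Search Console, Settings, Users and permissions.
with open(path) as f:
    email = json.load(f)["client_email"]
print(email)
```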

&lt;p&gt;&lt;strong&gt;2. Claude Desktop config&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Add this to your &lt;code&gt;claude_desktop_config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"better-search-console"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@houtini/better-search-console"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"GOOGLE_APPLICATION_CREDENTIALS"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/your-service-account.json"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Restart and sync&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Restart Claude Desktop and ask it to set up Better Search Console. The &lt;code&gt;setup&lt;/code&gt; tool discovers your properties, syncs them in the background, and shows an overview. Initial sync depends on data volume - 30 seconds for a small site, a few minutes for one with millions of rows. After that, syncs are incremental.&lt;/p&gt;
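&lt;p&gt;"Incremental" here just means: work out the newest date already stored, and only request dates after it. A sketch of that logic - the table name and schema are assumed from the queries above, and the API fetch itself is omitted:&lt;/p&gt;

```python
import sqlite3
from datetime import date, timedelta

# In-memory stand-in for the synced database (schema assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_analytics (date TEXT, clicks INT)")
conn.executemany(
    "INSERT INTO search_analytics VALUES (?, ?)",
    [("2026-04-01", 5), ("2026-04-02", 8)],
)

def next_sync_start(conn):
    """First date an incremental sync needs to request from the API."""
    (latest,) = conn.execute("SELECT MAX(date) FROM search_analytics").fetchone()
    if latest is None:
        return None  # empty table: a full backfill is needed instead
    y, m, d = map(int, latest.split("-"))
    return (date(y, m, d) + timedelta(days=1)).isoformat()

print(next_sync_start(conn))  # 2026-04-03
```

&lt;p&gt;ISO dates sort lexically, so &lt;code&gt;MAX(date)&lt;/code&gt; on a TEXT column does the right thing without any date parsing in SQL.&lt;/p&gt;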

&lt;h2&gt;
  
  
  The stack
&lt;/h2&gt;

&lt;p&gt;Built in TypeScript, distributed via npm. Uses &lt;code&gt;better-sqlite3&lt;/code&gt; for storage, full pagination against the Google Search Console API, and MCP ext-apps for the interactive dashboard bits. Apache-2.0 licence - fork it, break it, whatever works for you.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;npm:&lt;/strong&gt; &lt;a href="https://www.npmjs.com/package/@houtini/better-search-console" rel="noopener noreferrer"&gt;@houtini/better-search-console&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/houtini-ai/better-search-console" rel="noopener noreferrer"&gt;houtini-ai/better-search-console&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I reckon the sync-locally-query-with-SQL approach has legs well beyond Search Console. Better answers from less context - that's genuinely all there is to it.&lt;/p&gt;

&lt;p&gt;Pull requests welcome - I'm especially interested in SQL queries you've found useful for SEO analysis, and I'll add the good ones to the pre-built insights.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
