<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Wojciech Wentland</title>
    <description>The latest articles on Forem by Wojciech Wentland (@desty2k).</description>
    <link>https://forem.com/desty2k</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3866520%2F3ad33efb-4ece-434a-97e5-e9cecc7cc9fd.png</url>
      <title>Forem: Wojciech Wentland</title>
      <link>https://forem.com/desty2k</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/desty2k"/>
    <language>en</language>
    <item>
      <title>Why I only build read-only MCP servers</title>
      <dc:creator>Wojciech Wentland</dc:creator>
      <pubDate>Thu, 09 Apr 2026 13:43:42 +0000</pubDate>
      <link>https://forem.com/desty2k/why-i-only-build-read-only-mcp-servers-32kl</link>
      <guid>https://forem.com/desty2k/why-i-only-build-read-only-mcp-servers-32kl</guid>
      <description>&lt;p&gt;Every MCP server I build is read-only. List, search, get, read. No create, update, delete, activate, purge.&lt;/p&gt;

&lt;p&gt;I've been running Claude Code with &lt;a href="https://code.claude.com/docs/en/permission-modes" rel="noopener noreferrer"&gt;&lt;code&gt;--dangerously-skip-permissions&lt;/code&gt;&lt;/a&gt; in environments where the agent has no write-capable MCP tools and no direct path to mutate production systems. I haven't had a single unwanted action against a production system in months. Not because I trust the model to never hallucinate. Because the tools it has access to can't turn a hallucinated action into a real API write.&lt;/p&gt;

&lt;p&gt;Read-only doesn't make an agent safe. It removes an entire class of failures.&lt;/p&gt;

&lt;h2&gt;
  
  
  The failure mode isn't hypothetical
&lt;/h2&gt;

&lt;p&gt;There's a &lt;a href="https://www.reddit.com/r/ClaudeCode/comments/1sex28q/opus_46_destroys_a_users_session_costing_them/" rel="noopener noreferrer"&gt;post on r/ClaudeCode&lt;/a&gt; where Claude suggested tearing down a GPU instance, then executed it. The user never confirmed. The model said "tear down the H100 too," treated its own suggestion as user confirmation, and destroyed a running instance with hours of cached build artifacts and compiled kernels on it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbgm60xs9ixdsdo019m8.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbgm60xs9ixdsdo019m8.webp" alt="Claude hallucinated user confirmation and destroyed a running GPU instance. Source: r/ClaudeCode" width="800" height="739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The model later admitted: "I hallucinated you saying that. You never said those words. I said it, then executed it as if you'd agreed."&lt;/p&gt;

&lt;p&gt;If that agent had read-only tools, it would have read the instance list, maybe suggested tearing something down, and then... nothing. The suggestion dies as text. No one loses a machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I actually use agents
&lt;/h2&gt;

&lt;p&gt;My workflow with Claude Code looks like this: I ask it to investigate something. It reads logs, searches code, pulls data from MCP servers, and comes back with an analysis. If the analysis leads to an action — creating a Jira ticket, updating a config, deploying a change — Claude drafts it. I review the draft, then I do the action myself.&lt;/p&gt;

&lt;p&gt;The agent reads and analyzes. I act.&lt;/p&gt;

&lt;p&gt;I trust the model's judgment on what to write in a ticket. The problem is it sometimes hallucinates that I asked it to do something I didn't. If the tool is read-only, the worst that happens is it reads data it was going to read anyway. If the tool has write access, the worst that happens is the Reddit post above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approval fatigue is the real problem
&lt;/h2&gt;

&lt;p&gt;"But there's a confirmation prompt before destructive actions." Sure. Claude Code asks before running commands. The problem is approval fatigue. After confirming 50 read operations, you stop reading the prompts. You click yes. And then the 51st one is &lt;code&gt;vastai destroy instance 34122719&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Anthropic wrote about this in their &lt;a href="https://www.anthropic.com/engineering/claude-code-sandboxing" rel="noopener noreferrer"&gt;sandboxing post&lt;/a&gt;. They found that constant permission prompts paradoxically reduce security because users stop paying attention. Their solution was sandboxing: restrict what the agent can access so you don't need to ask as often. They report that, in their internal usage, sandboxing safely reduced permission prompts by 84%.&lt;/p&gt;

&lt;p&gt;Read-only MCP servers follow the same logic. If the server can't write, you don't need to confirm writes. The agent operates freely within the read boundary. No fatigue, no missed confirmation on a destructive action.&lt;/p&gt;

&lt;p&gt;That's why I run &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt;. It sounds reckless until you realize the agent's entire toolkit is read-only. There's nothing dangerous to skip permission for.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this doesn't cover
&lt;/h2&gt;

&lt;p&gt;Read-only MCP servers are one boundary, not a complete agent security model. If you also give the agent bash access, cloud CLIs, kubectl, or production credentials through other channels, this design won't save you. Claude Code with &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt; can still run shell commands, edit files, and interact with whatever's reachable from the host. Anthropic's own documentation &lt;a href="https://code.claude.com/docs/en/permission-modes" rel="noopener noreferrer"&gt;recommends&lt;/a&gt; using isolated environments when running in bypass mode, and their sandboxing approach combines filesystem isolation, network restrictions, and permission controls — not just tool-level restrictions.&lt;/p&gt;

&lt;p&gt;This article is about the MCP boundary specifically. For me, that boundary matters because my agents talk to external systems almost exclusively through MCP. But it's one layer, not the whole stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond the IDE
&lt;/h2&gt;

&lt;p&gt;There's another reason I care about read-only MCP servers: they're portable. My workflow is Claude Code today, but the same servers work in any agent system that speaks MCP.&lt;/p&gt;

&lt;p&gt;In a headless agent system — one where there's no human in the loop and no bash shell — the MCP boundary isn't just one layer. It's the only interface the agent has to external systems. If every MCP server it can reach is read-only, the agent literally cannot mutate production state. No sandboxing needed, no permission prompts, no approval fatigue. The tools themselves are the guardrail.&lt;/p&gt;

&lt;p&gt;This matters if you're building agent systems for other users. Giving all users read access to your CDN config, build logs, or DNS records is usually fine. Giving all users write access is a different conversation entirely. Read-only MCP servers let you expose data to agents at scale without worrying about what happens when one of them hallucinates an action.&lt;/p&gt;

&lt;h2&gt;
  
  
  What read-only servers are good for
&lt;/h2&gt;

&lt;p&gt;I run MCP servers for CDN management, CI/CD, log aggregation, DNS, and incident management. All read-only. The questions I ask look like: "What's the current CDN config for checkout?" "Which build failed last night?" "Compare caching rules between production and staging." "Draft a Jira ticket for the DNS change we discussed."&lt;/p&gt;

&lt;p&gt;Claude produces the draft text. I copy it into Jira or GitHub myself. Nothing in this workflow needs the agent to write to the target system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The credential argument
&lt;/h2&gt;

&lt;p&gt;Getting a read-only API credential approved is a conversation. "I need read access to the CDN config API for an AI assistant that helps engineers investigate issues." Most teams say yes.&lt;/p&gt;

&lt;p&gt;Getting a write credential is different. "I need an AI agent to be able to modify CDN configurations." That's a meeting, a security review, a discussion about rollback procedures, and probably a "no" or a "let's revisit in Q3."&lt;/p&gt;

&lt;p&gt;Read-only credentials have a smaller blast radius and a simpler approval process. They also happen to cover every use case I actually have.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for MCP servers
&lt;/h2&gt;

&lt;p&gt;Every MCP server I publish follows this: read-only by design. The &lt;a href="https://modelcontextprotocol.io/docs/tutorials/security/security_best_practices" rel="noopener noreferrer"&gt;MCP security best practices&lt;/a&gt; describe scope minimization as a core principle. Start with the minimum privileges, elevate only when required. My servers don't elevate.&lt;/p&gt;
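&lt;p&gt;A minimal sketch of how "read-only by design" could be checked at tool registration time. &lt;code&gt;ReadOnlyRegistry&lt;/code&gt; and &lt;code&gt;READ_ONLY_VERBS&lt;/code&gt; are hypothetical names, not part of any MCP SDK, and the verb list simply mirrors "list, search, get, read". A naming check like this is a convention guard only. The actual boundary is still a credential that cannot write.&lt;/p&gt;

```python
# Hypothetical guard: refuse to register any tool whose name does not
# start with a read verb. This is an illustration, not an MCP SDK API.
READ_ONLY_VERBS = ("list", "search", "get", "read")


class ReadOnlyRegistry:
    """Collects tool functions and rejects names that look like writes."""

    def __init__(self):
        self.tools = {}

    def tool(self, fn):
        # "get_build_log" is split into the verb "get" and the rest.
        verb = fn.__name__.split("_", 1)[0]
        if verb not in READ_ONLY_VERBS:
            raise ValueError(f"{fn.__name__!r} is not a read-only tool name")
        self.tools[fn.__name__] = fn
        return fn


registry = ReadOnlyRegistry()


@registry.tool
def list_builds():
    """Example read tool; a real one would call the upstream API."""
    return []
```

&lt;p&gt;Decorating a function named &lt;code&gt;delete_build&lt;/code&gt; with &lt;code&gt;registry.tool&lt;/code&gt; raises &lt;code&gt;ValueError&lt;/code&gt; at import time, so a write tool fails before the server starts.&lt;/p&gt;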

&lt;p&gt;If someone opens a GitHub issue asking for write tools, the answer is: "This server is intentionally read-only. Fork it if you need write operations." That's not laziness. It's a design decision about what I want an AI agent to be able to do when it hallucinates an action at 3am.&lt;/p&gt;

&lt;p&gt;I'm planning a series of production-ready read-only MCP servers for various platforms. More on that soon.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Your MCP server is not an API adapter</title>
      <dc:creator>Wojciech Wentland</dc:creator>
      <pubDate>Wed, 08 Apr 2026 19:59:13 +0000</pubDate>
      <link>https://forem.com/desty2k/your-mcp-server-is-not-an-api-adapter-23k7</link>
      <guid>https://forem.com/desty2k/your-mcp-server-is-not-an-api-adapter-23k7</guid>
      <description>&lt;p&gt;A lot of &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; servers I see in the wild look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_thing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.example.com/things/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fetch, forward, done. A thin HTTP proxy with a JSON Schema wrapper. For some
use cases, that's enough.&lt;/p&gt;

&lt;p&gt;The servers I keep coming back to do something different. They hold state and
pre-compute answers. An agent hitting a thin wrapper might need three round
trips and 30 seconds. The same agent hitting a server that does real work gets
its answer in one call, with the lookup itself taking under a millisecond.&lt;/p&gt;
&lt;h2&gt;
  
  
  Preloaded in-memory index
&lt;/h2&gt;

&lt;p&gt;Here's a failure mode I run into constantly: the agent needs to find something
but doesn't know the exact ID. Most APIs only support exact lookups. No ID, no
result. The conversation dead-ends with "I couldn't find that resource" and the
user gives up.&lt;/p&gt;

&lt;p&gt;I built a server that wraps a CDN management API. Hundreds of properties, and
the agent regularly needs to find which one handles a given hostname. The API
has a search endpoint, but it's slow, requires exact matches, and sometimes
returns 403 depending on account permissions.&lt;/p&gt;

&lt;p&gt;So the server loads every property into memory at startup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
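&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclass&lt;/span&gt;  &lt;span class="c1"&gt;# needs: from dataclasses import dataclass, field&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PropertyIndex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;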
    &lt;span class="n"&gt;_entries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;PropertyEntry&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;_name_index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;refresh_interval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_build_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_refresh_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_refresh_loop&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_by_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;property_name&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_entries&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;names&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scorer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fuzz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WRatio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score_cutoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_entries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to_dict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The index builds once at startup by fanning out parallel API calls, deduplicates
the results, and refreshes every five minutes in the background. Lookups take under a millisecond.&lt;/p&gt;
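&lt;p&gt;The build-then-refresh pattern can be sketched with plain &lt;code&gt;asyncio&lt;/code&gt;. This is an illustration under assumptions, not the server's actual code: &lt;code&gt;RefreshingIndex&lt;/code&gt; and the &lt;code&gt;fetch_all&lt;/code&gt; coroutine are hypothetical stand-ins for the index class and the fanned-out API calls.&lt;/p&gt;

```python
import asyncio


class RefreshingIndex:
    """Keeps an in-memory snapshot and rebuilds it on a timer."""

    def __init__(self, fetch_all, refresh_interval=300.0):
        # fetch_all is a hypothetical coroutine function returning a list of names.
        self._fetch_all = fetch_all
        self._refresh_interval = refresh_interval
        self._names = []
        self._refresh_task = None

    async def load(self):
        # Build once up front so the first lookup already has data.
        await self._rebuild()
        self._refresh_task = asyncio.create_task(self._refresh_loop())

    async def _rebuild(self):
        fresh = await self._fetch_all()
        # Deduplicate while preserving order, then swap the snapshot in
        # with a single assignment so readers never see a half-built list.
        self._names = list(dict.fromkeys(fresh))

    async def _refresh_loop(self):
        while True:
            await asyncio.sleep(self._refresh_interval)
            try:
                await self._rebuild()
            except Exception:
                # A failed refresh keeps serving the previous snapshot.
                pass

    async def close(self):
        if self._refresh_task is not None:
            self._refresh_task.cancel()
            try:
                await self._refresh_task
            except asyncio.CancelledError:
                pass

    def names(self):
        return list(self._names)
```

&lt;p&gt;Swapping the whole list in one assignment is a deliberate choice in this sketch: lookups that run during a refresh keep reading the old snapshot until the new one is complete.&lt;/p&gt;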

&lt;p&gt;Without this, the agent guesses at exact property names, picks the wrong one,
retries, burns three turns. With the index, someone types "the CDN config for
checkout" and gets the right answer first try. That's the kind of difference
that decides whether people keep using the agent or go back to doing it
manually.&lt;/p&gt;

&lt;p&gt;I did the same thing for a CI/CD server. The API lets you fetch a build config
by ID, but there's no fuzzy search. If you don't know the ID, you're stuck. The
server caches all build configurations at startup and runs fuzzy matching against
them. The agent says "find the deploy job for the payments service" and gets a
ranked list instantly, even though the CI system itself can't do that.&lt;/p&gt;
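&lt;p&gt;For a dependency-free illustration of ranked lookups over a cached list, the standard library's &lt;code&gt;difflib&lt;/code&gt; can stand in for the fuzzy scorer. This swaps in a different matching algorithm than the &lt;code&gt;WRatio&lt;/code&gt; scorer shown earlier, so scores and ordering will differ. &lt;code&gt;find_build_configs&lt;/code&gt; and the config names in the usage note are made up for the example.&lt;/p&gt;

```python
import difflib


def find_build_configs(query, config_names, limit=5, cutoff=0.5):
    """Return cached config names ranked by similarity to a loose query."""
    # Match case-insensitively, but return the original spelling.
    by_lower = {name.lower(): name for name in config_names}
    hits = difflib.get_close_matches(
        query.lower(), list(by_lower), n=limit, cutoff=cutoff
    )
    return [by_lower[hit] for hit in hits]
```

&lt;p&gt;With cached names such as &lt;code&gt;Payments-Deploy-Prod&lt;/code&gt; and &lt;code&gt;search-indexer-build&lt;/code&gt;, a query like "payments deploy" ranks the payments config first, and a query that resembles nothing returns an empty list instead of a wrong guess.&lt;/p&gt;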
&lt;h2&gt;
  
  
  Embedded analytical database
&lt;/h2&gt;

&lt;p&gt;I have another server that sits in front of a relational database. Some tables
have 20 million rows. The agent needs to answer analytical questions, things
like "which providers have the highest volume in this region?" or "show me the
top performers for a given category."&lt;/p&gt;

&lt;p&gt;The database wasn't designed for these queries. It was built for a web UI with
narrow, well-indexed lookups. The agent's access patterns are different: it asks
broad analytical questions that require joins across tables the application
never joins. Adding indexes wasn't an option either, because the database is
owned by another team and optimizing it for an AI agent's query patterns wasn't
on anyone's roadmap. Some of these queries took 10 to 30 seconds on a read
replica, and in an agent loop where that latency gets multiplied by however many
tool calls the agent needs, the conversation times out before it gets anywhere.&lt;/p&gt;

&lt;p&gt;The server embeds &lt;a href="https://duckdb.org" rel="noopener noreferrer"&gt;DuckDB&lt;/a&gt; in-process and loads pre-aggregated views and lookup
tables at startup. Some are straight copies of small reference tables. Others
are materialized summaries that flatten joins the source database was never
designed to run efficiently, the kind of cross-table aggregations that make
sense for an analytical question but would be expensive on a schema built for
transactional web UI lookups:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DuckDBCache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;duckdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:memory:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fast_configs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_load_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_ready&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_deferred_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_load_deferred&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deferred_configs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_refresh_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_refresh_loop&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each table has a fingerprint query (a cheap &lt;code&gt;COUNT(*)&lt;/code&gt; or checksum) that the
refresh loop checks before doing a full reload. Large tables load in the
background after the server is already taking requests. If something asks for a
table that hasn't loaded yet, it falls back to the source database.&lt;/p&gt;
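&lt;p&gt;The fingerprint check can be sketched as follows. To keep the example runnable without extra packages it uses the standard library's &lt;code&gt;sqlite3&lt;/code&gt; as a stand-in for the source database, and &lt;code&gt;CachedTable&lt;/code&gt; is a hypothetical name. The real server's loading code is not shown in this post.&lt;/p&gt;

```python
import sqlite3


class CachedTable:
    """Reloads a table copy only when a cheap fingerprint query changes."""

    def __init__(self, source, table, fingerprint_sql):
        self._source = source
        # The table name is a trusted, hard-coded value, never user input.
        self._table = table
        self._fingerprint_sql = fingerprint_sql
        self._fingerprint = None
        self.rows = []
        self.reloads = 0

    def refresh(self):
        """Return True if a full reload happened, False if it was skipped."""
        current = self._source.execute(self._fingerprint_sql).fetchone()
        if current == self._fingerprint:
            return False  # fingerprint unchanged, skip the expensive reload
        self.rows = self._source.execute(
            f"SELECT * FROM {self._table}"
        ).fetchall()
        self._fingerprint = current
        self.reloads += 1
        return True


# Usage with an in-memory stand-in for the source database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE providers (id INTEGER, region TEXT)")
source.execute("INSERT INTO providers VALUES (1, 'eu')")
providers = CachedTable(source, "providers", "SELECT COUNT(*) FROM providers")
providers.refresh()
```

&lt;p&gt;A row count only notices inserts and deletes. Rows updated in place leave &lt;code&gt;COUNT(*)&lt;/code&gt; unchanged, which is one reason a checksum can be the better fingerprint for tables that are edited rather than appended to.&lt;/p&gt;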

&lt;p&gt;The 30-second query now takes under a millisecond. The agent can actually have a
back-and-forth with the user instead of timing out after the first question.&lt;/p&gt;

&lt;p&gt;There's a query-result cache on top of this too. It has a prewarm manifest,
basically a list of common queries that run at startup so the first person to
use the agent on Monday morning doesn't sit through a cold start.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;QueryCache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_or_compute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compute_fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;compute_fn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_default_ttl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It skips caching error responses. If a query fails because the database is
temporarily overloaded, you don't want that failure served for the next hour.
That one took a production outage to figure out.&lt;/p&gt;
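&lt;p&gt;A prewarm manifest combined with the skip-errors rule might look like the sketch below. &lt;code&gt;SketchCache&lt;/code&gt; and &lt;code&gt;prewarm&lt;/code&gt; are hypothetical names, TTL handling is left out to keep it short, and the manifest is assumed to be a mapping from cache key to a coroutine function that runs the query.&lt;/p&gt;

```python
import asyncio


class SketchCache:
    """Tiny dict-backed cache that never stores error results (no TTL)."""

    def __init__(self):
        self._store = {}

    async def get_or_compute(self, cache_key, compute_fn):
        if cache_key in self._store:
            return self._store[cache_key]
        result = await compute_fn()
        # Same rule as above: a result carrying "error" is returned
        # to the caller but not kept for later requests.
        if "error" not in result:
            self._store[cache_key] = result
        return result


async def prewarm(cache, manifest):
    """Run every manifest query once; return the keys that raised."""
    results = await asyncio.gather(
        *(cache.get_or_compute(key, fn) for key, fn in manifest.items()),
        return_exceptions=True,
    )
    # One failing query should not stop the rest from warming up.
    return [
        key for key, outcome in zip(manifest, results)
        if isinstance(outcome, Exception)
    ]
```

&lt;p&gt;Calling &lt;code&gt;await prewarm(cache, manifest)&lt;/code&gt; during startup fills the cache for every query that succeeds, and the returned list of failed keys can be logged or retried later.&lt;/p&gt;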
&lt;h2&gt;
  
  
  Data transformation
&lt;/h2&gt;

&lt;p&gt;Every server I build strips the upstream API response before returning it.
Token usage scales with response size, and most APIs return 10x more data than
the agent will ever look at.&lt;/p&gt;

&lt;p&gt;One API I work with returns objects with 60+ fields. The server keeps maybe 8:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_slim_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;_strip_nulls&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;_cents_to_major&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_value_cents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;annual_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;_cents_to_major&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;annual_value_cents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;_effective_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;_cents_to_major&lt;/code&gt; converts cents to dollars. The raw API returns monetary values&lt;br&gt;
in cents. Before I added this conversion, every report the agent&lt;br&gt;
generated had wrong numbers: each dollar amount was off by a factor of 100. A $2,000 contract showed up as&lt;br&gt;
$200,000 in the report because the agent treated cents as dollars. No amount of prompt engineering fixed it reliably. Moving the conversion&lt;br&gt;
into the server did.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;_effective_status&lt;/code&gt; is the other one worth mentioning. The API's status field&lt;br&gt;
can say "active" on a record that ended three months ago. The platform's own UI&lt;br&gt;
derives the real status from multiple fields, so the MCP server does the same:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_effective_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;stage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stage&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;terminated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;not_renewed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inactive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_date_not_applicable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;renewal_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;perpetual&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;undetermined&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;end_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;end_date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromisoformat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;today&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inactive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;undetermined&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the agent gives the same answer a human would get looking at the UI.&lt;br&gt;
Stripping nulls across a list of 50 records also saves a few thousand tokens&lt;br&gt;
per response, which adds up.&lt;/p&gt;
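
&lt;p&gt;The null stripping can be as small as one dict comprehension. The sketch below is an assumption about the shape of such a helper: the name &lt;code&gt;strip_nulls&lt;/code&gt; and the sample record are made up for illustration.&lt;/p&gt;

```python
from typing import Any, Dict


def strip_nulls(record: Dict[str, Any]) -> Dict[str, Any]:
    """Drop keys whose value is None; keep falsy values such as 0 or ''."""
    return {key: value for key, value in record.items() if value is not None}


# Hypothetical record: only the None-valued field is removed.
print(strip_nulls({"name": "Acme", "end_date": None, "total_value": 0}))
```

&lt;p&gt;Checking for &lt;code&gt;is not None&lt;/code&gt; rather than truthiness matters here: a contract with a value of 0 is data, not a missing field.&lt;/p&gt;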

&lt;p&gt;A log aggregation server I built does something similar: it auto-appends&lt;br&gt;
&lt;code&gt;| json auto&lt;/code&gt; to queries that don't have a field extraction operator, truncates&lt;br&gt;
raw log lines to 500 characters, and converts epoch-millisecond timestamps to&lt;br&gt;
ISO 8601. Small fixes that add up to the agent not wasting turns fighting the&lt;br&gt;
format.&lt;/p&gt;
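
&lt;p&gt;Those three normalizations could be sketched roughly like this. The function names and the list of operators treated as "field extraction" are assumptions for illustration, and splitting on &lt;code&gt;|&lt;/code&gt; ignores pipes inside quoted strings; only the 500-character limit, the &lt;code&gt;| json auto&lt;/code&gt; suffix, and the epoch-millisecond input come from the description above.&lt;/p&gt;

```python
from datetime import datetime, timezone

MAX_LINE_CHARS = 500  # truncation limit mentioned in the text
# Hypothetical set of operators that count as field extraction.
EXTRACTION_OPERATORS = ("json", "parse", "csv", "keyvalue")


def ensure_field_extraction(query: str) -> str:
    """Append '| json auto' unless a pipeline stage already extracts fields."""
    # Everything after the first pipe is treated as a pipeline stage.
    stages = [stage.strip().lower() for stage in query.split("|")[1:]]
    if any(stage.split()[0] in EXTRACTION_OPERATORS for stage in stages if stage):
        return query
    return f"{query.rstrip()} | json auto"


def truncate_line(raw: str, limit: int = MAX_LINE_CHARS) -> str:
    """Cut a raw log line down to at most `limit` characters."""
    return raw[:limit]


def epoch_ms_to_iso(ms: int) -> str:
    """Convert epoch milliseconds to an ISO 8601 timestamp in UTC."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()


print(ensure_field_extraction("error status=500"))
print(epoch_ms_to_iso(0))
```
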
&lt;h2&gt;
  
  
  Download once, serve from cache
&lt;/h2&gt;

&lt;p&gt;Some data is expensive to fetch. PDF documents. Code bundles in tgz archives.&lt;br&gt;
The pattern: download on first access, extract the text, build a line offset&lt;br&gt;
index, serve everything from memory after that.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CachedFile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="n"&gt;offsets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;
            &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;offsets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_offsets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;offsets&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_lines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_offsets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_line_end_offset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I use this for CDN edge function code bundles and PDF documents (extracted with&lt;br&gt;
&lt;a href="https://pymupdf.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;PyMuPDF&lt;/a&gt;). After the first download, the agent reads by line range, searches&lt;br&gt;
with regex, lists the file tree. No repeat downloads. Reading through a&lt;br&gt;
200-page document becomes "just read" instead of "download, extract, read" on&lt;br&gt;
every question.&lt;/p&gt;
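
&lt;p&gt;The excerpt above calls &lt;code&gt;_line_end_offset&lt;/code&gt; without showing it. Below is a self-contained sketch of how the line offset index, the range read, and the regex search might fit together. The class name &lt;code&gt;CachedFileSketch&lt;/code&gt;, the clamping behaviour, and the &lt;code&gt;search&lt;/code&gt; method are assumptions, not the original code; line numbers are taken to be 1-indexed and inclusive, as the &lt;code&gt;start-1&lt;/code&gt; in the excerpt suggests.&lt;/p&gt;

```python
import re
from typing import List, Tuple


class CachedFileSketch:
    """In-memory text with a line offset index (illustrative only)."""

    def __init__(self, content: str):
        self.content = content
        # _offsets[i] is the character position where line i + 1 starts.
        offsets = [0]
        pos = content.find("\n")
        while pos != -1:
            if pos + 1 != len(content):  # no empty line after a trailing newline
                offsets.append(pos + 1)
            pos = content.find("\n", pos + 1)
        self._offsets = offsets

    @property
    def line_count(self) -> int:
        return len(self._offsets)

    def _line_end_offset(self, line: int) -> int:
        # A line ends where the next one starts; the last line ends at EOF.
        if line >= len(self._offsets):
            return len(self.content)
        return self._offsets[line]

    def get_lines(self, start: int, end: int) -> str:
        # Clamp to the valid range so an over-eager agent gets "" and not an error.
        start = max(1, start)
        end = min(end, self.line_count)
        if start > end:
            return ""
        return self.content[self._offsets[start - 1]:self._line_end_offset(end)]

    def search(self, pattern: str) -> List[Tuple[int, str]]:
        # Return (line number, line text) for every line the regex matches.
        regex = re.compile(pattern)
        lines = self.content.split("\n")
        return [(n, text) for n, text in enumerate(lines, 1) if regex.search(text)]


cached = CachedFileSketch("alpha\nbeta\ngamma\n")
print(cached.get_lines(2, 3))
print(cached.search("^ga"))
```

&lt;p&gt;Because the index stores character offsets, a range read is a single slice of the cached string no matter how large the document is.&lt;/p&gt;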

&lt;h2&gt;
  
  
  When thin is fine
&lt;/h2&gt;

&lt;p&gt;Not everything needs this treatment. A server that translates natural language&lt;br&gt;
to a query language and passes it to an API is fine as a thin wrapper. The&lt;br&gt;
translation is the value there. Same for simple lookup tools.&lt;/p&gt;

&lt;p&gt;The questions I ask: does the agent hit the same data twice? Does the API return&lt;br&gt;
more than the agent needs? Is the API slow enough that the agent&lt;br&gt;
loop feels broken? If any answer is yes, the server should be doing work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The multiplier
&lt;/h2&gt;

&lt;p&gt;When a person uses a web UI, they look at a page, think, click something else.&lt;br&gt;
One request at a time, processed by a human brain. An agent works differently.&lt;br&gt;
It makes five tool calls, stuffs all five responses into its context window, and&lt;br&gt;
reasons over them at once. A slow response gets multiplied by every call. A&lt;br&gt;
60-field JSON blob gets multiplied by every call. It adds up fast.&lt;/p&gt;

&lt;p&gt;I've measured the difference. CDN property lookups went from three agent turns&lt;br&gt;
to one once the fuzzy index was in place. Analytical queries went from timing&lt;br&gt;
out at 30 seconds to returning in under a millisecond from DuckDB. And every&lt;br&gt;
single dollar amount in every report was wrong until the server started&lt;br&gt;
converting cents for the agent.&lt;/p&gt;

&lt;p&gt;You can try to fix that last one with prompt engineering. I tried for weeks. The agent still got it wrong often enough that I couldn't trust the output.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
