<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Donnyb369 </title>
    <description>The latest articles on Forem by Donnyb369  (@donnyb369422e67b98e4b668da).</description>
    <link>https://forem.com/donnyb369422e67b98e4b668da</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3883179%2F494a4589-cb53-4f8a-a54f-7dfd8c1f33f7.png</url>
      <title>Forem: Donnyb369 </title>
      <link>https://forem.com/donnyb369422e67b98e4b668da</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/donnyb369422e67b98e4b668da"/>
    <language>en</language>
    <item>
      <title>I Built the Middleware Layer MCP is Missing</title>
      <dc:creator>Donnyb369 </dc:creator>
      <pubDate>Sun, 19 Apr 2026 15:02:52 +0000</pubDate>
      <link>https://forem.com/donnyb369422e67b98e4b668da/i-built-the-middleware-layer-mcp-is-missing-eo</link>
      <guid>https://forem.com/donnyb369422e67b98e4b668da/i-built-the-middleware-layer-mcp-is-missing-eo</guid>
      <description>&lt;p&gt;Every MCP tutorial shows the same thing: connect Claude to your filesystem, your database, your GitHub. Five servers, 57 tools, infinite power.&lt;/p&gt;

&lt;p&gt;Nobody talks about what happens next.&lt;/p&gt;

&lt;h2&gt;The Problems Nobody Mentions&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Token waste.&lt;/strong&gt; With 40+ tools loaded, you're burning thousands of tokens on JSON schemas every turn. Before Claude even reads your question, it's consumed half its context window on tool definitions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context rot.&lt;/strong&gt; In long coding sessions, Claude memorizes file contents from earlier in the conversation. Then it edits the old version — silently overwriting your latest changes. You don't notice until the code breaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero security boundary.&lt;/strong&gt; MCP servers run with full access. No audit trail. No rate limits. No secret scrubbing. Your GitHub token shows up in logs. There's nothing between the LLM and your tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No compliance layer.&lt;/strong&gt; Claude wants to read Slack? Hope you're okay with it seeing your DMs with your boss. There's no way to filter what reaches the model.&lt;/p&gt;

&lt;h2&gt;MCP Spine: One Proxy, Full Control&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/Donnyb369/mcp-spine" rel="noopener noreferrer"&gt;MCP Spine&lt;/a&gt; — a local-first middleware proxy that sits between your LLM client and your MCP servers. One config file, one entry point in &lt;code&gt;claude_desktop_config.json&lt;/code&gt;, and everything routes through it.&lt;/p&gt;

&lt;p&gt;Here's what it does:&lt;/p&gt;

&lt;h3&gt;61% Token Savings&lt;/h3&gt;

&lt;p&gt;The schema minifier strips unnecessary fields from tool definitions — &lt;code&gt;$schema&lt;/code&gt;, &lt;code&gt;additionalProperties&lt;/code&gt;, verbose descriptions, defaults. Level 2 cuts token usage by 61% while keeping type information and required fields intact.&lt;/p&gt;
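
&lt;p&gt;To illustrate the idea (this is a sketch written for this post, not Spine's actual implementation), a recursive minifier only takes a few lines:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative sketch of a schema minifier -- not Spine's actual code.
# It drops metadata keys while keeping types and required fields intact.
STRIP_KEYS = {"$schema", "additionalProperties", "description", "default"}

def minify_schema(schema):
    """Recursively remove non-essential keys from a JSON Schema fragment."""
    if isinstance(schema, dict):
        return {k: minify_schema(v) for k, v in schema.items() if k not in STRIP_KEYS}
    if isinstance(schema, list):
        return [minify_schema(v) for v in schema]
    return schema
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;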

&lt;h3&gt;State Guard Stops Context Rot&lt;/h3&gt;

&lt;p&gt;Spine watches your project files, tracks SHA-256 hashes, and injects version pins into every tool response. When Claude has a stale cached version, the pin tells it to re-read. Context rot solved.&lt;/p&gt;
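
&lt;p&gt;The core of the version-pin idea can be sketched like this (hypothetical code, not Spine's actual API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative sketch of the version-pin idea -- not Spine's actual code.
import hashlib

def file_hash(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()[:12]

def pin_response(response_text, path, last_seen_hash):
    """Append a version pin so the LLM knows when its cached copy is stale."""
    current = file_hash(path)
    pin = f"[state-pin {path}@{current}]"
    if last_seen_hash and last_seen_hash != current:
        pin += " NOTE: this file changed since you last read it; re-read before editing."
    return response_text + "\n" + pin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;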

&lt;h3&gt;Security That Actually Works&lt;/h3&gt;

&lt;p&gt;Rate limiting (per-tool and global), path traversal jails, secret scrubbing (AWS keys, GitHub tokens, private keys), HMAC-fingerprinted audit trails, and circuit breakers on failing servers. Defense-in-depth — every layer assumes the others might fail.&lt;/p&gt;
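
&lt;p&gt;To make the secret scrubbing concrete, here's roughly what it looks like (the patterns below are my illustrative examples, not Spine's exact rule set):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative secret scrubber -- example patterns, not Spine's exact rules.
import re

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key IDs
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                 # GitHub personal access tokens
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key headers
]

def scrub(text):
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;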

&lt;h3&gt;Plugin System for Compliance&lt;/h3&gt;

&lt;p&gt;Drop-in Python plugins hook into the tool call pipeline. The included Slack filter example strips messages from sensitive channels before the LLM ever sees them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;spine.plugins&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SpinePlugin&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SlackFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SpinePlugin&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slack-filter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;deny_channels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hr-private&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exec-salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_tool_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
        &lt;span class="c1"&gt;# Filter out messages from denied channels
&lt;/span&gt;        &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
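
&lt;p&gt;For illustration, the elided filter body could be completed along these lines (hypothetical: it assumes the Slack tool returns a JSON array of messages with a &lt;code&gt;channel&lt;/code&gt; field):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical completion of the filter body above. The response shape
# (a JSON array of messages, each with a "channel" field) is an assumption.
import json

def filter_messages(raw_json, deny_channels):
    messages = json.loads(raw_json)
    kept = [m for m in messages if m.get("channel") not in deny_channels]
    return json.dumps(kept)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;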



&lt;h3&gt;Everything Else&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic routing&lt;/strong&gt; with local embeddings (no API calls) — only relevant tools reach the LLM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-in-the-loop&lt;/strong&gt; confirmation for destructive tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token budget&lt;/strong&gt; tracking with daily limits and warn/block enforcement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Config hot-reload&lt;/strong&gt; — edit your config while Spine is running&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-user audit&lt;/strong&gt; with session-tagged entries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three transports&lt;/strong&gt;: stdio, SSE, and Streamable HTTP (MCP 2025-03-26)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive setup wizard&lt;/strong&gt; (&lt;code&gt;mcp-spine init&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Quick Start&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mcp-spine
mcp-spine init
mcp-spine doctor &lt;span class="nt"&gt;--config&lt;/span&gt; spine.toml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add one entry to your &lt;code&gt;claude_desktop_config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"spine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-m"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"spine.cli"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"serve"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--config"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/spine.toml"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Battle-Tested on Windows&lt;/h2&gt;

&lt;p&gt;Most MCP tooling assumes macOS. Spine is battle-tested on Windows: MSIX sandbox paths, &lt;code&gt;npx.cmd&lt;/code&gt; resolution, paths with spaces and parentheses, environment variable merging, and unbuffered stdout to prevent pipe hangs. It also runs on macOS and Linux.&lt;/p&gt;

&lt;p&gt;190+ tests, CI on Windows + Linux across Python 3.11-3.13.&lt;/p&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/Donnyb369/mcp-spine" rel="noopener noreferrer"&gt;https://github.com/Donnyb369/mcp-spine&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;PyPI: &lt;code&gt;pip install mcp-spine&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Glama: AAA score&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;What security or compliance problems are you running into with MCP? I'd love to hear what features would be most useful.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>python</category>
      <category>ai</category>
      <category>security</category>
    </item>
    <item>
      <title>I routed 60 MCP tools through a single proxy — here's what I learned about token waste and security</title>
      <dc:creator>Donnyb369 </dc:creator>
      <pubDate>Thu, 16 Apr 2026 21:05:54 +0000</pubDate>
      <link>https://forem.com/donnyb369422e67b98e4b668da/i-routed-60-mcp-tools-through-a-single-proxy-heres-what-i-learned-about-token-waste-and-security-2mej</link>
      <guid>https://forem.com/donnyb369422e67b98e4b668da/i-routed-60-mcp-tools-through-a-single-proxy-heres-what-i-learned-about-token-waste-and-security-2mej</guid>
      <description>&lt;p&gt;I've been building MCP servers for Claude Desktop for a few months now. At one point I had five servers running: filesystem, GitHub, SQLite, a knowledge graph, and Brave Search. Sixty tools total, all piped into one LLM.&lt;/p&gt;

&lt;p&gt;It worked. But three things kept going wrong.&lt;/p&gt;

&lt;h2&gt;The token problem&lt;/h2&gt;

&lt;p&gt;Every request to the model includes the full JSON schema of every available tool in the context window. Sixty tools means sixty schema definitions, every single request. I measured it: &lt;strong&gt;over 4,800 tokens of schema overhead per request&lt;/strong&gt;, before Claude even starts thinking about your question.&lt;/p&gt;

&lt;p&gt;That's money. At API rates, those wasted tokens add up fast across a workday of tool calls.&lt;/p&gt;
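
&lt;p&gt;You can ballpark the overhead for your own setup by serializing your tool schemas and applying the common ~4 characters-per-token heuristic (a rough approximation, not a real tokenizer count):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Back-of-envelope estimate of schema overhead. Uses the rough
# ~4 characters/token heuristic instead of a real tokenizer.
import json

def estimate_schema_tokens(tool_schemas):
    chars = sum(len(json.dumps(s)) for s in tool_schemas)
    return chars // 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;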

&lt;h2&gt;The security problem&lt;/h2&gt;

&lt;p&gt;I found out the hard way that my &lt;code&gt;claude_desktop_config.json&lt;/code&gt; was passing environment variables to child processes — and a bug in how I was merging env vars meant the entire system PATH, including tokens and API keys, was getting passed through. One of my GitHub tokens ended up in a log file. Twice.&lt;/p&gt;

&lt;p&gt;MCP servers run as child processes with whatever permissions your user account has. There's no audit trail, no rate limiting, no secret scrubbing. If a tool call returns sensitive data, it goes straight into the LLM context with no filtering.&lt;/p&gt;

&lt;h2&gt;The context rot problem&lt;/h2&gt;

&lt;p&gt;Claude would read a file, modify it three tool calls later, then reference the stale version from its context. The file had changed on disk but Claude was still working with the old content. I called this "context rot" — the LLM's view of the world drifts from reality over a long session.&lt;/p&gt;

&lt;h2&gt;So I built a proxy&lt;/h2&gt;

&lt;p&gt;MCP Spine sits between Claude Desktop and all your MCP servers. One proxy, one connection, all traffic flows through it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Desktop ◄──stdio──► MCP Spine ◄──stdio──► filesystem
                                      ◄──stdio──► GitHub
                                      ◄──stdio──► SQLite
                                      ◄──stdio──► memory
                                      ◄──stdio──► Brave Search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what it does at each layer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security proxy&lt;/strong&gt; — validates every JSON-RPC message, scrubs secrets from tool outputs (AWS keys, GitHub tokens, bearer tokens, private keys, connection strings), rate limits tool calls, blocks command injection and path traversal, and writes an HMAC-fingerprinted audit trail to SQLite.&lt;/p&gt;
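
&lt;p&gt;The path traversal jail, for instance, boils down to something like this (an illustrative sketch, not Spine's actual code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative path jail -- resolve the requested path and require it to
# stay under an allowed root. Not Spine's actual implementation.
from pathlib import Path

def in_jail(requested, root):
    resolved = Path(root, requested).resolve()
    try:
        resolved.relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;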

&lt;p&gt;&lt;strong&gt;Schema minifier&lt;/strong&gt; — strips verbose descriptions, defaults, and metadata from tool schemas before they reach the LLM. The type information and required fields stay intact. Real measured savings on 12 representative tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0 (off)&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1 (light)&lt;/td&gt;
&lt;td&gt;11%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2 (default)&lt;/td&gt;
&lt;td&gt;32%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The best individual tool (&lt;code&gt;read_file&lt;/code&gt;) went from 586 characters down to 242 — a 59% reduction. The savings compound: with 60 tools, Level 2 saves roughly 1,500 tokens per request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State guard&lt;/strong&gt; — watches files on disk with SHA-256 hashes. When Claude references a file that's changed since it last read it, Spine injects a version pin into the response: "this file has changed since you last saw it." No more context rot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic router&lt;/strong&gt; — uses local embeddings (ChromaDB + MiniLM) to figure out which tools are relevant to the current task. Instead of showing all 60 tools, it shows the 5-10 that matter. This is optional and currently experimental — the ML dependencies add startup time, so I made them lazy-loading.&lt;/p&gt;
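
&lt;p&gt;As a toy illustration of the selection step (the real router uses MiniLM embeddings; this substitutes a plain word-overlap score just to show the shape of the idea):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy routing sketch. Spine uses embedding similarity; this swaps in a
# bag-of-words overlap score purely to illustrate top-k tool selection.
def score(query, description):
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q.intersection(d)) / (len(q.union(d)) or 1)

def route(query, tools, top_k=5):
    """tools: {name: description}. Return the top_k most relevant tool names."""
    ranked = sorted(tools, key=lambda name: score(query, tools[name]), reverse=True)
    return ranked[:top_k]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;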

&lt;h2&gt;What I learned building it&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Environment variable handling is a minefield.&lt;/strong&gt; The biggest bug I hit was &lt;code&gt;env=self.config.env or None&lt;/code&gt; in the subprocess spawn. When a server config had custom env vars (like &lt;code&gt;GITHUB_TOKEN&lt;/code&gt;), this replaced the entire process environment instead of extending it. Every server that needed a custom env var was silently missing &lt;code&gt;PATH&lt;/code&gt;, &lt;code&gt;HOME&lt;/code&gt;, and everything else. The fix was one line: &lt;code&gt;{**os.environ, **self.config.env}&lt;/code&gt;. But it took hours to diagnose because the error messages were about missing executables, not missing env vars.&lt;/p&gt;
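
&lt;p&gt;The difference between the two is easy to demonstrate:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Demonstrating the env-merge bug described above (token value is a dummy).
import os

server_env = {"GITHUB_TOKEN": "dummy"}

buggy = server_env or None            # replaces the whole environment: PATH is gone
fixed = {**os.environ, **server_env}  # extends it: PATH survives, token is added

# With the buggy version, the subprocess inherits ONLY GITHUB_TOKEN.
# With the fix, it inherits the full environment plus the override.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;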

&lt;p&gt;&lt;strong&gt;Windows is a different world.&lt;/strong&gt; Python's asyncio on Windows uses a Proactor event loop that can't do &lt;code&gt;connect_read_pipe&lt;/code&gt; / &lt;code&gt;connect_write_pipe&lt;/code&gt; on stdio handles from piped processes. The workaround is raw binary I/O with &lt;code&gt;run_in_executor&lt;/code&gt; for reads. I also had to handle paths with spaces and parentheses (my project lives in &lt;code&gt;MCP (The Spine)&lt;/code&gt;), UNC paths, and the MSIX sandbox that Claude Desktop runs in.&lt;/p&gt;
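
&lt;p&gt;A minimal version of that read path looks like this (a sketch with the stream passed in so it's testable; in real use you'd pass &lt;code&gt;sys.stdin.buffer&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the Proactor-loop workaround: do the blocking readline on a
# thread-pool executor instead of connect_read_pipe. Illustrative only.
import asyncio

async def read_message_line(stream):
    loop = asyncio.get_running_loop()
    # stream.readline blocks, so run it off the event loop
    return await loop.run_in_executor(None, stream.readline)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;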

&lt;p&gt;&lt;strong&gt;npx is slow, node is fast.&lt;/strong&gt; Spawning MCP servers via &lt;code&gt;npx @modelcontextprotocol/server-github&lt;/code&gt; takes 10-15 seconds because npx checks for updates every time. Switching to &lt;code&gt;node C:\path\to\node_modules\...\dist\index.js&lt;/code&gt; connects in under a second. This matters because MCP clients have handshake timeouts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thread safety in audit logging is easy to get wrong.&lt;/strong&gt; The semantic router runs a background thread for model loading. That thread calls the audit logger, which tries to use a SQLite connection created in the main thread. SQLite doesn't allow cross-thread connection sharing. Fix: &lt;code&gt;check_same_thread=False&lt;/code&gt; plus a &lt;code&gt;threading.Lock()&lt;/code&gt; around all DB operations.&lt;/p&gt;
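
&lt;p&gt;The shape of that fix, as a minimal sketch (not Spine's actual logger):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal sketch of the thread-safe audit logger fix described above.
import sqlite3
import threading

class AuditLog:
    def __init__(self, path=":memory:"):
        # check_same_thread=False lets other threads use this connection,
        # which is only safe because every access holds the lock below.
        self.conn = sqlite3.connect(path, check_same_thread=False)
        self.lock = threading.Lock()
        with self.lock:
            self.conn.execute("CREATE TABLE IF NOT EXISTS audit (entry TEXT)")

    def write(self, entry):
        with self.lock:
            self.conn.execute("INSERT INTO audit VALUES (?)", (entry,))
            self.conn.commit()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;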

&lt;h2&gt;The numbers&lt;/h2&gt;

&lt;p&gt;Running on Windows with Python 3.14 and Claude Desktop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;6 MCP servers connected through one proxy&lt;/li&gt;
&lt;li&gt;60 tools total, routed and minified&lt;/li&gt;
&lt;li&gt;32% average schema token savings (up to 59% on verbose tools)&lt;/li&gt;
&lt;li&gt;135+ tests, CI green on Windows + Linux&lt;/li&gt;
&lt;li&gt;Sub-second server connections (with node direct path)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Try it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mcp-spine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure your servers in a TOML file, point Claude Desktop at Spine, and all your MCP traffic gets security hardening, token savings, and an audit trail.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Donnyb369/mcp-spine" rel="noopener noreferrer"&gt;github.com/Donnyb369/mcp-spine&lt;/a&gt;&lt;br&gt;
PyPI: &lt;a href="https://pypi.org/project/mcp-spine" rel="noopener noreferrer"&gt;pypi.org/project/mcp-spine&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's open source, local-first, and works on Windows and Linux. No cloud, no accounts, no telemetry.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm an independent developer building open-source MCP tooling. If you're using MCP servers with Claude Desktop or any other LLM client, I'd love to hear what problems you're hitting. Drop a comment or open an issue on GitHub.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>llm</category>
      <category>mcp</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
