<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Donnyb369 </title>
    <description>The latest articles on Forem by Donnyb369  (@donnyb369422e67b98e4b668da).</description>
    <link>https://forem.com/donnyb369422e67b98e4b668da</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3883179%2F494a4589-cb53-4f8a-a54f-7dfd8c1f33f7.png</url>
      <title>Forem: Donnyb369 </title>
      <link>https://forem.com/donnyb369422e67b98e4b668da</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/donnyb369422e67b98e4b668da"/>
    <language>en</language>
    <item>
      <title>I Built the Middleware Layer MCP is Missing</title>
      <dc:creator>Donnyb369 </dc:creator>
      <pubDate>Sun, 19 Apr 2026 15:02:52 +0000</pubDate>
      <link>https://forem.com/donnyb369422e67b98e4b668da/i-built-the-middleware-layer-mcp-is-missing-eo</link>
      <guid>https://forem.com/donnyb369422e67b98e4b668da/i-built-the-middleware-layer-mcp-is-missing-eo</guid>
      <description>&lt;p&gt;Every MCP tutorial shows the same thing: connect Claude to your filesystem, your database, your GitHub. Five servers, 57 tools, infinite power.&lt;/p&gt;

&lt;p&gt;Nobody talks about what happens next.&lt;/p&gt;

&lt;h2&gt;The Problems Nobody Mentions&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Token waste.&lt;/strong&gt; With 40+ tools loaded, you're burning thousands of tokens on JSON schemas every turn. Before Claude even reads your question, it's consumed half its context window on tool definitions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context rot.&lt;/strong&gt; In long coding sessions, Claude memorizes file contents from earlier in the conversation. Then it edits the old version — silently overwriting your latest changes. You don't notice until the code breaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero security boundary.&lt;/strong&gt; MCP servers run with full access. No audit trail. No rate limits. No secret scrubbing. Your GitHub token shows up in logs. There's nothing between the LLM and your tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No compliance layer.&lt;/strong&gt; Claude wants to read Slack? Hope you're okay with it seeing your DMs with your boss. There's no way to filter what reaches the model.&lt;/p&gt;

&lt;h2&gt;MCP Spine: One Proxy, Full Control&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/Donnyb369/mcp-spine" rel="noopener noreferrer"&gt;MCP Spine&lt;/a&gt; — a local-first middleware proxy that sits between your LLM client and your MCP servers. One config file, one entry point in &lt;code&gt;claude_desktop_config.json&lt;/code&gt;, and everything routes through it.&lt;/p&gt;

&lt;p&gt;Here's what it does:&lt;/p&gt;

&lt;h3&gt;61% Token Savings&lt;/h3&gt;

&lt;p&gt;The schema minifier strips unnecessary fields from tool definitions — &lt;code&gt;$schema&lt;/code&gt;, &lt;code&gt;additionalProperties&lt;/code&gt;, verbose descriptions, defaults. Level 2 cuts token usage by 61% while keeping type information and required fields intact.&lt;/p&gt;
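
&lt;p&gt;To illustrate the idea (this is a sketch written for this post, not Spine's actual implementation), a recursive minifier only takes a few lines:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative sketch of a schema minifier -- not Spine's actual code.
# It drops metadata keys while keeping types and required fields intact.
STRIP_KEYS = {"$schema", "additionalProperties", "description", "default"}

def minify_schema(schema):
    """Recursively remove non-essential keys from a JSON Schema fragment."""
    if isinstance(schema, dict):
        return {k: minify_schema(v) for k, v in schema.items() if k not in STRIP_KEYS}
    if isinstance(schema, list):
        return [minify_schema(v) for v in schema]
    return schema
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;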

&lt;h3&gt;State Guard Stops Context Rot&lt;/h3&gt;

&lt;p&gt;Spine watches your project files, tracks SHA-256 hashes, and injects version pins into every tool response. When Claude has a stale cached version, the pin tells it to re-read. Context rot solved.&lt;/p&gt;
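
&lt;p&gt;The core of the version-pin idea can be sketched like this (hypothetical code, not Spine's actual API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative sketch of the version-pin idea -- not Spine's actual code.
import hashlib

def file_hash(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()[:12]

def pin_response(response_text, path, last_seen_hash):
    """Append a version pin so the LLM knows when its cached copy is stale."""
    current = file_hash(path)
    pin = f"[state-pin {path}@{current}]"
    if last_seen_hash and last_seen_hash != current:
        pin += " NOTE: this file changed since you last read it; re-read before editing."
    return response_text + "\n" + pin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;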

&lt;h3&gt;Security That Actually Works&lt;/h3&gt;

&lt;p&gt;Rate limiting (per-tool and global), path traversal jails, secret scrubbing (AWS keys, GitHub tokens, private keys), HMAC-fingerprinted audit trails, and circuit breakers on failing servers. Defense-in-depth — every layer assumes the others might fail.&lt;/p&gt;
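
&lt;p&gt;To make the secret scrubbing concrete, here's roughly what it looks like (the patterns below are my illustrative examples, not Spine's exact rule set):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative secret scrubber -- example patterns, not Spine's exact rules.
import re

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key IDs
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                 # GitHub personal access tokens
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key headers
]

def scrub(text):
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;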

&lt;h3&gt;Plugin System for Compliance&lt;/h3&gt;

&lt;p&gt;Drop-in Python plugins hook into the tool call pipeline. The included Slack filter example strips messages from sensitive channels before the LLM ever sees them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;spine.plugins&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SpinePlugin&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SlackFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SpinePlugin&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slack-filter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;deny_channels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hr-private&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exec-salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_tool_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
        &lt;span class="c1"&gt;# Filter out messages from denied channels
&lt;/span&gt;        &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
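
&lt;p&gt;For illustration, the elided filter body could be completed along these lines (hypothetical: it assumes the Slack tool returns a JSON array of messages with a &lt;code&gt;channel&lt;/code&gt; field):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical completion of the filter body above. The response shape
# (a JSON array of messages, each with a "channel" field) is an assumption.
import json

def filter_messages(raw_json, deny_channels):
    messages = json.loads(raw_json)
    kept = [m for m in messages if m.get("channel") not in deny_channels]
    return json.dumps(kept)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;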



&lt;h3&gt;Everything Else&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic routing&lt;/strong&gt; with local embeddings (no API calls) — only relevant tools reach the LLM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-in-the-loop&lt;/strong&gt; confirmation for destructive tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token budget&lt;/strong&gt; tracking with daily limits and warn/block enforcement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Config hot-reload&lt;/strong&gt; — edit your config while Spine is running&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-user audit&lt;/strong&gt; with session-tagged entries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three transports&lt;/strong&gt;: stdio, SSE, and Streamable HTTP (MCP 2025-03-26)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive setup wizard&lt;/strong&gt; (&lt;code&gt;mcp-spine init&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Quick Start&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mcp-spine
mcp-spine init
mcp-spine doctor &lt;span class="nt"&gt;--config&lt;/span&gt; spine.toml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add one entry to your &lt;code&gt;claude_desktop_config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"spine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-m"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"spine.cli"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"serve"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--config"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/spine.toml"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Battle-Tested on Windows&lt;/h2&gt;

&lt;p&gt;Most MCP tooling assumes macOS. Spine is battle-tested on Windows: MSIX sandbox paths, &lt;code&gt;npx.cmd&lt;/code&gt; resolution, paths with spaces and parentheses, environment variable merging, and unbuffered stdout to prevent pipe hangs. It also runs on macOS and Linux.&lt;/p&gt;

&lt;p&gt;190+ tests, CI on Windows + Linux across Python 3.11-3.13.&lt;/p&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/Donnyb369/mcp-spine" rel="noopener noreferrer"&gt;https://github.com/Donnyb369/mcp-spine&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;PyPI: &lt;code&gt;pip install mcp-spine&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Glama: AAA score&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;What security or compliance problems are you running into with MCP? I'd love to hear what features would be most useful.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>python</category>
      <category>ai</category>
      <category>security</category>
    </item>
    <item>
      <title>I routed 60 MCP tools through a single proxy — here's what I learned about token waste and security</title>
      <dc:creator>Donnyb369 </dc:creator>
      <pubDate>Thu, 16 Apr 2026 21:05:54 +0000</pubDate>
      <link>https://forem.com/donnyb369422e67b98e4b668da/i-routed-60-mcp-tools-through-a-single-proxy-heres-what-i-learned-about-token-waste-and-security-2mej</link>
      <guid>https://forem.com/donnyb369422e67b98e4b668da/i-routed-60-mcp-tools-through-a-single-proxy-heres-what-i-learned-about-token-waste-and-security-2mej</guid>
      <description>&lt;p&gt;I've been building MCP servers for Claude Desktop for a few months now. At one point I had five servers running: filesystem, GitHub, SQLite, a knowledge graph, and Brave Search. Sixty tools total, all piped into one LLM.&lt;/p&gt;

&lt;p&gt;It worked. But three things kept going wrong.&lt;/p&gt;

&lt;h2&gt;The token problem&lt;/h2&gt;

&lt;p&gt;Every request to the model includes the full JSON schema of every available tool in the context window. Sixty tools means sixty schema definitions, every single request. I measured it: &lt;strong&gt;over 4,800 tokens of schema overhead per request&lt;/strong&gt;, before Claude even starts thinking about your question.&lt;/p&gt;

&lt;p&gt;That's money. At API rates, those wasted tokens add up fast across a workday of tool calls.&lt;/p&gt;
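
&lt;p&gt;You can ballpark the overhead for your own setup by serializing your tool schemas and applying the common ~4 characters-per-token heuristic (a rough approximation, not a real tokenizer count):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Back-of-envelope estimate of schema overhead. Uses the rough
# ~4 characters/token heuristic instead of a real tokenizer.
import json

def estimate_schema_tokens(tool_schemas):
    chars = sum(len(json.dumps(s)) for s in tool_schemas)
    return chars // 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;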

&lt;h2&gt;The security problem&lt;/h2&gt;

&lt;p&gt;I found out the hard way that my &lt;code&gt;claude_desktop_config.json&lt;/code&gt; was passing environment variables to child processes — and a bug in how I was merging env vars meant the entire system PATH, including tokens and API keys, was getting passed through. One of my GitHub tokens ended up in a log file. Twice.&lt;/p&gt;

&lt;p&gt;MCP servers run as child processes with whatever permissions your user account has. There's no audit trail, no rate limiting, no secret scrubbing. If a tool call returns sensitive data, it goes straight into the LLM context with no filtering.&lt;/p&gt;

&lt;h2&gt;The context rot problem&lt;/h2&gt;

&lt;p&gt;Claude would read a file, modify it three tool calls later, then reference the stale version from its context. The file had changed on disk but Claude was still working with the old content. I called this "context rot" — the LLM's view of the world drifts from reality over a long session.&lt;/p&gt;

&lt;h2&gt;So I built a proxy&lt;/h2&gt;

&lt;p&gt;MCP Spine sits between Claude Desktop and all your MCP servers. One proxy, one connection, all traffic flows through it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Desktop ◄──stdio──► MCP Spine ◄──stdio──► filesystem
                                      ◄──stdio──► GitHub
                                      ◄──stdio──► SQLite
                                      ◄──stdio──► memory
                                      ◄──stdio──► Brave Search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what it does at each layer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security proxy&lt;/strong&gt; — validates every JSON-RPC message, scrubs secrets from tool outputs (AWS keys, GitHub tokens, bearer tokens, private keys, connection strings), rate limits tool calls, blocks command injection and path traversal, and writes an HMAC-fingerprinted audit trail to SQLite.&lt;/p&gt;
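
&lt;p&gt;The path traversal jail, for instance, boils down to something like this (an illustrative sketch, not Spine's actual code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative path jail -- resolve the requested path and require it to
# stay under an allowed root. Not Spine's actual implementation.
from pathlib import Path

def in_jail(requested, root):
    resolved = Path(root, requested).resolve()
    try:
        resolved.relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;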

&lt;p&gt;&lt;strong&gt;Schema minifier&lt;/strong&gt; — strips verbose descriptions, defaults, and metadata from tool schemas before they reach the LLM. The type information and required fields stay intact. Real measured savings on 12 representative tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0 (off)&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1 (light)&lt;/td&gt;
&lt;td&gt;11%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2 (default)&lt;/td&gt;
&lt;td&gt;32%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The best individual tool (&lt;code&gt;read_file&lt;/code&gt;) went from 586 characters down to 242 — a 59% reduction. The savings compound: with 60 tools, Level 2 saves roughly 1,500 tokens per request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State guard&lt;/strong&gt; — watches files on disk with SHA-256 hashes. When Claude references a file that's changed since it last read it, Spine injects a version pin into the response: "this file has changed since you last saw it." No more context rot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic router&lt;/strong&gt; — uses local embeddings (ChromaDB + MiniLM) to figure out which tools are relevant to the current task. Instead of showing all 60 tools, it shows the 5-10 that matter. This is optional and currently experimental — the ML dependencies add startup time, so I made them lazy-loading.&lt;/p&gt;
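
&lt;p&gt;As a toy illustration of the selection step (the real router uses MiniLM embeddings; this substitutes a plain word-overlap score just to show the shape of the idea):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy routing sketch. Spine uses embedding similarity; this swaps in a
# bag-of-words overlap score purely to illustrate top-k tool selection.
def score(query, description):
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q.intersection(d)) / (len(q.union(d)) or 1)

def route(query, tools, top_k=5):
    """tools: {name: description}. Return the top_k most relevant tool names."""
    ranked = sorted(tools, key=lambda name: score(query, tools[name]), reverse=True)
    return ranked[:top_k]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;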

&lt;h2&gt;What I learned building it&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Environment variable handling is a minefield.&lt;/strong&gt; The biggest bug I hit was &lt;code&gt;env=self.config.env or None&lt;/code&gt; in the subprocess spawn. When a server config had custom env vars (like &lt;code&gt;GITHUB_TOKEN&lt;/code&gt;), this replaced the entire process environment instead of extending it. Every server that needed a custom env var was silently missing &lt;code&gt;PATH&lt;/code&gt;, &lt;code&gt;HOME&lt;/code&gt;, and everything else. The fix was one line: &lt;code&gt;{**os.environ, **self.config.env}&lt;/code&gt;. But it took hours to diagnose because the error messages were about missing executables, not missing env vars.&lt;/p&gt;
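
&lt;p&gt;The difference between the two is easy to demonstrate:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Demonstrating the env-merge bug described above (token value is a dummy).
import os

server_env = {"GITHUB_TOKEN": "dummy"}

buggy = server_env or None            # replaces the whole environment: PATH is gone
fixed = {**os.environ, **server_env}  # extends it: PATH survives, token is added

# With the buggy version, the subprocess inherits ONLY GITHUB_TOKEN.
# With the fix, it inherits the full environment plus the override.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;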

&lt;p&gt;&lt;strong&gt;Windows is a different world.&lt;/strong&gt; Python's asyncio on Windows uses a Proactor event loop that can't do &lt;code&gt;connect_read_pipe&lt;/code&gt; / &lt;code&gt;connect_write_pipe&lt;/code&gt; on stdio handles from piped processes. The workaround is raw binary I/O with &lt;code&gt;run_in_executor&lt;/code&gt; for reads. I also had to handle paths with spaces and parentheses (my project lives in &lt;code&gt;MCP (The Spine)&lt;/code&gt;), UNC paths, and the MSIX sandbox that Claude Desktop runs in.&lt;/p&gt;
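
&lt;p&gt;A minimal version of that read path looks like this (a sketch with the stream passed in so it's testable; in real use you'd pass &lt;code&gt;sys.stdin.buffer&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the Proactor-loop workaround: do the blocking readline on a
# thread-pool executor instead of connect_read_pipe. Illustrative only.
import asyncio

async def read_message_line(stream):
    loop = asyncio.get_running_loop()
    # stream.readline blocks, so run it off the event loop
    return await loop.run_in_executor(None, stream.readline)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;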

&lt;p&gt;&lt;strong&gt;npx is slow, node is fast.&lt;/strong&gt; Spawning MCP servers via &lt;code&gt;npx @modelcontextprotocol/server-github&lt;/code&gt; takes 10-15 seconds because npx checks for updates every time. Switching to &lt;code&gt;node C:\path\to\node_modules\...\dist\index.js&lt;/code&gt; connects in under a second. This matters because MCP clients have handshake timeouts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thread safety in audit logging is easy to get wrong.&lt;/strong&gt; The semantic router runs a background thread for model loading. That thread calls the audit logger, which tries to use a SQLite connection created in the main thread. SQLite doesn't allow cross-thread connection sharing. Fix: &lt;code&gt;check_same_thread=False&lt;/code&gt; plus a &lt;code&gt;threading.Lock()&lt;/code&gt; around all DB operations.&lt;/p&gt;
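
&lt;p&gt;The shape of that fix, as a minimal sketch (not Spine's actual logger):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal sketch of the thread-safe audit logger fix described above.
import sqlite3
import threading

class AuditLog:
    def __init__(self, path=":memory:"):
        # check_same_thread=False lets other threads use this connection,
        # which is only safe because every access holds the lock below.
        self.conn = sqlite3.connect(path, check_same_thread=False)
        self.lock = threading.Lock()
        with self.lock:
            self.conn.execute("CREATE TABLE IF NOT EXISTS audit (entry TEXT)")

    def write(self, entry):
        with self.lock:
            self.conn.execute("INSERT INTO audit VALUES (?)", (entry,))
            self.conn.commit()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;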

&lt;h2&gt;The numbers&lt;/h2&gt;

&lt;p&gt;Running on Windows with Python 3.14 and Claude Desktop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;6 MCP servers connected through one proxy&lt;/li&gt;
&lt;li&gt;60 tools total, routed and minified&lt;/li&gt;
&lt;li&gt;32% average schema token savings (up to 59% on verbose tools)&lt;/li&gt;
&lt;li&gt;135+ tests, CI green on Windows + Linux&lt;/li&gt;
&lt;li&gt;Sub-second server connections (with node direct path)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Try it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mcp-spine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure your servers in a TOML file, point Claude Desktop at Spine, and all your MCP traffic gets security hardening, token savings, and an audit trail.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Donnyb369/mcp-spine" rel="noopener noreferrer"&gt;github.com/Donnyb369/mcp-spine&lt;/a&gt;&lt;br&gt;
PyPI: &lt;a href="https://pypi.org/project/mcp-spine" rel="noopener noreferrer"&gt;pypi.org/project/mcp-spine&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's open source, local-first, and works on Windows and Linux. No cloud, no accounts, no telemetry.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm an independent developer building open-source MCP tooling. If you're using MCP servers with Claude Desktop or any other LLM client, I'd love to hear what problems you're hitting. Drop a comment or open an issue on GitHub.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>llm</category>
      <category>mcp</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
