<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Algis</title>
    <description>The latest articles on Forem by Algis (@algis).</description>
    <link>https://forem.com/algis</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3323151%2F5ae141c9-0718-4bdc-b418-b015eb91d5b8.jpeg</url>
      <title>Forem: Algis</title>
      <link>https://forem.com/algis</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/algis"/>
    <language>en</language>
    <item>
      <title>The OWASP MCP Top 10: A Security Framework for the AI Agent Era</title>
      <dc:creator>Algis</dc:creator>
      <pubDate>Wed, 18 Mar 2026 16:54:11 +0000</pubDate>
      <link>https://forem.com/algis/the-owasp-mcp-top-10-a-security-framework-for-the-ai-agent-era-lao</link>
      <guid>https://forem.com/algis/the-owasp-mcp-top-10-a-security-framework-for-the-ai-agent-era-lao</guid>
      <description>&lt;p&gt;The Model Context Protocol needed its own threat taxonomy. Now it has one.&lt;/p&gt;

&lt;p&gt;OWASP -- the organization behind the Web Application Top 10 that shaped a generation of security engineering -- has published the &lt;strong&gt;MCP Top 10&lt;/strong&gt;, a structured framework for the most critical security risks in AI agent tool integration. The project, led by Vandana Verma Sehgal, is currently in beta under a CC BY-NC-SA 4.0 license, and it addresses a gap that has been widening for months: the absence of a shared vocabulary for reasoning about MCP security.&lt;/p&gt;

&lt;p&gt;This is not a theoretical exercise. Over 30 CVEs have been filed against MCP implementations in the past 60 days. Research consistently shows that tool poisoning attacks succeed at alarming rates -- 84.2% with auto-approval enabled, according to recent benchmarks. An audit of 17 popular MCP servers found an average security score of 34 out of 100, with 100% lacking permission declarations. The threat landscape has outpaced the defensive toolkit, and OWASP’s framework is an attempt to bring structure to the response.&lt;/p&gt;

&lt;p&gt;Here is what each category covers, why it matters, and what practitioners should do about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Ten Categories
&lt;/h2&gt;

&lt;h3&gt;
  
  
  MCP01: Token Mismanagement and Secret Exposure
&lt;/h3&gt;

&lt;p&gt;Credentials that end up where they should not be. Hard-coded API keys in MCP server configurations, long-lived tokens without rotation policies, and secrets persisted in model memory or protocol debug logs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to do:&lt;/strong&gt; Implement short-lived, scoped credentials. Never store secrets in tool descriptions or model context.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP02: Privilege Escalation via Scope Creep
&lt;/h3&gt;

&lt;p&gt;Permissions that were appropriate during setup expand over time. The cumulative effect is an agent that can modify your entire filesystem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to do:&lt;/strong&gt; Enforce least-privilege by default. Implement automated scope expiry.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP03: Tool Poisoning
&lt;/h3&gt;

&lt;p&gt;Tool poisoning exploits the assumption that tool descriptions are trustworthy. A malicious description can embed hidden instructions that manipulate agent behavior. Invariant Labs showed a poisoned &lt;code&gt;add&lt;/code&gt; tool containing hidden &lt;code&gt;&amp;lt;IMPORTANT&amp;gt;&lt;/code&gt; tags that exfiltrated SSH keys. Tool spoofing achieves 100% success rate in first-match resolution mode.&lt;/p&gt;

&lt;p&gt;Three variants: direct poisoning, tool shadowing, and rug pulls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to do:&lt;/strong&gt; Implement tool pinning. Never auto-approve tool invocations in production. Use schema quarantine.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP04: Supply Chain Attacks
&lt;/h3&gt;

&lt;p&gt;Classic supply chain vectors -- typosquatting, dependency confusion -- but payloads execute inside AI agents with elevated permissions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to do:&lt;/strong&gt; Pin MCP server versions. Verify package signatures. Monitor registries.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP05: Command Injection
&lt;/h3&gt;

&lt;p&gt;The MCP equivalent of SQL injection. The Clinejection attack demonstrated how a malicious GitHub issue title could trigger code execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to do:&lt;/strong&gt; Validate and sanitize all input. Use sandboxed execution environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP06: Intent Flow Subversion
&lt;/h3&gt;

&lt;p&gt;Malicious instructions embedded in tool context hijack the agent’s decision-making.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to do:&lt;/strong&gt; Separate system instructions from retrieved context. Use chain-of-thought logging.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP07: Insufficient Authentication
&lt;/h3&gt;

&lt;p&gt;38% of 500+ scanned MCP servers lack any form of authentication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to do:&lt;/strong&gt; Use OAuth 2.1 as specified in MCP. Enforce RBAC at the tool level.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP08: Lack of Audit and Telemetry
&lt;/h3&gt;

&lt;p&gt;Without logging, unauthorized actions go undetected. Most MCP clients provide minimal logging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to do:&lt;/strong&gt; Log all tool invocations with full parameters and responses. Enable real-time alerting.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP09: Shadow MCP Servers
&lt;/h3&gt;

&lt;p&gt;Unauthorized deployments outside security governance. Shadow servers have the same trust level as approved ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to do:&lt;/strong&gt; Centralized MCP deployment governance. Discover and inventory all instances.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP10: Context Injection and Over-Sharing
&lt;/h3&gt;

&lt;p&gt;Sensitive information from one task leaks to another through shared context windows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to do:&lt;/strong&gt; Isolated context windows per user and per task. Enforce context expiration.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Numbers Say
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;30+ CVEs in 60 days&lt;/strong&gt; against MCP implementations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;84.2% success rate&lt;/strong&gt; for tool poisoning with auto-approval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;38% of 500+ servers&lt;/strong&gt; lack authentication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;34/100 average security score&lt;/strong&gt; across 17 audited servers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100% tool spoofing success&lt;/strong&gt; in first-match resolution&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;FastMCP exceeds 1M daily downloads&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Emerging Defense Patterns
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Schema Quarantine and Tool Pinning&lt;/strong&gt; -- Verify tool definitions before they reach the agent. Invariant Labs' &lt;code&gt;mcp-scan&lt;/code&gt; detects poisoning, rug pulls, and cross-origin escalations. &lt;a href="https://github.com/smart-mcp-proxy/mcpproxy-go" rel="noopener noreferrer"&gt;MCPProxy&lt;/a&gt; combines BM25-based tool discovery with quarantine capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Runtime Behavioral Monitoring&lt;/strong&gt; -- Detect behavioral drift with tools like Golf Scanner and AgentArmor's 8-layer security framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Registry Governance&lt;/strong&gt; -- Signed packages, provenance tracking, automated vulnerability scanning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context Isolation&lt;/strong&gt; -- Isolated context windows per task, strict permission boundaries per tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Response Plan
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;This week:&lt;/strong&gt; Inventory MCP connections. Disable auto-approval. Scan configs for secrets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This month:&lt;/strong&gt; Implement tool pinning. Add auth to all connections. Enable audit logging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This quarter:&lt;/strong&gt; Adopt a gateway architecture. Implement context isolation. Establish MCP governance.&lt;/p&gt;

&lt;p&gt;The full framework is available at &lt;a href="https://owasp.org/www-project-mcp-top-10/" rel="noopener noreferrer"&gt;owasp.org/www-project-mcp-top-10&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://mcpblog.dev/blog/2026-03-15-owasp-mcp-top-10" rel="noopener noreferrer"&gt;mcpblog.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>mcp</category>
      <category>security</category>
    </item>
    <item>
      <title>Deploy Your Own Agent Messaging Hub in 15 Minutes -- For Free</title>
      <dc:creator>Algis</dc:creator>
      <pubDate>Mon, 16 Mar 2026 09:08:40 +0000</pubDate>
      <link>https://forem.com/algis/deploy-your-own-agent-messaging-hub-in-15-minutes-for-free-14h5</link>
      <guid>https://forem.com/algis/deploy-your-own-agent-messaging-hub-in-15-minutes-for-free-14h5</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://synapbus.dev/blog/deploy-agent-hub" rel="noopener noreferrer"&gt;synapbus.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AI agent swarms are getting real. Not the theoretical "someday we'll have autonomous agents" kind of real -- the "I have four agents running on a CronJob and they need to talk to each other" kind of real.&lt;/p&gt;

&lt;p&gt;But here's the problem: every messaging backbone people reach for costs money. Redis needs a server. Kafka needs a cluster. Cloud pub/sub services charge per message. For a personal or small-team agent swarm, this overhead kills the project before it starts.&lt;/p&gt;

&lt;p&gt;SynapBus is a different approach: a single Go binary with zero external dependencies. No Redis. No Kafka. No cloud subscription. Embedded SQLite for storage, an HNSW vector index for semantic search, and a Slack-like Web UI for monitoring your agents -- all in one ~20MB binary.&lt;/p&gt;

&lt;p&gt;This post walks through deploying SynapBus, exposing it to the internet for free via Cloudflare Tunnel, and connecting your first AI agents. Total infrastructure cost: $0.&lt;/p&gt;

&lt;h2&gt;
  
  
  What SynapBus Actually Does
&lt;/h2&gt;

&lt;p&gt;SynapBus is a local-first, MCP-native agent-to-agent messaging hub:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Channels and DMs&lt;/strong&gt; -- Slack-like communication between agents and humans&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP endpoint&lt;/strong&gt; -- Any MCP-compatible client works out of the box&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic search&lt;/strong&gt; -- Every message indexed by meaning, not just keywords&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task auction&lt;/strong&gt; -- Post a task, let agents bid on capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web UI&lt;/strong&gt; -- Watch agents talk in real time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The MCP interface exposes four tools: &lt;code&gt;my_status&lt;/code&gt;, &lt;code&gt;send_message&lt;/code&gt;, &lt;code&gt;search&lt;/code&gt;, and &lt;code&gt;execute&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Option A: Docker Compose (5 Minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.8'&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;synapbus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/synapbus/synapbus:0.4.0&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;synapbus-data:/data&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;SYNAPBUS_LOG_LEVEL=info&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;SYNAPBUS_BASE_URL=http://localhost:8080&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;synapbus-data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then: &lt;code&gt;docker compose up -d&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Option B: Kubernetes with Helm (15 Minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;replicaCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/synapbus/synapbus&lt;/span&gt;
  &lt;span class="na"&gt;tag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.4.0"&lt;/span&gt;
&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NodePort&lt;/span&gt;
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
&lt;span class="na"&gt;persistence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2Gi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploy: &lt;code&gt;helm upgrade --install synapbus synapbus/synapbus --namespace synapbus --create-namespace -f values.yaml&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Expose to the Internet with Cloudflare Tunnel (Free)
&lt;/h2&gt;

&lt;p&gt;Add cloudflared as a sidecar. Cloudflare handles TLS. Your agents connect to &lt;code&gt;https://hub.example.com/mcp&lt;/code&gt; with full HTTPS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup: Create Agents and Channels
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec &lt;/span&gt;synapbus /synapbus user create &lt;span class="nt"&gt;--username&lt;/span&gt; admin &lt;span class="nt"&gt;--password&lt;/span&gt; MySecurePass123
docker &lt;span class="nb"&gt;exec &lt;/span&gt;synapbus /synapbus agent create &lt;span class="nt"&gt;--name&lt;/span&gt; research-agent &lt;span class="nt"&gt;--display-name&lt;/span&gt; &lt;span class="s2"&gt;"Research Agent"&lt;/span&gt; &lt;span class="nt"&gt;--owner&lt;/span&gt; 1
docker &lt;span class="nb"&gt;exec &lt;/span&gt;synapbus /synapbus channels create &lt;span class="nt"&gt;--name&lt;/span&gt; news &lt;span class="nt"&gt;--description&lt;/span&gt; &lt;span class="s2"&gt;"Top discoveries"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Connect Agents via MCP
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;claude_agent_sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ClaudeAgentOptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;

&lt;span class="n"&gt;mcp_servers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synapbus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://hub.example.com/mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;headers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;SYNAPBUS_API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cost Breakdown
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SynapBus&lt;/td&gt;
&lt;td&gt;$0 -- open source&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare Tunnel&lt;/td&gt;
&lt;td&gt;$0 -- free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docker / K8s&lt;/td&gt;
&lt;td&gt;$0 -- your hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI embeddings&lt;/td&gt;
&lt;td&gt;~$0.02/1M tokens (optional)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What You Get
&lt;/h2&gt;

&lt;p&gt;After 15 minutes: Slack-like Web UI, MCP-native connectivity, semantic search, channels and DMs, task auction, HTTPS via Cloudflare, persistent storage, Prometheus metrics.&lt;/p&gt;

&lt;p&gt;SynapBus is not a framework. It is infrastructure: a messaging hub that agents connect to via MCP.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;SynapBus is open source at &lt;a href="https://github.com/synapbus/synapbus" rel="noopener noreferrer"&gt;github.com/synapbus/synapbus&lt;/a&gt;. Originally published at &lt;a href="https://synapbus.dev/blog/deploy-agent-hub" rel="noopener noreferrer"&gt;synapbus.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Beyond BM25: The Future of MCP Tool Discovery</title>
      <dc:creator>Algis</dc:creator>
      <pubDate>Sun, 15 Mar 2026 18:01:20 +0000</pubDate>
      <link>https://forem.com/algis/beyond-bm25-the-future-of-mcp-tool-discovery-57d7</link>
      <guid>https://forem.com/algis/beyond-bm25-the-future-of-mcp-tool-discovery-57d7</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://mcpproxy.app/blog/2026-03-15-beyond-bm25-tool-discovery" rel="noopener noreferrer"&gt;mcpproxy.app/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;In our earlier post, we made the case for BM25 as the right default for MCP tool discovery -- and for small-to-medium tool sets, that case still holds. But new benchmarks from StackOne, Stacklok, and the RAG-MCP paper paint a more nuanced picture: BM25 alone delivers just 14% top-1 accuracy when tool counts climb past a few hundred. Hybrid approaches combining BM25 with semantic search hit 94%. This post lays out what the data actually shows, why BM25 degrades at scale, and how MCPProxy is evolving toward hybrid search while keeping the zero-dependency simplicity that makes it useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Benchmarks Are In
&lt;/h2&gt;

&lt;p&gt;Three independent evaluations have landed in the last few months, and they tell a consistent story.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;StackOne's benchmark&lt;/strong&gt; tested 270 tools across 11 API categories with 2,700 natural-language queries:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Top-1 Accuracy&lt;/th&gt;
&lt;th&gt;Top-5 Accuracy&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;BM25 only&lt;/td&gt;
&lt;td&gt;14%&lt;/td&gt;
&lt;td&gt;87%&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TF-IDF/BM25 hybrid&lt;/td&gt;
&lt;td&gt;21%&lt;/td&gt;
&lt;td&gt;90%&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding search&lt;/td&gt;
&lt;td&gt;38%&lt;/td&gt;
&lt;td&gt;85%&lt;/td&gt;
&lt;td&gt;50-200ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reranker&lt;/td&gt;
&lt;td&gt;40%+&lt;/td&gt;
&lt;td&gt;90%+&lt;/td&gt;
&lt;td&gt;200-500ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Stacklok's MCP Optimizer&lt;/strong&gt; ran a head-to-head comparison against Anthropic's built-in Tool Search across 2,792 tools. Their hybrid semantic+BM25 approach achieved &lt;strong&gt;94% selection accuracy&lt;/strong&gt; versus &lt;strong&gt;34% for BM25-only&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The RAG-MCP paper&lt;/strong&gt; confirmed that agents given every tool upfront achieve just 13.6% accuracy, while retrieval-first routing more than triples it to 43.1%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why BM25 Breaks Down at Scale
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Common verbs saturate the index.&lt;/strong&gt; When you have 2,000+ tools, verbs like "create," "list," "get" appear in hundreds of tool names. BM25's IDF component loses discriminating power.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Short documents amplify the problem.&lt;/strong&gt; Tool descriptions are uniformly short (10-50 words), collapsing a dimension BM25 normally uses for discrimination.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic intent gets lost.&lt;/strong&gt; "notify the team about a deployment" might need Slack, PagerDuty, or email. BM25 cannot bridge the gap between "notify" and "send_message."&lt;/p&gt;

&lt;p&gt;None of this invalidates BM25 for smaller deployments. The 87% top-5 accuracy confirms BM25 almost always gets the right tool &lt;em&gt;somewhere&lt;/em&gt; in the results.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Hybrid Search Actually Looks Like
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Parallel Retrieval
&lt;/h3&gt;

&lt;p&gt;The query runs simultaneously through two paths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BM25 path&lt;/strong&gt;: Keyword search against the Bleve index. Sub-millisecond, zero dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic path&lt;/strong&gt;: Query embedded via lightweight model, compared against pre-computed tool embeddings.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 2: Reciprocal Rank Fusion
&lt;/h3&gt;

&lt;p&gt;The two ranked lists merge using RRF:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RRF_score(tool) = 1/(k + rank_bm25) + 1/(k + rank_semantic)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;RRF is score-agnostic -- it works on rank positions, not raw scores. This sidesteps the normalization problem entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Works So Well
&lt;/h3&gt;

&lt;p&gt;BM25 excels at exact term matching. Embeddings excel at semantic bridging. RRF ensures high confidence when both signals agree. Stacklok's 94% vs BM25's 34% on 2,792 tools proves the combination is categorically better at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where BM25 Still Wins
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Small-to-medium tool sets (under 100 tools).&lt;/strong&gt; 87% top-5 accuracy, zero dependencies, sub-millisecond.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Air-gapped environments.&lt;/strong&gt; No network calls required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Determinism and debuggability.&lt;/strong&gt; BM25 scoring is fully transparent and inspectable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cold start speed.&lt;/strong&gt; Indexes built instantly from tool metadata.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCPProxy's Roadmap: Hybrid Without Compromise
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Phase 1: Smarter BM25 (Now)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Field-weighted scoring (tool names &amp;gt; descriptions)&lt;/li&gt;
&lt;li&gt;Verb deweighting for common actions&lt;/li&gt;
&lt;li&gt;Query expansion for abbreviations&lt;/li&gt;
&lt;li&gt;Server-context boosting&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phase 2: Optional Embedding Layer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Local embedding models (~80MB, single-digit ms)&lt;/li&gt;
&lt;li&gt;Pre-computed embeddings stored alongside Bleve index&lt;/li&gt;
&lt;li&gt;RRF fusion&lt;/li&gt;
&lt;li&gt;Graceful degradation to BM25-only&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phase 3: Hierarchical Discovery
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Server-level grouping as first-level filter&lt;/li&gt;
&lt;li&gt;Progressive disclosure (mirrors Claude Code's pattern)&lt;/li&gt;
&lt;li&gt;Dynamic tool sets by annotation or usage&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Guiding Principle
&lt;/h3&gt;

&lt;p&gt;Every phase maintains MCPProxy's core contract: &lt;strong&gt;it ships as a single binary with zero required external dependencies.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for You
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Your Scale&lt;/th&gt;
&lt;th&gt;Recommended Approach&lt;/th&gt;
&lt;th&gt;Expected Top-1 Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10-50 tools&lt;/td&gt;
&lt;td&gt;BM25 (MCPProxy default)&lt;/td&gt;
&lt;td&gt;~80-85%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50-200 tools&lt;/td&gt;
&lt;td&gt;BM25 with field weighting&lt;/td&gt;
&lt;td&gt;~60-70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;200-500 tools&lt;/td&gt;
&lt;td&gt;Hybrid BM25 + embedding&lt;/td&gt;
&lt;td&gt;~85-90%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500+ tools&lt;/td&gt;
&lt;td&gt;Hybrid + hierarchical discovery&lt;/td&gt;
&lt;td&gt;~90-94%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The earlier BM25 post was not wrong -- it was incomplete. BM25 is the right starting point. But the data is clear that BM25 alone does not scale to the hundreds-of-tools future. MCPProxy is evolving toward hybrid search because the constraints are changing -- and we would rather share that data honestly than pretend a single algorithm solves everything forever.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;MCPProxy is open source at &lt;a href="https://github.com/smart-mcp-proxy/mcpproxy-go" rel="noopener noreferrer"&gt;github.com/smart-mcp-proxy/mcpproxy-go&lt;/a&gt;. Originally published at &lt;a href="https://mcpproxy.app/blog/2026-03-15-beyond-bm25-tool-discovery" rel="noopener noreferrer"&gt;mcpproxy.app/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>search</category>
      <category>opensource</category>
    </item>
    <item>
      <title>The MCP Gateway Landscape in 2026: Where MCPProxy Fits</title>
      <dc:creator>Algis</dc:creator>
      <pubDate>Sun, 15 Mar 2026 17:57:31 +0000</pubDate>
      <link>https://forem.com/algis/the-mcp-gateway-landscape-in-2026-where-mcpproxy-fits-mjk</link>
      <guid>https://forem.com/algis/the-mcp-gateway-landscape-in-2026-where-mcpproxy-fits-mjk</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://mcpproxy.app/blog/2026-03-15-mcp-gateway-landscape" rel="noopener noreferrer"&gt;mcpproxy.app/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Cambrian Explosion of MCP Gateways
&lt;/h2&gt;

&lt;p&gt;Eighteen months ago, "MCP gateway" was barely a category. Today, the &lt;a href="https://github.com/e2b-dev/awesome-mcp-gateways" rel="noopener noreferrer"&gt;awesome-mcp-gateways&lt;/a&gt; list on GitHub tracks &lt;strong&gt;42 projects&lt;/strong&gt; -- 19 open-source and 23 commercial -- and the number keeps climbing. Microsoft, IBM, Docker, Kong, Traefik, and AWS have all shipped MCP gateway solutions. At least eight new open-source gateways appeared in the last six weeks alone.&lt;/p&gt;

&lt;p&gt;What happened? The Model Context Protocol, introduced by Anthropic in late 2024, crossed a critical adoption threshold when OpenAI, Google, and Microsoft all added MCP support. Suddenly every AI agent could talk to any tool using a standard protocol -- and every organization needed something sitting between those agents and tools to enforce auth, control access, scan for threats, and log what happened.&lt;/p&gt;

&lt;p&gt;That "something" is an MCP gateway. But the term now covers everything from a Kubernetes-native reverse proxy to a desktop-first developer tool to a commercial SaaS with 500+ managed integrations. Understanding the landscape requires separating these architectures, identifying which problems each solves, and recognizing which capabilities actually matter for your use case.&lt;/p&gt;

&lt;p&gt;This post maps the territory, compares the major players, and explains where MCPProxy fits in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Architectures, Three Philosophies
&lt;/h2&gt;

&lt;p&gt;The 42 MCP gateways on the market fall into three broad architectural categories. Choosing between them is the first decision that matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud-Native Gateways
&lt;/h3&gt;

&lt;p&gt;These run in Kubernetes, scale horizontally, and assume your MCP servers are deployed as pods or remote services. They excel at multi-tenant environments where platform teams need to govern tool access across dozens of agent deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Microsoft MCP Gateway&lt;/strong&gt; is the canonical example: a C# reverse proxy with StatefulSet-based session affinity, Azure Entra ID authentication, RBAC, and a Tool Gateway Router that dynamically routes tool calls to registered servers. It is Kubernetes-native to its core -- there is no standalone binary, no desktop mode.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IBM ContextForge&lt;/strong&gt; (3.4K GitHub stars) takes the broadest approach. It federates MCP, A2A, REST, and gRPC APIs behind a single endpoint with 40+ plugins, OpenTelemetry tracing, Redis-backed caching, and multi-cluster federation via mDNS auto-discovery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kong AI Gateway&lt;/strong&gt; extends Kong's established API gateway with MCP proxy plugins, OAuth 2.1, and an MCP Registry for tool governance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Desktop-First Gateways
&lt;/h3&gt;

&lt;p&gt;These run locally, optimize for individual developers or small teams, and focus on the workflow between your editor (VS Code, Cursor, Claude Code) and your MCP servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker MCP Gateway&lt;/strong&gt; (1.3K stars) is a Docker CLI plugin that runs MCP servers as isolated containers, manages secrets through Docker Desktop, and provides dynamic tool discovery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCPProxy&lt;/strong&gt; occupies this space too, but with a different emphasis -- more on that below.&lt;/p&gt;

&lt;h3&gt;
  
  
  Managed Platforms
&lt;/h3&gt;

&lt;p&gt;Services like &lt;strong&gt;Composio&lt;/strong&gt; (500+ managed integrations), &lt;strong&gt;MintMCP&lt;/strong&gt; (SOC 2/HIPAA audit logs), and &lt;strong&gt;Unified Context Layer&lt;/strong&gt; (1,000+ tools) provide hosted MCP endpoints with pre-built connectors, managed auth, and pay-per-use pricing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Feature Map
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;MCPProxy&lt;/th&gt;
&lt;th&gt;IBM ContextForge&lt;/th&gt;
&lt;th&gt;Microsoft MCP GW&lt;/th&gt;
&lt;th&gt;Docker MCP GW&lt;/th&gt;
&lt;th&gt;Kong AI GW&lt;/th&gt;
&lt;th&gt;Bifrost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool Discovery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;BM25 ranking&lt;/td&gt;
&lt;td&gt;Registry + mDNS&lt;/td&gt;
&lt;td&gt;Dynamic routing&lt;/td&gt;
&lt;td&gt;Auto-discovery&lt;/td&gt;
&lt;td&gt;MCP Registry&lt;/td&gt;
&lt;td&gt;OpenAI-compat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OAuth config&lt;/td&gt;
&lt;td&gt;OAuth, API keys&lt;/td&gt;
&lt;td&gt;Azure Entra ID&lt;/td&gt;
&lt;td&gt;OAuth + secrets&lt;/td&gt;
&lt;td&gt;OAuth 2.1, ABAC&lt;/td&gt;
&lt;td&gt;SSO, Vault&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Quarantine + SDD&lt;/td&gt;
&lt;td&gt;Guardrails plugins&lt;/td&gt;
&lt;td&gt;RBAC policies&lt;/td&gt;
&lt;td&gt;Interceptors&lt;/td&gt;
&lt;td&gt;ACLs, guardrails&lt;/td&gt;
&lt;td&gt;Guardrails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Isolation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker containers&lt;/td&gt;
&lt;td&gt;K8s namespaces&lt;/td&gt;
&lt;td&gt;K8s pods&lt;/td&gt;
&lt;td&gt;Docker containers&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Protocol&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MCP (stdio + SSE)&lt;/td&gt;
&lt;td&gt;MCP, A2A, REST, gRPC&lt;/td&gt;
&lt;td&gt;MCP&lt;/td&gt;
&lt;td&gt;MCP&lt;/td&gt;
&lt;td&gt;MCP + REST&lt;/td&gt;
&lt;td&gt;MCP + LLM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web UI, logs&lt;/td&gt;
&lt;td&gt;OpenTelemetry&lt;/td&gt;
&lt;td&gt;Azure Monitor&lt;/td&gt;
&lt;td&gt;Logging, tracing&lt;/td&gt;
&lt;td&gt;Prometheus&lt;/td&gt;
&lt;td&gt;Audit logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single binary&lt;/td&gt;
&lt;td&gt;Docker/K8s/PyPI&lt;/td&gt;
&lt;td&gt;K8s only&lt;/td&gt;
&lt;td&gt;Docker CLI plugin&lt;/td&gt;
&lt;td&gt;K8s + Konnect&lt;/td&gt;
&lt;td&gt;Docker, NPX&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Where MCPProxy Is Different
&lt;/h2&gt;

&lt;p&gt;MCPProxy does two things that no other gateway in this landscape does: &lt;strong&gt;BM25 tool discovery&lt;/strong&gt; and &lt;strong&gt;schema quarantine&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Tool Discovery Problem
&lt;/h3&gt;

&lt;p&gt;When an agent connects to 15 MCP servers exposing 200+ tools, the LLM's context window fills with tool definitions. Most gateways treat this as a configuration problem -- you manually curate which tools each agent can see. MCPProxy treats it as a &lt;strong&gt;search problem&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;MCPProxy's BM25 engine ranks available tools by relevance to the agent's current task. The agent sees 3-5 highly relevant tools instead of 200 noisy ones. No other MCP gateway offers automated relevance-based tool filtering.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Quarantine Problem
&lt;/h3&gt;

&lt;p&gt;When you connect a new MCP server, how do you know its tool definitions are safe? Tool poisoning -- hiding malicious instructions in tool descriptions -- is the number one MCP attack vector. MCPProxy's quarantine system holds new tool schemas in a staging area where they are analyzed for known attack patterns before being released to the agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Competitors Excel
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Observability: IBM ContextForge.&lt;/strong&gt; Full OpenTelemetry integration with Phoenix, Jaeger, and Zipkin backends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance: Bifrost.&lt;/strong&gt; Eleven microseconds of overhead at 5,000 RPS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-Protocol: IBM ContextForge.&lt;/strong&gt; MCP, A2A, REST, and gRPC behind one gateway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise Integration: Kong AI Gateway.&lt;/strong&gt; Existing customer base and compliance certifications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Managed Ease of Use: Composio.&lt;/strong&gt; 500 pre-built integrations with managed auth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Market Trajectory
&lt;/h2&gt;

&lt;p&gt;Three patterns are shaping where the MCP gateway market goes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consolidation is coming.&lt;/strong&gt; 42 gateways is not sustainable. The market will consolidate around 5-8 major players within 12-18 months.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Platform vendors will absorb the category.&lt;/strong&gt; AWS has already added MCP proxy support to API Gateway. Azure has MCP support in API Management.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security becomes the differentiator.&lt;/strong&gt; As basic gateway functionality commoditizes, the security layer becomes the primary differentiator.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Where MCPProxy Goes from Here
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Near-term (Q2 2026):&lt;/strong&gt; OpenTelemetry export, expanded quarantine rules covering the full OWASP MCP Top 10, improved BM25 ranking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Medium-term (H2 2026):&lt;/strong&gt; OS-level sandboxing via Linux Landlock, expanded sensitive data detection, public benchmark suite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ongoing:&lt;/strong&gt; Staying lean. MCPProxy will remain a single binary that you can download and run in 30 seconds.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;MCPProxy is open source at &lt;a href="https://github.com/smart-mcp-proxy/mcpproxy-go" rel="noopener noreferrer"&gt;github.com/smart-mcp-proxy/mcpproxy-go&lt;/a&gt;. Star the repo, file issues, or try it with &lt;code&gt;mcpproxy serve&lt;/code&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://mcpproxy.app/blog/2026-03-15-mcp-gateway-landscape" rel="noopener noreferrer"&gt;mcpproxy.app/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>security</category>
      <category>opensource</category>
    </item>
    <item>
      <title>MCP Proxy Pattern: Secure, Retrieval-First Tool Routing for Agents</title>
      <dc:creator>Algis</dc:creator>
      <pubDate>Wed, 20 Aug 2025 14:30:29 +0000</pubDate>
      <link>https://forem.com/algis/mcp-proxy-pattern-secure-retrieval-first-tool-routing-for-agents-247c</link>
      <guid>https://forem.com/algis/mcp-proxy-pattern-secure-retrieval-first-tool-routing-for-agents-247c</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;This post proposes an MCP proxy/middleware layer to improve the user experience with AI agents—especially long‑running ones. It explains how the layer retrieves and routes tools on demand, reduces prompt bloat, and adds safety and observability. The post also explains design choices of implemented features and outlines future areas of development in the open‑source &lt;a href="https://github.com/smart-mcp-proxy/mcpproxy-go/" rel="noopener noreferrer"&gt;MCPProxy&lt;/a&gt; project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction: The Model Context Protocol (MCP)
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://modelcontextprotocol.io/introduction" rel="noopener noreferrer"&gt;Model Context Protocol (MCP)&lt;/a&gt; is a new open standard for connecting AI assistants to external tools and data sources. Rather than each AI app needing custom integrations for every service, MCP defines a consistent way (via MCP servers and MCP clients) to add new capabilities to any AI agent. This opens the door to a richer, more connected AI experience. See also Anthropic’s announcement: &lt;a href="https://www.anthropic.com/news/model-context-protocol" rel="noopener noreferrer"&gt;Introducing the Model Context Protocol&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recent MCP advancements.
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://modelcontextprotocol.io/specification/2025-06-18/architecture" rel="noopener noreferrer"&gt;MCP specification (architecture)&lt;/a&gt; is evolving rapidly, adding features that make AI-tool interactions more powerful and secure. Some highlights of the latest MCP spec (mid-2025) include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Elicitation (Human-in-the-Loop): Tools can pause and ask the user for additional input mid-execution. This turns one-shot calls into interactive multi-turn workflows, enabling things like form filling and clarification questions. Instead of failing on missing info, an MCP server can issue an &lt;code&gt;elicitation/create&lt;/code&gt; request to prompt the user for exactly what’s needed.&lt;/li&gt;
&lt;li&gt;OAuth 2.0 Support: Secure integration with user-authorized APIs is now standardized. Tools can declare OAuth requirements (auth URL, scopes, etc.), and clients handle the login flow automatically. This means an AI agent can safely connect to services like Google or Slack on your behalf, with proper consent.&lt;/li&gt;
&lt;li&gt;Structured Outputs &amp;amp; UI Components: Beyond plain text, MCP now supports structured content schemas and rich media. Tool responses can include typed JSON results or even MIME-typed data (images, audio, etc.), allowing clients like Claude Desktop to render dynamic UI components in-line (&lt;a href="https://youtu.be/TODH2-Inqac?si=PqMy86mghd7n9FCG" rel="noopener noreferrer"&gt;MCP UI demo&lt;/a&gt;). For example, an MCP weather tool could return a JSON object plus an image chart – the chat client can then display a nice formatted forecast card rather than a blob of text.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These advances point towards a future where AI agents seamlessly pull in context, ask users for input when needed, and present results in compelling ways. For community talks and demos, see the &lt;a href="https://www.youtube.com/@MCPDevSummit" rel="noopener noreferrer"&gt;MCP Developers Summit&lt;/a&gt;. However, simply enabling an AI to use dozens of tools raises practical challenges. To truly harness MCP’s potential, we need to consider how tools are connected and managed in real-world scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  Directly Connecting Tools to an AI Agent: Real-World Limitations
&lt;/h2&gt;

&lt;p&gt;Naively, one could wire up an AI agent (like Claude or ChatGPT) with every tool under the sun. In theory the model would then always have the right function available. In practice, though, loading a large number of MCP tools directly into an LLM session is problematic. The limitations include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Client &amp;amp; API Limits: Many AI clients have a hard cap on how many tools or functions can be loaded. For example, Cursor IDE supports at most ~40 tools per workspace (&lt;a href="https://forum.cursor.com/t/mcp-proxy-lets-cursor-see-and-use-thousands-of-tools/103694" rel="noopener noreferrer"&gt;discussion&lt;/a&gt;), and OpenAI’s function-calling API allows ~128 functions (&lt;a href="https://learn.microsoft.com/en-us/azure/ai-foundry/openai/quotas-limits" rel="noopener noreferrer"&gt;Azure quotas&lt;/a&gt;, &lt;a href="https://community.openai.com/t/maximum-amount-of-tools-for-the-bot-to-use/665720" rel="noopener noreferrer"&gt;community confirmation&lt;/a&gt;, &lt;a href="https://platform.openai.com/docs/assistants/deep-dive" rel="noopener noreferrer"&gt;platform docs&lt;/a&gt;). Cramming hundreds of tools beyond these limits just isn’t possible.&lt;/li&gt;
&lt;li&gt;Huge Prompt Overhead: Each tool’s description and JSON schema consume tokens. Feeding dozens at once bloats the prompt. The RAG-MCP framework shows that retrieving only the relevant tool schemas before invoking the model cuts prompt tokens by more than 50% on MCP stress tests (&lt;a href="https://arxiv.org/abs/2505.03275" rel="noopener noreferrer"&gt;RAG-MCP&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Lower Accuracy with Too Many Options: With a large menu of tools, models mis‑select more often. RAG-MCP reports that naive “all tools loaded” baselines achieved only 13.62% tool selection accuracy, while retrieval-first narrowing more than tripled accuracy to 43.13% on benchmark tasks (&lt;a href="https://arxiv.org/abs/2505.03275" rel="noopener noreferrer"&gt;RAG-MCP&lt;/a&gt;). In other words, more is less – too many options can confuse the model and lead to mistakes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64bdhxpdjjofyp05rttq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64bdhxpdjjofyp05rttq.png" alt="Diagram showing how too many tools increase prompt size and reduce accuracy" width="800" height="440"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
    An illustration of how directly integrating too many tools can hit system limits and degrade performance. Loading every tool’s schema can exceed client-imposed caps (like Cursor’s 40-tool limit) and dramatically inflate the prompt size, leading to slower and less accurate responses. In this example, adding dozens of tools caused higher token usage for the same query, with significantly lower task success.&lt;br&gt;
  
  &lt;/p&gt;

&lt;p&gt;Clearly, a more scalable approach is needed – one that gives the agent access to many tools without overwhelming it at each step. This is where a smart MCP middleware or proxy layer comes in (&lt;a href="https://www.youtube.com/watch?v=sW9UD0e7N5A" rel="noopener noreferrer"&gt;What MCP Middleware Could Look Like&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  How MCPProxy Solves the Tool Overload Problem
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://mcpproxy.app/" rel="noopener noreferrer"&gt;MCPProxy&lt;/a&gt; is an open-source project (written in Go) that serves as an intelligent middleware between the AI agent and numerous MCP servers (&lt;a href="https://github.com/smart-mcp-proxy/mcpproxy-go" rel="noopener noreferrer"&gt;source code&lt;/a&gt;). Rather than the agent seeing hundreds of tools directly, the agent sees just one proxy endpoint (the MCPProxy), which dynamically routes and filters tool requests behind the scenes. In effect, MCPProxy acts as an aggregation layer or hub for tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It maintains connections to any number of upstream MCP servers (local or &lt;a href="https://blog.cloudflare.com/remote-model-context-protocol-servers-mcp/" rel="noopener noreferrer"&gt;remote&lt;/a&gt;), but exposes them to the agent through a single unified interface.&lt;/li&gt;
&lt;li&gt;It provides a special retrieve_tools function that the agent can call with a query to discover relevant tools on the fly. The proxy uses an internal BM25 search index to match the query against the descriptions of all available tools and returns only the top K matches. By default, MCPProxy will return at most 5 relevant tools for any given query (a configurable top_k parameter).&lt;/li&gt;
&lt;li&gt;When the agent decides to use one of those tools, it then calls a unified call_tool function with the chosen tool’s name and arguments. MCPProxy forwards that to the correct upstream server, handles the execution, and relays the result back.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This design means the AI doesn’t need to preload every tool’s schema or decide among hundreds of options. It can query the tool space as needed. The result: far fewer tokens consumed and far better accuracy in tool selection. In fact, by loading only the proxy’s functions (one to search tools, one to invoke), an agent can achieve massive prompt savings – one benchmark showed a ~50% reduction in prompt tokens and a corresponding boost in success rate when using this retrieval approach. Instead of drowning in irrelevant options, the model focuses only on a short list of likely tools.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwn87lwlhwiwxngsgo5t7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwn87lwlhwiwxngsgo5t7.png" alt="How MCPProxy streamlines tool usage" width="800" height="397"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
    How MCPProxy streamlines tool usage. The AI agent uses the proxy’s retrieve_tools call to get just a handful of relevant tools for the task (instead of loading every tool). It then invokes the chosen tool via the proxy’s call_tool. This indirection enables zero manual curation of tools by the user and yields huge token savings and higher accuracy in practice.&lt;br&gt;
  
  &lt;/p&gt;

&lt;p&gt;From the agent’s perspective, it now only sees two core functions (plus a couple management functions) from MCPProxy rather than dozens or hundreds from various servers. Under the hood, MCPProxy keeps track of all connected MCP servers and their available tools, updating the search index whenever a new server or tool is added. Because the agent only ever deals with a single MCP server (the proxy itself), we also avoid hitting client limits – e.g. Cursor IDE treats MCPProxy as “one server” no matter how many actual tools it federates.&lt;/p&gt;

&lt;p&gt;Beyond search and invocation, MCPProxy also implements a couple of other handy MCP features by itself. For instance, it includes an &lt;code&gt;upstream_servers&lt;/code&gt; management tool that lets the agent (or user) list, add, or remove the proxy’s upstream servers via MCP. All of this is provided through a lightweight desktop app with a minimal UI (it lives in your system tray) and cross-platform binaries.&lt;/p&gt;

&lt;p&gt;In short, MCPProxy turns the chaos of many tools into a single organized pipeline. By federating unlimited MCP servers behind one endpoint, it bypasses hard limits (no more 40-tool cap) and minimizes context size (load just what’s needed). This lays a foundation for AI agents to be far more productive with tools, scaling up without drowning in prompt data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling to Hundreds of MCP Servers and Thousands of Tools
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;An exciting implication of using a proxy is that you’re no longer limited to a small handful of tools.&lt;/strong&gt; If your AI needs more capabilities, you can simply spin up more MCP servers and register them with the proxy. In practice, one MCPProxy instance can easily manage dozens or even hundreds of upstream servers – effectively giving your agent access to thousands of tools or functions aggregated together.&lt;/p&gt;

&lt;p&gt;However, managing such a large toolset introduces new challenges: how do we find the right server for a task, and who decides which servers to include? This is where we consider different levels of agent autonomy in tool management.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F40k6ofjuxum2ka0qg5ao.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F40k6ofjuxum2ka0qg5ao.png" alt="Concept of an autonomy slider in MCP tool management" width="800" height="352"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
    Concept of an autonomy slider in MCP tool management. On the left, a human manually selects and configures each MCP server the agent will use. In the middle, the agent can help by suggesting or adding servers (with user approval). On the right, the agent fully autonomously discovers and integrates new tools as needed. MCPProxy is built to support these modes: it exposes APIs for programmatic server management, so an AI agent can manage its toolset within bounds you define.&lt;br&gt;
  
  &lt;/p&gt;

&lt;p&gt;On one end of the spectrum, a human operator might manually curate a set of MCP servers for the agent (e.g. adding a GitHub server, a Google Drive server, etc. by hand). On the other end, an advanced agent might autonomously discover and integrate new tools on the fly, without human intervention. Andrej Karpathy refers to this concept as the “autonomy slider” – we can choose how much control to give the AI vs the human in orchestrating the solution (see “&lt;a href="https://arxiv.org/abs/2506.12469" rel="noopener noreferrer"&gt;Levels of Autonomy for AI Agents&lt;/a&gt;”). With MCP, this translates to how tool selection and configuration are handled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manual mode: Human-driven tool discovery. The user explicitly finds and adds MCP servers they think the AI will need. For example, if working on a data analysis task, the user might install a Postgres database MCP server and a plotting MCP server ahead of time. This ensures the agent has the right tools, but it relies on the human’s knowledge and effort.&lt;/li&gt;
&lt;li&gt;Assisted mode: AI suggests, human approves. Here the AI agent can suggest new tools when it encounters a need. It might say “I don’t have a calendar tool – can I install one?” The user can then approve the addition. MCPProxy already enables this workflow: the agent could perform a search in an MCP registry (more on that below) and then call the upstream_servers.add function to register a new server in the proxy. The user stays in the loop, but the agent does the heavy lifting of finding the tool.&lt;/li&gt;
&lt;li&gt;Autonomous mode: AI-driven tool discovery. In the most advanced scenario, the agent itself detects a gap, searches a public registry for a suitable MCP server, and adds it – all on its own. This would push the autonomy slider to the max, letting the AI acquire new skills as needed in real-time. It’s an exciting idea that researchers are already exploring (e.g. Karpathy’s vision of partially autonomous coding agents), though it raises trust and safety questions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Today, most users will operate somewhere between manual and assisted modes. You might start your AI with a core set of known-good tools, but also want it to be able to grab new tools for specific tasks. With MCPProxy, you can allow or restrict this behavior via configuration flags (for example, running the proxy in read-only mode to forbid adding servers, or enabling an experimental auto-add feature). The important thing is that the infrastructure doesn’t hard-code a limit on the number of tools – you can grow your agent’s toolkit as big as needed.&lt;/p&gt;

&lt;p&gt;It’s worth noting that the ecosystem of MCP servers is expanding very rapidly. There are already thousands of MCP servers available, covering everything from Slack bots to web scraping to code execution. Community-driven directories like &lt;a href="https://www.pulsemcp.com/servers" rel="noopener noreferrer"&gt;Pulse MCP&lt;/a&gt;, &lt;a href="https://glama.ai/mcp/servers" rel="noopener noreferrer"&gt;Glama MCP server directory&lt;/a&gt;, &lt;a href="https://smithery.ai/" rel="noopener noreferrer"&gt;Smithery&lt;/a&gt;, and &lt;a href="https://github.com/lobehub/lobe-chat" rel="noopener noreferrer"&gt;LobeHub marketplace&lt;/a&gt; (see the &lt;a href="https://lobehub.com/mcp/ronie-uliana-mcp-chain" rel="noopener noreferrer"&gt;LobeHub MCP index&lt;/a&gt;) list thousands of servers and provide usage stats. Anthropic and others are working on an &lt;a href="https://github.com/modelcontextprotocol/registry" rel="noopener noreferrer"&gt;official MCP registry&lt;/a&gt; to standardize how agents discover and install these servers dynamically. In short, the raw material (tools) is out there; the challenge is connecting the right tool at the right time. A middleware like MCPProxy, especially paired with an intelligent registry search, could let agents tap into this vast toolbox on demand without human micromanagement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Challenges in an MCP-Based Tool Ecosystem
&lt;/h2&gt;

&lt;p&gt;While the MCP approach holds great promise, implementing it in the real world comes with several practical challenges. Here we discuss a few and how a proxy/middleware can help address them:&lt;/p&gt;

&lt;h3&gt;
  
  
  Discovering and Installing MCP Servers
&lt;/h3&gt;

&lt;p&gt;Finding the appropriate MCP server for a given need is not always straightforward. There is no single “app store” for MCP (at least not yet) – instead, there are multiple registries, directories, and marketplaces cropping up. For example, community directories like &lt;a href="https://www.pulsemcp.com/servers" rel="noopener noreferrer"&gt;Pulse MCP&lt;/a&gt;, &lt;a href="https://glama.ai/mcp/servers" rel="noopener noreferrer"&gt;Glama directory&lt;/a&gt;, and &lt;a href="https://smithery.ai/" rel="noopener noreferrer"&gt;Smithery&lt;/a&gt; catalogue thousands of servers and let you search by category or keyword. There are also emerging registry services aiming to provide a unified API for discovering servers. There are even MCP servers that search registries themselves, such as the &lt;a href="https://glama.ai/mcp/servers/%40KBB99/mcp-registry-server" rel="noopener noreferrer"&gt;MCP Registry Server&lt;/a&gt; and the &lt;a href="https://mcp.so/server/pulsemcp-server" rel="noopener noreferrer"&gt;Pulse MCP server&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, once you find a server, you often have to install or run it yourself. Many community MCP servers are simply open-source projects – you might need to run a Docker container or a local script to actually host the server, especially for things that require credentials or local access (like a filesystem tool). This can be a hurdle for non-technical users, and it fragments the experience.&lt;/p&gt;

&lt;p&gt;How MCPProxy helps: The proxy can act as a bridge between registry listings and actual running tools. In the future, I envision the agent being able to search a registry (via some MCP registry API) and then automatically launch the chosen MCP server through the proxy. In fact, MCPProxy’s design already anticipates this: you can add a server by URL or command at runtime using the proxy’s MCP tools. For example, if the agent finds a “PDF reader” MCP server in a registry, it could call mcpproxy tool with parameters something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"upstream_servers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"operation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"add"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pdf_tool"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/pdf/mcp"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;to add that server to its arsenal. (The proxy starts indexing the new server’s tools immediately.) Conversely, if the server needs to run locally, the proxy can be configured with a command to start it. In one scenario, the AI could even instruct the proxy to run a Docker container for an MCP server, given the image name.&lt;/p&gt;

&lt;p&gt;All of this is still experimental, but it’s a key area of development. The goal is to remove the manual friction from tool discovery: ultimately, neither the human nor the AI should have to dig through web listings and configuration files to load a new capability. We’re not quite there yet, but MCPProxy is built to integrate with upcoming MCP registries and package managers so that adding a tool becomes as easy as a function call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Safe Execution of Code Tools (Sandboxing)
&lt;/h3&gt;

&lt;p&gt;Many MCP servers are essentially code execution environments – for instance, a Python REPL tool, a shell command tool, or an automation script runner. Giving an AI access to these is powerful but dangerous. You don’t want an LLM running arbitrary code on your machine without safeguards. Even benign tools like a web browser automation could be exploited if malicious instructions slip through (e.g. telling the browser to download malware).&lt;/p&gt;

&lt;p&gt;The recommended approach is to sandbox and isolate tool execution. This is an area where containerization (like Docker) plays a big role. In fact, Docker Inc. has released an “MCP Gateway” specifically to help run MCP servers in isolated containers with proper security controls (&lt;a href="https://docs.docker.com/ai/mcp-gateway/" rel="noopener noreferrer"&gt;docs&lt;/a&gt;, &lt;a href="https://www.docker.com/blog/docker-mcp-gateway-secure-infrastructure-for-agentic-ai/" rel="noopener noreferrer"&gt;blog&lt;/a&gt;, &lt;a href="https://github.com/docker/mcp-gateway" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;). Their gateway acts as a single endpoint that proxies to multiple containerized tools, similar in spirit to MCPProxy. The benefits of containerization are clear: each tool server runs with restricted privileges, limited network access, and resource quotas – greatly limiting the blast radius if a tool is misused (&lt;a href="https://www.infoq.com/news/2025/08/docker-mcp/" rel="noopener noreferrer"&gt;InfoQ overview&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;MCPProxy itself can leverage Docker for sandboxing. For example, you could configure an MCP server entry in the proxy that launches &lt;code&gt;docker run...&lt;/code&gt; to start the tool inside a container. This would combine the discovery and sandboxing steps seamlessly.&lt;/p&gt;

&lt;p&gt;Even without full automation, the proxy makes it easier to enforce isolation. You can run the entire proxy under a less-privileged account or inside a VM, such that any tool it spawns has limited access to your system. And because the proxy centralizes calls to tools, it could in theory perform real-time monitoring or filtering of tool actions (much like an API gateway inspecting API calls). This leads into the next challenge – security.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP Security and Trust (Tool Poisoning Attacks)
&lt;/h3&gt;

&lt;p&gt;Connecting to third-party tools introduces a new category of AI security issues. A particularly insidious threat is the Tool Poisoning Attack (TPA) (&lt;a href="https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks" rel="noopener noreferrer"&gt;overview&lt;/a&gt;). This is essentially a form of prompt injection where a malicious MCP server hides harmful instructions in its tool descriptions or outputs. Since the AI model reads those descriptions, a cleverly poisoned description can manipulate the model into doing things it shouldn’t – for example, leaking secrets or executing unintended actions. The scary part is that the user might never see these hidden instructions; they are crafted to be invisible to humans (e.g. buried in JSON or markdown), but the AI “sees” them in its prompt.&lt;/p&gt;

&lt;p&gt;Industry awareness of TPAs is growing. In early 2025, security researchers demonstrated how a fake “add numbers” MCP tool could trick an AI into revealing API keys and SSH credentials from the user’s files. Essentially, the tool’s description included a secret section telling the AI to read certain files and send them as part of using the tool – all while appearing harmless to the user. This prompted urgent guidance to be careful about untrusted MCP servers.&lt;/p&gt;

&lt;p&gt;MCPProxy’s security measures: I recognized this risk and built in a quarantine mechanism from the start. By default, MCPProxy will put any newly added MCP server into a “quarantined” state until you explicitly approve it. That means the agent cannot call tools from that server until a human reviews and enables it. This adds a layer of manual vetting – you might, for instance, inspect the tool descriptions or source code of a community MCP server before trusting it. You can even test with a deliberately &lt;a href="https://github.com/smart-mcp-proxy/malicious-demo-mcp-server" rel="noopener noreferrer"&gt;malicious demo MCP server&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In practice, when you add a server in MCPProxy via chat with LLM (using the MCP tool), it’s marked as &lt;code&gt;quarantined: true&lt;/code&gt; in the config initially. Next, you can ask LLM to inspect newly added server tools, MCPProxy have corresponding tool &lt;code&gt;quarantine_security&lt;/code&gt; to do that. You will see the result of inspection in the same chat window. Note, that proxy uses LLM "brain" of your client to inspect the server, so you don't need to equip mcpproxy with openai or anthropic api key.&lt;br&gt;
You can then use the proxy’s tray UI or the config file to enable newly added server once you’re comfortable. You can see it in action in the &lt;a href="https://youtu.be/l4hh6WOuSFM?si=_Pf-NQLx2LJTQwnh&amp;amp;t=135" rel="noopener noreferrer"&gt;demo video&lt;/a&gt;. This simple workflow can prevent a rogue server from ever influencing your agent without your knowledge. It’s essentially an allow-list approach.&lt;/p&gt;

&lt;p&gt;Moving forward, I plan to enhance this with more automation – for example, integrating a security scanner that analyzes new MCP servers for suspicious patterns (similar to tools like MCP-Scan). An advanced proxy could even sanitize or reject outputs that contain anomalous hidden instructions. There is also the concept of TPA-resistant clients (AI side mitigations), but having a filtering layer in the middleware is a good defense in depth.&lt;/p&gt;

&lt;p&gt;Other security features on the roadmap include fine-grained access controls (e.g. per-server or per-tool permission settings) and auditing. MCPProxy already logs all tool usage and can expose recent logs from each server (via the &lt;code&gt;upstream_servers&lt;/code&gt;.&lt;code&gt;tail_log&lt;/code&gt; tool method) for debugging with AI agent. These logs could be extended to flag potential security issues (like a tool outputting an SSH key). The bottom line is that as AI agents start relying on external tools, you must treat those tools as part of the attack surface. A proxy is a natural place to enforce Zero Trust principles – assume all tools are untrusted until verified, limit their capabilities, and monitor their behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other Useful Features of an MCP Middleware
&lt;/h2&gt;

&lt;p&gt;Beyond solving the big problems above, a middleware like MCPProxy can provide various quality-of-life features that make AI+Tools systems more robust and user-friendly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Output Truncation and Caching: Long tool outputs can be problematic for LLMs (they have finite input length and tend to lose context in very long responses). MCPProxy addresses this with a configurable &lt;code&gt;tool_response_limit&lt;/code&gt; – by default it will truncate any tool output beyond 20,000 characters. This prevents a runaway tool from overwhelming the agent with data. In case if agent needs to see some other parts of full output, &lt;code&gt;read_cache&lt;/code&gt; tool can be used to read paginated data from previous tool calls.&lt;/li&gt;
&lt;li&gt;Shared OAuth Authentication: Many MCP servers require authentication to third-party services (think: GitHub API, Google Drive API, etc.). MCPProxy has built-in support for the full OAuth2 flow – including automatically launching your browser for login and capturing the token – and it stores the credentials so you authenticate once and can reuse that session across all your clients. For example, if you connect both your VS Code AI extension and Claude Desktop to MCPProxy, and then add a GitHub MCP server, you only need to go through the GitHub OAuth login one time. The proxy will manage the access token and apply it whenever the agent calls the GitHub tool, even from different front-end applications. This single sign-on style approach greatly improves usability. Under the hood, MCPProxy implements OAuth standards for native apps: &lt;a href="https://www.rfc-editor.org/rfc/rfc8252" rel="noopener noreferrer"&gt;RFC 8252&lt;/a&gt; (PKCE) and &lt;a href="https://www.rfc-editor.org/rfc/rfc7591" rel="noopener noreferrer"&gt;RFC 7591&lt;/a&gt; (Dynamic Client Registration). It also automatically refreshes tokens and can handle multiple accounts if needed.&lt;/li&gt;
&lt;li&gt;Centralized Logging and Debugging: MCPProxy aggregates logs from all upstream servers and the agent’s tool usage into one place on disk (or console). This makes it much easier to debug what’s happening. The proxy can show you which tool was called, with what arguments, and how long it took, all in a unified log. Moreover, as mentioned, there’s an API for the agent to fetch recent logs itself for self-diagnosis – a clever agent might use tail_log to read error messages from a failing tool and decide an alternative strategy. Such introspection is a unique benefit of having a middleware layer coordinating the interactions.&lt;/li&gt;
&lt;li&gt;Performance optimizations: Because the proxy maintains persistent connections to upstream MCP servers, it can reuse them across multiple calls. This avoids the overhead of reconnecting or re-loading the tool definitions each time. If multiple AI clients (or multiple concurrent conversations) are using the same tools via the proxy, they all benefit from a shared connection and index. The proxy could also implement request batching or parallelism transparently. For instance, if the agent needs to call two tools, the proxy could execute them in parallel and stream results back, reducing latency. These kinds of optimizations would be very hard to do without a middleware orchestrating things.&lt;/li&gt;
&lt;li&gt;Configurability and Extensibility: MCPProxy is just one implementation of an MCP middleware, but it is open-source and designed to be extended. You can run it headless on a server or with a tray icon on your laptop. There’s a simple JSON config for defaults, and command-line flags for things like read-only mode or disabling certain features. Advanced users can fork proxy to add custom logic (for example, one could plug in a vector database for semantic tool retrieval in place of BM25). The point is, the middleware approach gives us a playground to enhance how AI agents use tools, without requiring changes to the LLMs themselves.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As of now, MCPProxy covers many of the fundamentals (search, routing, auth, basic security). Upcoming features on my roadmap aim to make it even more production-grade.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I believe we are at an inflection point reminiscent of other big shifts in computing history. Just as the early web required the development of web servers, proxies, and standards like HTTP to truly take off, the rise of AI agents is spurring the creation of analogous infrastructure for tool integration. MCP is the emerging standard protocol, and around it an ecosystem of servers, registries, and middleware is rapidly forming. It’s a bit chaotic (like the web in the 1990s), but also exciting – new capabilities are being added every day.&lt;/p&gt;

&lt;p&gt;MCPProxy is my attempt to bring order and practicality to this space. It’s about advancing a paradigm: enabling AI agents to be productive assistants rather than isolated chatbots. By handling tool discovery, selection, and security in a flexible middleware, I aim to make it easier for developers and end-users to leverage many tools safely and efficiently. This approach is analogous to how software architecture evolved in the past – from monolithic systems to more modular, mediated ones.&lt;/p&gt;

&lt;p&gt;In summary, AI agents plus tools are incredibly powerful, but you must manage the complexity. A smart proxy like MCPProxy sits at the center of this, acting as traffic controller, librarian, and security guard for an army of tools. There’s still much work to do – from seamless registry integration to stronger safety guarantees – but the progress so far is promising. By sharing my approach and the reasoning behind it, I hope to encourage a broader conversation (and collaboration) on how to build better AI middleware. After all, empowering AI agents with tools safely and effectively could usher in a new wave of productivity, much like the personal computer revolution or the rise of the internet did in their eras. With the right infrastructure, you can let AI collaborators use all the tools they need, and move one step closer to truly useful, reliable agentic AI.&lt;/p&gt;

&lt;p&gt;Try MCPProxy: &lt;a href="https://github.com/smart-mcp-proxy/mcpproxy-go/releases" rel="noopener noreferrer"&gt;download the latest release&lt;/a&gt; and share feedback or suggest features via &lt;a href="https://github.com/smart-mcp-proxy/mcpproxy-go/issues" rel="noopener noreferrer"&gt;GitHub Issues&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Originally published at &lt;a href="https://mcpproxy.app/blog/2025-08-10-productivity-tools-for-ai-agents/" rel="noopener noreferrer"&gt;mcpproxy.app/blog/&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>llm</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building LLM-Powered Audience Testing with AI Agents</title>
      <dc:creator>Algis</dc:creator>
      <pubDate>Wed, 09 Jul 2025 07:08:23 +0000</pubDate>
      <link>https://forem.com/algis/building-llm-powered-audience-testing-with-ai-agents-1126</link>
      <guid>https://forem.com/algis/building-llm-powered-audience-testing-with-ai-agents-1126</guid>
      <description>&lt;p&gt;🗒️ &lt;strong&gt;Summary&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Large Language Models (LLMs) can act as &lt;em&gt;“people spirits”&lt;/em&gt;—stochastic simulations of real users[1]. By pairing them with Model Context Protocol (MCP) browser automation, we can already run realistic A/B tests and spot issues &lt;strong&gt;before&lt;/strong&gt; shipping code.&lt;/p&gt;
&lt;h2&gt;
  
  
  1. The Core Concept: LLMs as People Spirits
&lt;/h2&gt;

&lt;p&gt;Andrej Karpathy calls LLMs &lt;em&gt;“stochastic simulations of people”&lt;/em&gt; powered by an autoregressive Transformer[1]. Because they are trained on human text, they develop an &lt;strong&gt;emergent, human-like psychology&lt;/strong&gt;—perfect for audience testing.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Research Foundation: LLM-as-Judge Accuracy
&lt;/h2&gt;

&lt;p&gt;Studies find LLM evaluations correlate up to &lt;strong&gt;80 % with human judgment&lt;/strong&gt;[2][3], though the best models still trail behind inter-human agreement[4]. Stanford’s generative-agent work even showed &lt;strong&gt;85 % self-agreement&lt;/strong&gt; on survey answers two weeks apart[5].&lt;br&gt;&lt;br&gt;
Bottom line: today’s top models are “good enough” to guide product decisions at scale.&lt;/p&gt;
&lt;h2&gt;
  
  
  3. System Prompts = Instant Personas
&lt;/h2&gt;

&lt;p&gt;A single system prompt can turn one model into many audiences:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a 25-year-old gamer from Berlin who values speed and dark themes.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combine demographic, psychographic, and cultural cues to create diverse personas. AgentA/B research confirms that &lt;strong&gt;LLM personas can navigate real webpages and mimic user behavior&lt;/strong&gt;[6][7].&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Wiring an AI-Driven A/B Test
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;What to Do&lt;/th&gt;
&lt;th&gt;Why It Matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Control vs. Variations&lt;/strong&gt; – draft baseline and experimental prompts&lt;/td&gt;
&lt;td&gt;Sets up classic A/B structure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;MCP Browser Automation&lt;/strong&gt; – let agents click, scroll, fill forms[9][10]&lt;/td&gt;
&lt;td&gt;Generates realistic interaction data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Log &amp;amp; Score&lt;/strong&gt; – capture impressions, task success, sentiment&lt;/td&gt;
&lt;td&gt;Quantifies user experience&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Analyze&lt;/strong&gt; – compare KPIs across personas&lt;/td&gt;
&lt;td&gt;Reveals which version wins and &lt;em&gt;why&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  5. Business Wins
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Product Development&lt;/strong&gt;: Test features with Gen Z gamers, Millennial execs, or rural seniors—&lt;em&gt;overnight&lt;/em&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Marketing Copy&lt;/strong&gt;: Iterate headlines until every persona clicks.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UX Audits&lt;/strong&gt;: Detect accessibility or cultural friction long before launch.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  6. Tech Stack: Ready Today
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LLMs&lt;/strong&gt;: GPT-4-level or better for high alignment[11].
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP&lt;/strong&gt;: Standard bridge that lets agents control browsers and other tools[9][12].
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation Servers&lt;/strong&gt;: Browser MCP or Playwright MCP for GUI tasks[10].&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  7. Roll-Out Plan
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Proof of Concept&lt;/strong&gt; – spin up 3–5 key personas and test one flow.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrate&lt;/strong&gt; – pipe results into existing A/B dashboards.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale&lt;/strong&gt; – auto-generate new personas, add prompt-tuning loops, build reporting widgets.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advance&lt;/strong&gt; – predict reactions to unreleased features, run global localization checks, model competitor responses.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;LLMs already let us &lt;em&gt;see through our users’ eyes&lt;/em&gt;. Pair them with MCP automation, and you can iterate faster than ever—no waiting for live traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;[1] Karpathy A. “Software Is Changing (Again)” – YC AI Startup School&lt;br&gt;&lt;br&gt;
[2] Jung J. &lt;em&gt;Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement&lt;/em&gt;&lt;br&gt;&lt;br&gt;
[3] Jung J. &lt;em&gt;Trust or Escalate&lt;/em&gt; (companion study)&lt;br&gt;&lt;br&gt;
[4] Thakur A. S. &lt;em&gt;Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges&lt;/em&gt;&lt;br&gt;&lt;br&gt;
[5] Park J. S. &lt;em&gt;Simulating Human Behavior with AI Agents&lt;/em&gt; – Stanford HAI&lt;br&gt;&lt;br&gt;
[6] Park J. S. &lt;em&gt;AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents&lt;/em&gt;&lt;br&gt;&lt;br&gt;
[7] Park J. S. &lt;em&gt;AgentA/B&lt;/em&gt; (v2)&lt;br&gt;&lt;br&gt;
[9] Anthropic. &lt;em&gt;Introducing the Model Context Protocol&lt;/em&gt;&lt;br&gt;&lt;br&gt;
[10] Browser MCP. &lt;em&gt;Automate your browser with AI&lt;/em&gt;&lt;br&gt;&lt;br&gt;
[11] MCP.so. &lt;em&gt;Browser Automation MCP Server&lt;/em&gt;&lt;br&gt;&lt;br&gt;
[12] Microsoft. &lt;em&gt;Model Context Protocol (MCP): Integrating Azure OpenAI&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>ux</category>
    </item>
    <item>
      <title>🔧 From Frustration to Success: How I Fixed a Stubborn Bug with AI Using Debug-Only Mode</title>
      <dc:creator>Algis</dc:creator>
      <pubDate>Fri, 04 Jul 2025 16:47:56 +0000</pubDate>
      <link>https://forem.com/algis/from-frustration-to-success-how-i-fixed-a-stubborn-bug-with-ai-using-debug-only-mode-2mna</link>
      <guid>https://forem.com/algis/from-frustration-to-success-how-i-fixed-a-stubborn-bug-with-ai-using-debug-only-mode-2mna</guid>
      <description>&lt;p&gt;Ever had an AI assistant confidently tell you &lt;strong&gt;“bug fixed!”&lt;/strong&gt; several times… only to discover the bug is still alive and kicking? I sure did.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I was fighting a stubborn UI glitch in the macOS tray menu of my side-project.&lt;br&gt;&lt;br&gt;
Using Cursor, I followed my normal routine: describe the bug in detail, list the steps to reproduce it, and spell out the expected behaviour.&lt;/p&gt;

&lt;h2&gt;
  
  
  My First Approach (That Failed)
&lt;/h2&gt;

&lt;p&gt;I threw every model I had at the issue—Claude-4 Sonnet, Gemini 2.5 Pro, GPT-4.1 and even attached screenshots.&lt;br&gt;&lt;br&gt;
Each model rewrote different parts of the code and proudly announced &lt;em&gt;“Fixed!”&lt;/em&gt;.&lt;br&gt;&lt;br&gt;
But when I ran the app, the bug was still there.&lt;br&gt;&lt;br&gt;
Worse, the AI sometimes got stuck in loops or removed unrelated features in its attempts to “help.” After a few hours, I realised I was spinning my wheels.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Game-Changing Strategy: Debug-Only Mode
&lt;/h2&gt;

&lt;p&gt;Instead of letting the AI rewrite code, I set one hard rule: &lt;strong&gt;the AI can only add debug logs—nothing else&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Here’s how it worked:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Restrict AI permissions&lt;/strong&gt; – “Add log lines only.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get targeted grep commands&lt;/strong&gt; – the AI supplies copy-ready commands after every change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feed real data back&lt;/strong&gt; – run the app, reproduce the bug, paste the filtered logs into chat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repeat until breakthrough&lt;/strong&gt; – short, focused iterations.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;p&gt;After 4–5 loops, we traced the culprit to bad cache invalidation in the data layer—nowhere near the UI. One tiny manual patch fixed everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real data beats guesswork&lt;/strong&gt; 📊
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You stay in control&lt;/strong&gt; 🎮
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast feedback loops&lt;/strong&gt; ⚡
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prevents scope creep&lt;/strong&gt; 🎯
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Your Turn
&lt;/h2&gt;

&lt;p&gt;Have you tried limiting your AI assistant’s permissions?&lt;br&gt;&lt;br&gt;
What debugging tricks work best for you? Let me know in the comments!&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
  </channel>
</rss>
