<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Vladyslav Mokrousov</title>
    <description>The latest articles on Forem by Vladyslav Mokrousov (@vmokrousov).</description>
    <link>https://forem.com/vmokrousov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3809840%2F82e54212-7d1d-4007-a688-7bcdf2f0733a.jpg</url>
      <title>Forem: Vladyslav Mokrousov</title>
      <link>https://forem.com/vmokrousov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/vmokrousov"/>
    <language>en</language>
    <item>
      <title>How to do Regression Testing for MCP Servers</title>
      <dc:creator>Vladyslav Mokrousov</dc:creator>
      <pubDate>Fri, 06 Mar 2026 12:14:53 +0000</pubDate>
      <link>https://forem.com/vmokrousov/how-to-do-regression-testing-for-mcp-servers-394e</link>
      <guid>https://forem.com/vmokrousov/how-to-do-regression-testing-for-mcp-servers-394e</guid>
      <description>&lt;p&gt;If you maintain an MCP server, there is a class of breakage that no amount of unit testing will catch. Someone on your team renames a tool parameter from &lt;code&gt;query&lt;/code&gt; to &lt;code&gt;search_query&lt;/code&gt;, or rephrases a tool description from "Search the web" to "Search the web for recent results," and the change passes every test in the suite because nothing actually validates the protocol surface your server exposes to AI agents.&lt;/p&gt;

&lt;p&gt;These schema drift issues are rarely visible right away, but they accumulate over time and tend to surface as baffling agent failures — tools that stop being selected, arguments that arrive malformed, responses that get misinterpreted — precisely because MCP tool descriptions are not documentation in the traditional sense. They are instructions. The model reads them to decide when to call a tool, how to invoke it, and what to do with the result. A reworded description is not a cosmetic change. It is a change in the instruction set the model operates from.&lt;/p&gt;

&lt;p&gt;The ecosystem has clearly started feeling this. Over the past few weeks, a small wave of projects has appeared — schema drift detectors, description diffing tools, supply chain auditors for tool surfaces — all trying to address some facet of the same underlying problem: MCP servers need the kind of regression testing discipline that REST APIs have enjoyed for the better part of a decade.&lt;/p&gt;

&lt;h2&gt;
  We solved the analogous problem for HTTP a long time ago
&lt;/h2&gt;

&lt;p&gt;Consider how we test REST APIs today. We have OpenAPI specs that serve as a committed, diffable contract. We have Pact for consumer-driven contract testing. We have VCR.py and its equivalents in every language — record an HTTP exchange, commit the cassette to the repo, replay it in tests so you never depend on a live server. When the contract changes, the diff appears in your pull request, attributed to an author, ready for review.&lt;/p&gt;
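&lt;p&gt;The mechanics of that pattern are simple enough to sketch. The following toy class is a hedged illustration of the record/replay idea — not VCR.py's or mcp-recorder's actual code, just the shape of the technique:&lt;/p&gt;

```python
import json

# A toy cassette, illustrating the record/replay pattern in the abstract.
# This is a sketch of the idea, not VCR.py's or mcp-recorder's actual code.
class Cassette:
    def __init__(self):
        self.exchanges = {}

    def record(self, request, live_call):
        """Forward the request to the real backend and store the response."""
        key = json.dumps(request, sort_keys=True)
        self.exchanges[key] = live_call(request)
        return self.exchanges[key]

    def replay(self, request):
        """Serve the recorded response; fail loudly on an unknown request."""
        key = json.dumps(request, sort_keys=True)
        if key not in self.exchanges:
            raise KeyError("no recording for this request")
        return self.exchanges[key]

# Record once against a "live" backend, then replay with the backend gone.
cassette = Cassette()
cassette.record({"method": "GET", "url": "/users/1"}, lambda req: {"status": 200})
print(cassette.replay({"method": "GET", "url": "/users/1"}))  # {'status': 200}
```

&lt;p&gt;Swap the HTTP request/response pairs for JSON-RPC messages and the same shape carries over to MCP.&lt;/p&gt;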

&lt;p&gt;MCP has none of this infrastructure yet. There are &lt;a href="https://github.com/modelcontextprotocol/modelcontextprotocol/issues/1575" rel="noopener noreferrer"&gt;active proposals&lt;/a&gt; for tool semantic versioning in the spec, but nothing has shipped. There is no static artifact that describes a server's tool surface — when the server starts, it responds to &lt;code&gt;tools/list&lt;/code&gt; with whatever it happens to have in memory at that moment. Nothing to commit, nothing to pin, nothing to diff against.&lt;/p&gt;

&lt;p&gt;But the pattern that works for REST ought to work here too: record a known-good protocol exchange, commit it to the repository, and verify on every subsequent change that the server still produces the same output.&lt;/p&gt;

&lt;p&gt;The difference lies in what you record. For REST, it is HTTP request/response pairs. For MCP, it is the full JSON-RPC lifecycle: the &lt;code&gt;initialize&lt;/code&gt; handshake (where capabilities are negotiated), the &lt;code&gt;tools/list&lt;/code&gt; response (where every tool schema lives), and the actual &lt;code&gt;tools/call&lt;/code&gt; results (where behavioural regressions hide). Capture that entire exchange into a single artifact, and you have both a regression test and living documentation of your server's public interface.&lt;/p&gt;
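&lt;p&gt;To make that concrete, here is a hedged sketch of what such an artifact could contain — the field names are illustrative, not mcp-recorder's actual cassette schema:&lt;/p&gt;

```python
import json

# Illustrative only: a sketch of what a recorded MCP exchange could contain.
# These field names are NOT mcp-recorder's actual cassette schema.
cassette = {
    "interactions": [
        {   # 1. capability negotiation
            "request": {"jsonrpc": "2.0", "id": 1, "method": "initialize"},
            "response": {"jsonrpc": "2.0", "id": 1,
                         "result": {"capabilities": {"tools": {}}}},
        },
        {   # 2. the full tool surface: names, descriptions, input schemas
            "request": {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
            "response": {"jsonrpc": "2.0", "id": 2, "result": {"tools": [
                {"name": "search", "description": "Search the web",
                 "inputSchema": {"type": "object",
                                 "properties": {"query": {"type": "string"}}}},
            ]}},
        },
        {   # 3. behaviour: what a real call returned at record time
            "request": {"jsonrpc": "2.0", "id": 3, "method": "tools/call",
                        "params": {"name": "search",
                                   "arguments": {"query": "mcp testing"}}},
            "response": {"jsonrpc": "2.0", "id": 3,
                         "result": {"content": [{"type": "text",
                                                 "text": "..."}]}},
        },
    ],
}

# Pretty-printed and committed, every change to this file is a reviewable diff.
golden = json.dumps(cassette, indent=2, sort_keys=True)
```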

&lt;h2&gt;
  Record, commit, verify
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/devhelmhq/mcp-recorder" rel="noopener noreferrer"&gt;mcp-recorder&lt;/a&gt; to implement this pattern. The mental model is VCR.py, applied to the MCP protocol rather than raw HTTP.&lt;/p&gt;

&lt;p&gt;It works as a transparent proxy (for HTTP servers) or subprocess wrapper (for stdio servers) that captures the full MCP exchange into a JSON cassette file. That single recording unlocks two testing directions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Record:   Client → mcp-recorder → Real Server → cassette.json
                                   (HTTP or stdio)

Replay:   Client → mcp-recorder (mock) → cassette.json     (test your client)
Verify:   mcp-recorder (client mock) → Real Server          (test your server)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Replay&lt;/strong&gt; serves recorded responses to your client without a real server — no credentials, no network, deterministic every time. &lt;strong&gt;Verify&lt;/strong&gt; sends the recorded requests to your (possibly changed) server and diffs the actual responses against the golden recording. The verify output looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Verifying golden.json against node dist/index.js

  1. initialize          [PASS]
  2. tools/list          [PASS]
  3. tools/call [search] [FAIL]
       $.result.content[0].text: "old output" != "new output"
  4. tools/call [analyze] [PASS]

Result: 3/4 passed, 1 failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The exit code is non-zero on any diff, so it plugs directly into CI.&lt;/p&gt;

&lt;p&gt;The distinction from schema-only tools matters: because you are recording actual protocol exchanges rather than just comparing &lt;code&gt;tools/list&lt;/code&gt; snapshots, you capture &lt;em&gt;behavioural&lt;/em&gt; regression as well. If a &lt;code&gt;tools/call&lt;/code&gt; used to return a specific error format and now returns something different, the cassette catches it. If capabilities that were previously advertised during &lt;code&gt;initialize&lt;/code&gt; quietly disappear, the cassette catches that too. Schema diffing alone would miss both.&lt;/p&gt;
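&lt;p&gt;At its core, verify is a structural diff over JSON-RPC responses. Here is a simplified sketch of that idea in Python — the real tool's matching and reporting are more involved:&lt;/p&gt;

```python
def diff_json(recorded, actual, path="$"):
    """Compare a recorded response with a live one, collecting
    JSONPath-style descriptions of every mismatch."""
    if isinstance(recorded, dict) and isinstance(actual, dict):
        diffs = []
        for key in sorted(set(recorded) | set(actual)):
            if key not in actual:
                diffs.append(f"{path}.{key}: missing in actual response")
            elif key not in recorded:
                diffs.append(f"{path}.{key}: unexpected new field")
            else:
                diffs.extend(diff_json(recorded[key], actual[key], f"{path}.{key}"))
        return diffs
    if isinstance(recorded, list) and isinstance(actual, list):
        diffs = []
        for i, (rec, act) in enumerate(zip(recorded, actual)):
            diffs.extend(diff_json(rec, act, f"{path}[{i}]"))
        if len(recorded) != len(actual):
            diffs.append(f"{path}: length {len(recorded)} != {len(actual)}")
        return diffs
    if recorded != actual:
        return [f'{path}: "{recorded}" != "{actual}"']
    return []

golden = {"result": {"content": [{"text": "old output"}]}}
actual = {"result": {"content": [{"text": "new output"}]}}
print(diff_json(golden, actual))
# ['$.result.content[0].text: "old output" != "new output"']
```

&lt;p&gt;Because the comparison walks the whole response, it reports dropped capabilities and changed error payloads the same way it reports changed text.&lt;/p&gt;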

&lt;p&gt;Both transports are first-class citizens. Most MCP servers in local development communicate over stdio — you spawn a subprocess and exchange JSON-RPC over stdin/stdout. Remote and cloud-hosted servers use HTTP (Streamable HTTP or SSE). The cassette format is identical regardless of transport:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# stdio — the typical case for locally developed MCP servers&lt;/span&gt;
mcp-recorder verify &lt;span class="nt"&gt;--cassette&lt;/span&gt; golden.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--target-stdio&lt;/span&gt; &lt;span class="s2"&gt;"node dist/index.js"&lt;/span&gt;

&lt;span class="c"&gt;# HTTP — remote or hosted servers&lt;/span&gt;
mcp-recorder verify &lt;span class="nt"&gt;--cassette&lt;/span&gt; golden.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--target&lt;/span&gt; https://your-mcp-server.example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  Applying this to a real server
&lt;/h2&gt;

&lt;p&gt;To make this concrete rather than hypothetical, let's look at what it takes to add regression testing to an existing, production MCP server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mondaycom/mcp" rel="noopener noreferrer"&gt;monday.com's MCP server&lt;/a&gt; is a good candidate. It is a TypeScript server exposing 20+ tools to AI agents — boards, items, updates, documents, workflows — and its only CI workflow at the time of writing is an npm publish step. There is no test that would catch a renamed tool, a removed parameter, or a changed description.&lt;/p&gt;

&lt;p&gt;I submitted &lt;a href="https://github.com/mondaycom/mcp/pull/222" rel="noopener noreferrer"&gt;PR #222&lt;/a&gt; to add schema regression testing. The entire integration consists of a scenarios file, a golden cassette directory, and one CI step. Here's the scenarios file in full — it is 14 lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;schema_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0"&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node"&lt;/span&gt;
  &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;packages/monday-api-mcp/dist/index.js"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;MONDAY_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test-token"&lt;/span&gt;

&lt;span class="na"&gt;scenarios&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;list_tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Capture&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;schemas,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;descriptions,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;annotations"&lt;/span&gt;
    &lt;span class="na"&gt;actions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;list_tools&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you run &lt;code&gt;mcp-recorder record-scenarios scenarios.yml&lt;/code&gt;, it spawns the server as a subprocess, performs the MCP handshake, calls &lt;code&gt;tools/list&lt;/code&gt;, and writes everything into a cassette. The &lt;code&gt;MONDAY_TOKEN&lt;/code&gt; is set to a dummy value because &lt;code&gt;initialize&lt;/code&gt; and &lt;code&gt;tools/list&lt;/code&gt; don't validate the token — they simply enumerate the in-memory tool registry. No network calls, no secrets, no real API access required.&lt;/p&gt;
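&lt;p&gt;Under the hood, the stdio transport is newline-delimited JSON-RPC over stdin/stdout. A rough sketch of the opening messages a recorder writes to the server — the message shapes follow the MCP spec, while the client name, version, and protocol date below are illustrative:&lt;/p&gt;

```python
import json

def jsonrpc(method, msg_id=None, params=None):
    """Frame a JSON-RPC 2.0 message as one newline-terminated line,
    which is how the MCP stdio transport delimits messages."""
    msg = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"

# The opening sequence a recorder writes to the server's stdin.
# clientInfo and the protocolVersion date are illustrative values.
handshake = [
    jsonrpc("initialize", msg_id=1, params={
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "recorder-sketch", "version": "0.0.1"},
    }),
    jsonrpc("notifications/initialized"),  # a notification: no id, no reply expected
    jsonrpc("tools/list", msg_id=2),       # the response holds every tool schema
]
```

&lt;p&gt;Each message is a single line of JSON; the server's replies come back the same way on stdout, and everything on both sides ends up in the cassette.&lt;/p&gt;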

&lt;p&gt;The resulting golden cassette is roughly 3,200 lines of JSON, capturing every tool's name, description, input schema, and annotations. Because this cassette is committed to the repo, it functions as living documentation — and more importantly, when someone opens a pull request that changes the tool surface, the diff tells you precisely what changed. "This PR added a required &lt;code&gt;workspace_id&lt;/code&gt; parameter to &lt;code&gt;get_items&lt;/code&gt;" or "this PR renamed the &lt;code&gt;create_board&lt;/code&gt; tool" are not things you need to discover by reading source code — they appear as JSON diffs in the PR, ready for review.&lt;/p&gt;
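&lt;p&gt;A reviewer can also lean on a few lines of tooling to summarize such a diff. As a hypothetical sketch — the helper names and snapshot shapes here are invented, though &lt;code&gt;inputSchema.required&lt;/code&gt; is where JSON Schema lists mandatory parameters:&lt;/p&gt;

```python
def required_params(tools):
    """Map each tool name to the set of required parameters advertised
    in its tools/list inputSchema."""
    return {t["name"]: set(t.get("inputSchema", {}).get("required", []))
            for t in tools}

def summarize_drift(old_tools, new_tools):
    """Reduce two tool-surface snapshots to human-readable change lines.
    A sketch: it flags added/removed tools and newly required parameters."""
    old, new = required_params(old_tools), required_params(new_tools)
    lines = []
    for name in sorted(set(old) | set(new)):
        if name not in new:
            lines.append(f"removed tool: {name}")
        elif name not in old:
            lines.append(f"added tool: {name}")
        elif new[name] - old[name]:
            added = ", ".join(sorted(new[name] - old[name]))
            lines.append(f"{name}: new required parameter(s): {added}")
    return lines

old = [{"name": "get_items", "inputSchema": {"required": ["board_id"]}}]
new = [{"name": "get_items", "inputSchema": {"required": ["board_id", "workspace_id"]}}]
print(summarize_drift(old, new))
# ['get_items: new required parameter(s): workspace_id']
```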

&lt;p&gt;I took a similar approach with &lt;a href="https://github.com/tavily-ai/tavily-mcp/pull/113" rel="noopener noreferrer"&gt;Tavily's MCP server&lt;/a&gt; (a search API, 5 tools, 50+ parameters), but pushed it further by including actual &lt;code&gt;tools/call&lt;/code&gt; invocations in the scenarios. Because the server is spawned without a &lt;code&gt;TAVILY_API_KEY&lt;/code&gt;, tool calls hit the API key validation guard and return a deterministic &lt;code&gt;McpError&lt;/code&gt; — which means the cassette captures not only the full schema surface but also the error contract. If someone changes the error message format or the error code, the cassette catches it.&lt;/p&gt;

&lt;p&gt;Both integrations are fully additive — no existing files were modified.&lt;/p&gt;

&lt;h2&gt;
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;The quickest way to see this in action is against a public demo server. Save the following as &lt;code&gt;scenarios.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;schema_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0"&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://mcp.devhelm.io&lt;/span&gt;

&lt;span class="na"&gt;scenarios&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_walkthrough&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Record&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;schemas&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;sample&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;call"&lt;/span&gt;
    &lt;span class="na"&gt;actions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;list_tools&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;list_resources&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;call_tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;add&lt;/span&gt;
          &lt;span class="na"&gt;arguments&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;a&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;2&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;b&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;3&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;call_tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greet&lt;/span&gt;
          &lt;span class="na"&gt;arguments&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;world"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;style&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pirate"&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mcp-recorder

&lt;span class="c"&gt;# Record cassettes from the scenarios file&lt;/span&gt;
mcp-recorder record-scenarios scenarios.yml

&lt;span class="c"&gt;# Inspect what was captured&lt;/span&gt;
mcp-recorder inspect cassettes/demo_walkthrough.json

&lt;span class="c"&gt;# Verify — should pass against the same server&lt;/span&gt;
mcp-recorder verify &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cassette&lt;/span&gt; cassettes/demo_walkthrough.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--target&lt;/span&gt; https://mcp.devhelm.io
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For your own server, the pattern is the same. Write a scenarios file pointing at your stdio command or HTTP URL, record, commit the cassettes, and add a verify step to CI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/mcp-regression.yml (the relevant step)&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pip install mcp-recorder&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;mcp-recorder verify \&lt;/span&gt;
      &lt;span class="s"&gt;--cassette cassettes/tools_and_schemas.json \&lt;/span&gt;
      &lt;span class="s"&gt;--target-stdio "node dist/index.js"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are working in a Python project, the pytest plugin activates automatically on install. Each test gets an isolated replay server on a random port:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;

&lt;span class="nd"&gt;@pytest.mark.mcp_cassette&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cassettes/golden.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_no_regression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mcp_verify_result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;mcp_verify_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;failed&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mcp_verify_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a change is intentional — you genuinely meant to rename that tool — update the cassette with &lt;code&gt;--update&lt;/code&gt; and the new snapshot becomes the baseline.&lt;/p&gt;

&lt;h2&gt;
  The cassette as contract
&lt;/h2&gt;

&lt;p&gt;Once you commit a cassette, something quietly useful emerges: your git history becomes an audit trail of your MCP server's public interface. Every tool rename, every schema change, every description edit appears as a diff, attributed to an author, tied to a pull request. You did not set out to build a changelog of your tool surface, but you have one.&lt;/p&gt;

&lt;p&gt;This has a natural extension for the other side of the relationship. If you &lt;em&gt;consume&lt;/em&gt; an MCP server you do not control — a third-party integration, a vendor API — the same approach works in reverse. Record what the server exposes today, run verify on a schedule, and detect when the upstream shifts before your agents do.&lt;/p&gt;
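&lt;p&gt;A hedged sketch of what that scheduled check could look like as a GitHub Actions workflow — the paths and server URL below are placeholders:&lt;/p&gt;

```yaml
# .github/workflows/upstream-drift.yml — nightly check against a third-party server
on:
  schedule:
    - cron: "0 6 * * *"   # once a day

jobs:
  verify-upstream:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install mcp-recorder
      - run: |
          mcp-recorder verify \
            --cassette cassettes/upstream.json \
            --target https://third-party-server.example.com
```

&lt;p&gt;A red run here means the upstream changed underneath you, not that your code did.&lt;/p&gt;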

&lt;p&gt;The MCP spec does not yet have a mechanism for pinning tool versions — &lt;a href="https://github.com/modelcontextprotocol/modelcontextprotocol/issues/1575" rel="noopener noreferrer"&gt;the proposals&lt;/a&gt; are still under discussion. Until something ships, a committed cassette is the closest thing to a pinned contract.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/devhelmhq/mcp-recorder" rel="noopener noreferrer"&gt;mcp-recorder&lt;/a&gt; is MIT-licensed and on PyPI. I would be glad to hear what works and what does not — issues and pull requests are welcome.&lt;/p&gt;

&lt;p&gt;We're working on more tooling for MCP and agent reliability — sign up for updates at &lt;a href="https://devhelm.io" rel="noopener noreferrer"&gt;devhelm.io&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>devops</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
