<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sahil Singh</title>
    <description>The latest articles on Forem by Sahil Singh (@sahilsingh8300).</description>
    <link>https://forem.com/sahilsingh8300</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3875744%2Fedd5a010-abc5-4247-800f-8072c15ddcf2.jpg</url>
      <title>Forem: Sahil Singh</title>
      <link>https://forem.com/sahilsingh8300</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sahilsingh8300"/>
    <language>en</language>
    <item>
      <title>Why urlparse() isn't a guard</title>
      <dc:creator>Sahil Singh</dc:creator>
      <pubDate>Thu, 16 Apr 2026 10:00:10 +0000</pubDate>
      <link>https://forem.com/sahilsingh8300/why-urlparse-isnt-a-guard-j0d</link>
      <guid>https://forem.com/sahilsingh8300/why-urlparse-isnt-a-guard-j0d</guid>
      <description>&lt;h1&gt;
  
  
  Why &lt;code&gt;urlparse()&lt;/code&gt; isn't a guard
&lt;/h1&gt;

&lt;p&gt;A lot of code looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;urlparse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The author parsed the URL, so the URL is validated. Right?&lt;/p&gt;

&lt;p&gt;No. &lt;code&gt;urlparse()&lt;/code&gt; is a parser. It tells you what the pieces of a URL are. It does not tell you whether you should fetch it. If &lt;code&gt;url&lt;/code&gt; is &lt;code&gt;http://169.254.169.254/latest/meta-data/&lt;/code&gt;, &lt;code&gt;urlparse()&lt;/code&gt; returns a perfectly valid &lt;code&gt;ParseResult&lt;/code&gt; and &lt;code&gt;httpx.get()&lt;/code&gt; cheerfully fetches AWS metadata credentials from inside your VPC.&lt;/p&gt;

&lt;p&gt;This is the SSRF class of bug. It's boring. It's also the thing that keeps showing up in MCP servers — tools that accept a URL, fetch it server-side, return the body to the model. The model decides what URL to fetch based on untrusted input (a prompt, a doc, a tool response). So the URL is attacker-controlled by construction.&lt;/p&gt;

&lt;p&gt;When we wrote the SSRF check for &lt;a href="https://github.com/veloxlabsio/mcp-scan" rel="noopener noreferrer"&gt;mcp-scan&lt;/a&gt; (MCPA-060), the hard part wasn't finding &lt;code&gt;httpx.get(url)&lt;/code&gt;. The hard part was deciding what &lt;em&gt;counts&lt;/em&gt; as a guard. I want to walk through that decision, because the answer is narrower than most people expect and it changes how you write the fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the check actually flags
&lt;/h2&gt;

&lt;p&gt;The check triggers on HTTP fetch calls (&lt;code&gt;httpx.get&lt;/code&gt;, &lt;code&gt;requests.post&lt;/code&gt;, &lt;code&gt;urllib.request.urlopen&lt;/code&gt;, etc.) where:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The URL argument is a variable, not a string literal.&lt;/li&gt;
&lt;li&gt;The enclosing function has no recognized host validation tied to that variable.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A string literal like &lt;code&gt;httpx.get("https://api.github.com/user")&lt;/code&gt; is fine — the developer hardcoded the host. A variable URL with no guard is not fine. The interesting question is the second condition: what is a "recognized guard"?&lt;/p&gt;

&lt;h2&gt;
  
  
  Accepted: hostname membership against a trusted collection
&lt;/h2&gt;

&lt;p&gt;The primary pattern the check accepts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;urlparse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hostname&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ALLOWED_HOSTS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host not allowed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things have to be true for this to count:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The URL variable flows into &lt;code&gt;urlparse()&lt;/code&gt; or &lt;code&gt;urlsplit()&lt;/code&gt; and the result is bound to a name.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;result.hostname&lt;/code&gt; or &lt;code&gt;result.netloc&lt;/code&gt; appears in a &lt;code&gt;Compare&lt;/code&gt; node with &lt;code&gt;in&lt;/code&gt; or &lt;code&gt;not in&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The other side of the comparison is a &lt;em&gt;trusted collection&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last bullet is where most of the logic lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  What counts as a trusted collection
&lt;/h2&gt;

&lt;p&gt;Three things are accepted as the container side of the membership test:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A literal.&lt;/strong&gt; &lt;code&gt;parsed.hostname in {"api.example.com", "api.stripe.com"}&lt;/code&gt;. The allowlist is right there in the source. Nothing ambiguous.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A local name whose every assignment is a literal collection.&lt;/strong&gt; If a function does &lt;code&gt;ALLOWED = {"host1"}; if debug: ALLOWED = {"host1", "host2"}&lt;/code&gt;, both branches assign literals, so the name is trusted. If &lt;em&gt;any&lt;/em&gt; branch assigns from a non-literal (&lt;code&gt;ALLOWED = load_from_request(request)&lt;/code&gt;), the name is rejected — fail closed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A bare name that is not assigned locally and is not a parameter.&lt;/strong&gt; This is the module-scope case: &lt;code&gt;ALLOWED_HOSTS = {...}&lt;/code&gt; at the top of the file, referenced from inside the function. The check trusts this because module-scope names are almost always developer-controlled constants. It's trust-based. &lt;code&gt;ALLOWED_HOSTS = load_policy_from_env()&lt;/code&gt; at module scope would false-clean. Fixing that honestly would require whole-file analysis, which is out of scope for a check that runs in seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What doesn't count
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting, because the rejections are the part that most linters and security tools get wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Function parameters are rejected.&lt;/strong&gt; If someone writes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;allowed_hosts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;urlparse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hostname&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;allowed_hosts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The check fires. Why? Because &lt;code&gt;allowed_hosts&lt;/code&gt; is attacker-controlled by construction — the caller passes it in. In an MCP server, the caller is usually the model, and the model is reading attacker input. A "guard" that reads its allowlist from the same context that chose the URL is not a guard. The check explicitly collects every parameter (positional, keyword-only, &lt;code&gt;*args&lt;/code&gt;, &lt;code&gt;**kwargs&lt;/code&gt;) and refuses to trust any of them as a container.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Equality is rejected.&lt;/strong&gt; &lt;code&gt;parsed.hostname == "api.example.com"&lt;/code&gt; is not accepted, only &lt;code&gt;in&lt;/code&gt; / &lt;code&gt;not in&lt;/code&gt;. Equality against a single literal is technically safe, but it collapses into a pattern that's hard to distinguish from garbage like &lt;code&gt;parsed.scheme == "https"&lt;/code&gt; (which isn't a host guard at all). Narrowing the check to membership against a collection makes the accept rule cleanly describable. If you have a one-host allowlist, write &lt;code&gt;in {"api.example.com"}&lt;/code&gt;. It reads better anyway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attribute chains are rejected.&lt;/strong&gt; &lt;code&gt;parsed.hostname in request.headers["X-Allowed"]&lt;/code&gt; gets flagged. The container lives in request state, which is attacker-controllable or at least not statically verifiable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DNS resolution alone is rejected.&lt;/strong&gt; Calling &lt;code&gt;socket.gethostbyname(host)&lt;/code&gt; without inspecting the result proves nothing. An attacker can DNS-rebind or point at an internal IP. The check doesn't treat "we looked up the name" as validation — only "we compared the result to a trusted set" counts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two secondary patterns
&lt;/h2&gt;

&lt;p&gt;Two other patterns the check accepts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;ipaddress&lt;/code&gt; family checks on a URL-derived attribute:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;urlparse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ipaddress&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ip_address&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;is_private&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;no private&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Specifically, the check looks for a call to a method named &lt;code&gt;is_private&lt;/code&gt;, &lt;code&gt;is_loopback&lt;/code&gt;, or &lt;code&gt;is_reserved&lt;/code&gt; where the argument is &lt;code&gt;parsed.hostname&lt;/code&gt; or &lt;code&gt;parsed.netloc&lt;/code&gt;. This is narrower than it could be — &lt;code&gt;ipaddress.ip_address(parsed.hostname).is_private&lt;/code&gt; requires tracking the intermediate object, which is multi-hop dataflow. We don't do that. If you write it as &lt;code&gt;checker.is_private(parsed.hostname)&lt;/code&gt; with the hostname passed directly, we catch it. If you chain it through an intermediate object, we miss the guard and false-positive. That's a documented limitation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Helper-name guards:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;validate_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Calls to functions named &lt;code&gt;validate_url&lt;/code&gt;, &lt;code&gt;check_url&lt;/code&gt;, &lt;code&gt;allowed_host&lt;/code&gt;, or &lt;code&gt;is_allowed&lt;/code&gt; with the URL variable as an argument are trusted. This is the most generous of the three patterns — the check has no idea what &lt;code&gt;validate_url&lt;/code&gt; actually does. It could be &lt;code&gt;return True&lt;/code&gt;. But false-positives on URL handling code with custom validators were painful enough in testing that we accept the heuristic and document it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four honest limitations
&lt;/h2&gt;

&lt;p&gt;If you read the check's docstring in &lt;a href="https://github.com/veloxlabsio/mcp-scan/blob/main/src/mcp_audit/checks/source_code.py" rel="noopener noreferrer"&gt;source_code.py&lt;/a&gt;, you'll see four limitations called out explicitly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Single-hop dataflow.&lt;/strong&gt; We trace &lt;code&gt;url → urlparse(url) → parsed.hostname&lt;/code&gt;. We don't trace through intermediate variables beyond that. &lt;code&gt;host = parsed.hostname; if host in ALLOWED&lt;/code&gt; would miss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Helper-name trust.&lt;/strong&gt; &lt;code&gt;validate_url(url)&lt;/code&gt; is accepted without looking inside the helper. A badly-named no-op would false-clean.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Module-scope trust.&lt;/strong&gt; A module-level name is assumed to be a developer-controlled constant. Dynamic globals break this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No DNS resolution as guard.&lt;/strong&gt; We don't accept name resolution as a stand-in for policy enforcement. (This is actually correct — but it means tools that claim to guard via DNS are flagged.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These are in the check's description string. They ship with every finding. That matters because "here's a false positive" is a different conversation than "here's an undocumented gap in the tool."&lt;/p&gt;

&lt;h2&gt;
  
  
  What this changes about how you write the fix
&lt;/h2&gt;

&lt;p&gt;If you were going to patch your MCP server's SSRF exposure, the version that passes mcp-scan looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ALLOWED_HOSTS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api.example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api.stripe.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;urlparse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hostname&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ALLOWED_HOSTS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host not allowed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Module-scope literal set. Membership test. &lt;code&gt;hostname&lt;/code&gt; attribute of a &lt;code&gt;urlparse&lt;/code&gt; result. Every piece maps to a rule the check understands.&lt;/p&gt;

&lt;p&gt;The version that &lt;em&gt;looks&lt;/em&gt; like a fix but doesn't pass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;allowed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;urlparse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hostname&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;allowed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host not allowed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same shape. Parameter instead of module constant. Check fires — because the tool is right. &lt;code&gt;allowed&lt;/code&gt; is whatever the caller passed. In an agent context, the caller is the model, and the model reads attacker input.&lt;/p&gt;

&lt;h2&gt;
  
  
  The meta-point
&lt;/h2&gt;

&lt;p&gt;Security checks that report "URL fetch without validation" don't give you a remediation. They give you a vibe. A developer who reads the finding and adds &lt;code&gt;urlparse()&lt;/code&gt; has done nothing and the tool has no way to tell them.&lt;/p&gt;

&lt;p&gt;The useful version of the check has to commit to a position on what counts. That commitment is the hard part. You'll be wrong sometimes — a valid guard using an AST shape you didn't anticipate, or a module-scope name that turns out to be dynamic. You'll false-positive real code and false-clean bad code. The discipline is documenting the shape you accept, documenting the shape you reject, and letting a developer read the check and understand why their code was flagged.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;urlparse()&lt;/code&gt; isn't a guard. Neither is &lt;code&gt;validate()&lt;/code&gt;. Neither is &lt;code&gt;if host:&lt;/code&gt;. The guard is a membership test against a collection whose contents you control.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/veloxlabsio/mcp-scan" rel="noopener noreferrer"&gt;mcp-scan&lt;/a&gt; is an open-source AST-level security scanner for MCP servers. The SSRF check discussed here is &lt;a href="https://github.com/veloxlabsio/mcp-scan/blob/main/src/mcp_audit/checks/source_code.py" rel="noopener noreferrer"&gt;MCPA-060&lt;/a&gt;. If you run MCP tools in production and want the check to run on your source, &lt;code&gt;pip install mcp-scan&lt;/code&gt; and point it at your repo.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>python</category>
      <category>mcp</category>
      <category>ast</category>
    </item>
    <item>
      <title>We built an open-source security scanner for MCP servers</title>
      <dc:creator>Sahil Singh</dc:creator>
      <pubDate>Mon, 13 Apr 2026 03:39:42 +0000</pubDate>
      <link>https://forem.com/sahilsingh8300/we-built-an-open-source-security-scanner-for-mcp-servers-g3g</link>
      <guid>https://forem.com/sahilsingh8300/we-built-an-open-source-security-scanner-for-mcp-servers-g3g</guid>
      <description>&lt;p&gt;MCP servers are the new attack surface. Every agent that mounts a GitHub MCP server, a filesystem MCP server, or a custom tool server is trusting that server's tool descriptions, input schemas, and handler code. Most of that trust is misplaced.&lt;/p&gt;

&lt;p&gt;We built &lt;strong&gt;mcp-scan&lt;/strong&gt; — an open-source CLI that connects to an MCP server via stdio, introspects its tool manifest, and runs deterministic security checks against both the protocol metadata and the Python source code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install:&lt;/strong&gt; &lt;code&gt;pip install velox-mcp-scan&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/veloxlabsio/mcp-scan" rel="noopener noreferrer"&gt;github.com/veloxlabsio/mcp-scan&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Demo page:&lt;/strong&gt; &lt;a href="https://veloxlabs.dev/mcp-scan/" rel="noopener noreferrer"&gt;veloxlabs.dev/mcp-scan&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What it catches
&lt;/h2&gt;

&lt;p&gt;6 checks ship today. Two work against any MCP server (no source access needed). Four require pointing at the server's Python source with &lt;code&gt;--source&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Protocol-level:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCPA-001&lt;/strong&gt; (Critical) — Prompt-injection markers in tool descriptions. Imperative verbs, &lt;code&gt;&amp;lt;system&amp;gt;&lt;/code&gt; tags, exfiltration phrases. This is the Trail of Bits "line jumping" attack — the payload fires when the agent connects, before any tool call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCPA-002&lt;/strong&gt; (High) — ANSI escapes, C0 control chars, zero-width characters hiding payloads in descriptions. The terminal renders them invisible; the LLM reads them as instructions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Source-code AST:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCPA-010&lt;/strong&gt; (Critical) — Path traversal in file handlers. Flags &lt;code&gt;open()&lt;/code&gt; / &lt;code&gt;read_text()&lt;/code&gt; on user-derived paths without &lt;code&gt;is_relative_to()&lt;/code&gt; containment. &lt;code&gt;resolve()&lt;/code&gt; alone is not sufficient — that's the EscapeRoute bypass (CVE-2025-53109/53110).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCPA-012&lt;/strong&gt; (Critical) — Shell injection. &lt;code&gt;subprocess&lt;/code&gt; with &lt;code&gt;shell=True&lt;/code&gt; and dynamic command strings. Catches the pattern from CVE-2025-68144 (Anthropic Git MCP).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCPA-060&lt;/strong&gt; (High) — SSRF sinks. HTTP client calls (&lt;code&gt;httpx&lt;/code&gt;, &lt;code&gt;requests&lt;/code&gt;, &lt;code&gt;urllib&lt;/code&gt;) with variable URLs. The guard detection does lightweight dataflow tracking — &lt;code&gt;urlparse(url)&lt;/code&gt; alone doesn't count as validation. It traces the URL variable through parse results to hostname membership checks against trusted collections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCPA-070&lt;/strong&gt; (High) — Hardcoded secrets. Known prefixes (&lt;code&gt;sk-&lt;/code&gt;, &lt;code&gt;ghp_&lt;/code&gt;, &lt;code&gt;AKIA&lt;/code&gt;, &lt;code&gt;xoxb-&lt;/code&gt;) and high-entropy strings in secret-named variables.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;19 more checks are planned for v0.1 — see &lt;a href="https://github.com/veloxlabsio/mcp-scan/blob/main/docs/checks.md" rel="noopener noreferrer"&gt;docs/checks.md&lt;/a&gt; for the full roadmap.&lt;/p&gt;
&lt;h2&gt;
  
  
  Design decisions that matter
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Fail-closed.&lt;/strong&gt; If introspection fails — server hangs, &lt;code&gt;tools/list&lt;/code&gt; errors, timeout — the scanner produces a CRITICAL finding, not an empty clean report. A security tool that silently passes on errors is worse than no tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No LLM required.&lt;/strong&gt; Every check is deterministic AST analysis and pattern matching. No API keys, no cloud calls, no probabilistic scoring. Runs fully offline in ~2 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Capability-aware introspection.&lt;/strong&gt; If a server only advertises tools (not resources or prompts), a failed &lt;code&gt;resources/list&lt;/code&gt; call is informational, not critical. The scanner respects what the server actually claims to support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dataflow-tracked SSRF guards.&lt;/strong&gt; This was the hardest check to get right. Early versions counted &lt;code&gt;urlparse()&lt;/code&gt; anywhere in the function as "guarded." Six review rounds later, the detection traces &lt;code&gt;url → urlparse(url) → parsed → parsed.hostname in ALLOWED_HOSTS&lt;/code&gt; and validates that the membership target is a trusted collection (literal set, all-literal local variable, or module-scope constant). Equality comparisons (&lt;code&gt;==&lt;/code&gt;) are rejected — only &lt;code&gt;in&lt;/code&gt; / &lt;code&gt;not in&lt;/code&gt; count. Local aliases of function parameters or attributes are rejected. Any non-literal assignment in any branch poisons a local name.&lt;/p&gt;

&lt;p&gt;Four limitations are honestly documented: single-hop tracking, trust-based helper names (&lt;code&gt;validate_url(url)&lt;/code&gt;), module-scope over-trust, and no DNS resolution as a guard pattern.&lt;/p&gt;
&lt;h2&gt;
  
  
  Try it in 30 seconds
&lt;/h2&gt;

&lt;p&gt;The repo ships with &lt;code&gt;vulnerable-mcp&lt;/code&gt; — a deliberately broken MCP server with 5 planted vulnerabilities. The scanner catches all 5 (7 findings total, zero false positives).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;velox-mcp-scan

&lt;span class="c"&gt;# Protocol-level only (2 findings)&lt;/span&gt;
mcp-scan scan &lt;span class="nt"&gt;--stdio&lt;/span&gt; &lt;span class="s2"&gt;"python3 -m vulnerable_mcp.server"&lt;/span&gt;

&lt;span class="c"&gt;# Protocol + source (7 findings, all 5 vulns caught)&lt;/span&gt;
mcp-scan scan &lt;span class="nt"&gt;--stdio&lt;/span&gt; &lt;span class="s2"&gt;"python3 -m vulnerable_mcp.server"&lt;/span&gt; &lt;span class="nt"&gt;--source&lt;/span&gt; ./vulnerable_mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output formats: terminal (default), JSON (&lt;code&gt;-f json&lt;/code&gt;), Markdown (&lt;code&gt;-f markdown&lt;/code&gt;). Non-zero exit on findings — drop it into CI as a gate.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCPA-020&lt;/strong&gt; — Curated MCP dependency CVE matching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCPA-061&lt;/strong&gt; — Markdown image / auto-link exfiltration vector detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP/SSE transport&lt;/strong&gt; — &lt;code&gt;--url&lt;/code&gt; for remote servers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OAuth 2.1 DCR flow auditing&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why we built this
&lt;/h2&gt;

&lt;p&gt;Security tools that ship without a reference vulnerable target are hard to evaluate. We wanted something you could install, point at a deliberately broken server, and see exactly what it catches — on your own laptop, in 30 seconds, without connecting to anything real.&lt;/p&gt;

&lt;p&gt;MCP is early. The default configs are copy-pasted from examples. There's no equivalent of SELinux for tool-use permissions yet. The window to harden these before they become the 2027 supply-chain story is open right now.&lt;/p&gt;




&lt;p&gt;Built by &lt;a href="https://veloxlabs.dev" rel="noopener noreferrer"&gt;Velox Labs&lt;/a&gt; — AI Security &amp;amp; Platform Engineering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PyPI: &lt;a href="https://pypi.org/project/velox-mcp-scan/" rel="noopener noreferrer"&gt;velox-mcp-scan&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/veloxlabsio/mcp-scan" rel="noopener noreferrer"&gt;veloxlabsio/mcp-scan&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Demo: &lt;a href="https://veloxlabs.dev/mcp-scan/" rel="noopener noreferrer"&gt;veloxlabs.dev/mcp-scan&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Check catalog: &lt;a href="https://github.com/veloxlabsio/mcp-scan/blob/main/docs/checks.md" rel="noopener noreferrer"&gt;docs/checks.md&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>security</category>
      <category>python</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
