Forem: vinmay

Your `pip install` Just Stole Your SSH Keys: The LiteLLM Supply Chain Attack Explained

vinmay — Tue, 24 Mar 2026 20:19:13 +0000

A single pip install litellm==1.82.8 was enough to drain everything off your machine. No suspicious imports. No weird prompts. Just a package install, and your AWS credentials, SSH keys, and API keys were already heading to an attacker's server.

Here's what happened, why it's scary, and what you can actually do about it.

What Happened

On March 24, 2026, LiteLLM version 1.82.8 landed on PyPI with a malicious file bundled inside: litellm_init.pth.

That .pth extension is why this attack is so nasty.

Python's site module processes every .pth file in your site-packages directory each time the interpreter starts, and it executes any line in those files that begins with import. No import of the package is needed, and no user interaction. The attacker hid a double base64-encoded payload inside this file, so the moment Python started, the payload ran too.
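
To see the mechanism without touching your real site-packages, here's a minimal sketch that writes a harmless .pth file into a temp directory and asks the site module to process it. The file name demo.pth and the PTH_DEMO_MARKER variable are made up for the demo; the real payload obviously did something far worse than set an environment variable.

```python
import os
import site
import sys
import tempfile


def pth_line_runs_on_load() -> bool:
    """Write a harmless .pth file to a temp dir and let `site` process it."""
    marker = "PTH_DEMO_MARKER"  # hypothetical name, only used by this demo
    os.environ.pop(marker, None)
    saved_path = list(sys.path)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            pth_path = os.path.join(tmp, "demo.pth")
            with open(pth_path, "w", encoding="utf-8") as fh:
                # site.py executes any line that starts with "import".
                fh.write(f"import os; os.environ[{marker!r}] = 'ran'\n")
            # addsitedir() handles .pth files the same way startup does
            # for the real site-packages directory.
            site.addsitedir(tmp)
    finally:
        sys.path[:] = saved_path  # undo the sys.path entry addsitedir() added
    return os.environ.get(marker) == "ran"


if __name__ == "__main__":
    print("code in .pth file executed:", pth_line_runs_on_load())
```

Nothing in that script imports a module called demo. Dropping the file in a directory that site scans is enough.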

What did it grab? Pretty much everything:

All your environment variables (OPENAI_API_KEY, AWS_SECRET_ACCESS_KEY, all of it)
SSH private keys (~/.ssh/id_rsa, id_ed25519, and more)
AWS, GCP, Azure, and Kubernetes credentials
Git credentials and .gitconfig
Shell history (~/.bash_history, ~/.zsh_history)
Docker configs, npm tokens, database passwords
Crypto wallet files

Then it encrypted everything with AES-256, wrapped the key with a hardcoded RSA-4096 public key, and shipped it all off to https://models.litellm.cloud/. Note that domain: litellm.cloud, not the real litellm.ai. Classic.

Why the Scale Is Scary

LiteLLM gets 97 million downloads per month. That alone is a huge problem.

But supply chain attacks don't stop at the direct install. They travel through dependency trees. If you installed dspy, langchain, or any of the other popular AI packages that depend on litellm>=1.64.0, you were also exposed without ever typing pip install litellm yourself.
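
If you're not sure whether something in your environment pulls in litellm, a rough sketch like this lists the installed packages that declare it as a requirement. It only uses importlib.metadata from the standard library. The helper names are mine, and it only looks one level up the tree, so treat an empty result as a hint rather than proof.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a project name the way PyPI does (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Pull the project name out of a requirement string like 'litellm>=1.64.0'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else ""


def installed_dependents(target: str) -> list[str]:
    """Names of installed distributions that list `target` as a requirement."""
    wanted = normalize(target)
    found = set()
    for dist in metadata.distributions():
        for req in dist.requires or []:
            if requirement_name(req) == wanted:
                name = dist.metadata["Name"]
                if name:
                    found.add(name)
    return sorted(found)


if __name__ == "__main__":
    for name in installed_dependents("litellm"):
        print(f"{name} depends on litellm")
```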

The attack was only live for about an hour. It got discovered almost by accident: a developer's machine ran out of RAM and crashed because the payload was executing inside Cursor through an MCP plugin that pulled in litellm as a transitive dependency. A bug in the attacker's own code gave them away.

If that bug hadn't been there, this could have run quietly for days or weeks across thousands of CI/CD pipelines, dev machines, and prod servers. Nobody would have noticed.

The Real Problem: You Can't See What's Inside Your Dependencies

When you run pip install something, you're not just installing one thing. You're pulling in a whole tree of packages, and any one of them could be compromised.

This isn't new, but it's getting worse as the AI package ecosystem keeps exploding. New packages, new versions, new dependencies dropping every single day. The attack surface is growing way faster than anyone can audit it.

We're taught to think of dependencies as a good thing. Reusable building blocks, standing on the shoulders of giants, all that. The LiteLLM incident is a reminder that every dependency is also a trust decision, and most of us are making those decisions without really thinking about it.

What You Should Actually Do

If you installed litellm 1.82.8:

Check for litellm_init.pth in your site-packages/ directory
Rotate everything: every API key, SSH key, and cloud credential that was on that machine
Check any CI/CD environment where litellm might have been installed too
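
The first check can be scripted. This is a small sketch that looks for litellm_init.pth in the site-packages directories of whichever interpreter runs it, so run it once per virtualenv and once per system Python. The function names are my own.

```python
import site
import sys
from pathlib import Path


def candidate_site_dirs() -> list[Path]:
    """Directories where this interpreter would pick up .pth files."""
    dirs = []
    if hasattr(site, "getsitepackages"):
        dirs.extend(site.getsitepackages())
    dirs.append(site.getusersitepackages())
    # Also cover anything on sys.path that looks like a package directory.
    dirs.extend(p for p in sys.path if p.endswith(("site-packages", "dist-packages")))
    return [Path(d) for d in dict.fromkeys(dirs)]  # de-duplicate, keep order


def find_file(name: str, directories) -> list[Path]:
    """Return every path <directory>/<name> that exists as a file."""
    return [Path(d) / name for d in directories if (Path(d) / name).is_file()]


if __name__ == "__main__":
    hits = find_file("litellm_init.pth", candidate_site_dirs())
    if hits:
        for hit in hits:
            print(f"FOUND: {hit}")
    else:
        print("litellm_init.pth not found for this interpreter")
```

A clean result only covers the interpreter you ran it with. If the package was ever installed on the machine, rotate credentials anyway.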

Going forward:

Pin your dependencies. Use exact version pins (==) in production instead of >=. A pin won't save you if the version you pinned is the poisoned one, but it stops silent upgrades from pulling in something bad later.
Use lockfiles. pip-compile, poetry.lock, uv.lock, whatever fits your setup. Know exactly what you're running.
Audit transitive dependencies. pip-audit and safety check your full dependency tree against databases of known vulnerabilities. Worth running in CI.
Don't pip install as root. Limits how much damage a compromise can actually do.
Keep an eye out for .pth files. They're a legit Python feature, but they're also a perfect delivery mechanism for malware. If you see one in site-packages from a package you don't recognise, that's worth investigating.
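
For the .pth check, here's a rough sketch that lists every .pth file in a directory that contains executable lines, meaning lines that start with import. Function names are mine. Some legitimate packages ship .pth files with import lines too (setuptools is one), so a hit is a prompt to go read the file, not a verdict.

```python
from pathlib import Path


def executable_pth_lines(pth_file) -> list[str]:
    """Lines in a .pth file that the site module would execute as code."""
    text = Path(pth_file).read_text(encoding="utf-8", errors="replace")
    return [ln for ln in text.splitlines() if ln.startswith(("import ", "import\t"))]


def scan_site_dir(site_dir) -> dict[str, list[str]]:
    """Map each .pth file name in site_dir to its executable lines, if any."""
    report = {}
    for pth in sorted(Path(site_dir).glob("*.pth")):
        hits = executable_pth_lines(pth)
        if hits:
            report[pth.name] = hits
    return report


if __name__ == "__main__":
    import site

    for directory in site.getsitepackages():
        for name, lines in scan_site_dir(directory).items():
            print(f"{directory}/{name}")
            for line in lines:
                print(f"    {line[:100]}")  # truncate long payload lines
```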

Can We Do Better Than Grep?

Most of the advice above is reactive. It helps you recover or reduce damage after something gets in. What's actually hard is knowing before you run anything that a package can even reach your credentials.

This is the gap I've been trying to close with something I'm building called ReachScan. The idea is pretty simple: instead of just matching against a list of known bad packages, it maps what a codebase or its dependencies can actually reach: filesystem paths, environment variables, system resources. If a package has no business touching ~/.ssh/, you should know that before it runs, not after.

It won't catch everything. But knowing the capability surface of what you're about to install is a lot better than just hoping nothing in the tree is malicious.

The Uncomfortable Truth

Karpathy put it well after this incident: the way we think about dependencies needs to change. The whole "building pyramids from bricks" model assumes the bricks are trustworthy. In 2026, that's a harder assumption to stand behind.

That doesn't mean stop using dependencies. That's not realistic. It just means:

Be deliberate about what you pull in
Actually understand what each dependency can do on your machine
Have a rotation plan for credentials that treats compromise as a when, not an if

The LiteLLM attack got caught by luck. The next one might not be.

I built "npm audit" for AI agents

vinmay — Sat, 21 Mar 2026 02:23:50 +0000

I was adding MCP tools to a project when I realized something uncomfortable: I had no idea what the code I was installing could actually do.

The README said "connects Claude to Blender." What it didn't say was that one of the registered tools passes a raw string parameter to Python's exec() with no builtin restriction. The LLM doesn't get "Blender API access." It gets full Python execution on the host machine.

I wanted a way to know this before running the code. So I built one.

What reachscan does

reachscan is a static analysis CLI for Python and TypeScript/JavaScript AI agent codebases. Point it at a repo, a PyPI package, or an MCP endpoint, and it tells you:

What the code can do (shell exec, file access, network calls, credential access, dynamic code execution)
Which of those capabilities the LLM can actually trigger (reachability analysis)
The exact call path from the LLM entry point to the dangerous code

pip install reachscan

# Scan a GitHub repo
reachscan https://github.com/user/repo

# Scan a PyPI package before installing
reachscan pypi:some-agent-package

# Scan local code
reachscan ./my-agent

That's it. No config, no API keys, no cloud service. It runs offline and produces a report in about 2 seconds.

The problem

When you give an LLM tools, you're granting it real-world capabilities like file access, shell commands, network calls, credential reads. Most frameworks make it easy to add tools and hard to audit what you've exposed.

Here's real code from a popular MCP server:

@mcp.tool()
def execute_blender_code(ctx: Context, code: str) -> str:
    """Execute arbitrary Python code in Blender."""
    blender = get_blender_connection()
    result = blender.send_command("execute_code", {"code": code})

That code: str parameter? It ends up here:

exec(code, {"bpy": bpy})  # No __builtins__ restriction

namespace = {"bpy": bpy} looks like a sandbox. It isn't. Without explicitly setting __builtins__, Python injects the full builtins module. The LLM can import os, run subprocess, read your files — anything.
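
You can check the injection yourself in a few lines. This sketch uses a plain object() as a stand-in for bpy so it runs without Blender; that stand-in and the function name are my assumptions, everything else is standard Python behavior.

```python
import os


def run_in_fake_sandbox(code: str) -> dict:
    """Run code the same way as the exec() call above, with a stand-in for bpy."""
    namespace = {"bpy": object()}  # placeholder so Blender isn't required
    exec(code, namespace)
    return namespace


result = run_in_fake_sandbox("import os\ncwd = os.getcwd()")

# exec() added the __builtins__ key on its own, which is why the import worked.
print("__builtins__" in result)
print(result["cwd"] == os.getcwd())
```

Both checks print True. The code string never received os from the caller; it imported it, because the builtins that power import were handed over automatically.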

Here's what reachscan shows for this server:

  DYNAMIC  exec()                  server.py:431  reachable
           path: execute_blender_code → send_command → execute_code

  EXECUTE  subprocess.run()        addon.py:89    reachable

  SEND     requests.post()         server.py:198  reachable
           path: generate_3d_model → _call_api

  SECRETS  os.environ[...]         server.py:12   module_level

The reachable tag is the key part. It means the LLM can trigger this code through a registered tool, not just that the code exists somewhere in the repo. module_level means the code runs at import time. unreachable means the code exists but no LLM call path leads to it.

How it works (briefly)

Detectors scan the AST for 7 capability categories: EXECUTE, READ, WRITE, SEND, SECRETS, DYNAMIC, AUTONOMY
Entry point detection finds the functions exposed to the LLM — @tool, @mcp.tool(), @function_tool, BaseTool subclasses, etc. across LangChain, OpenAI Agents SDK, MCP, Pydantic AI, CrewAI, Semantic Kernel, and AutoGen
Call graph + BFS traces up to 8 hops from each entry point to determine which capabilities are actually reachable
Every finding gets one of 5 states: reachable, unreachable, module_level, unknown, no_entry_points
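
To make the pipeline concrete, here's a heavily simplified sketch of the same idea: detect dangerous calls in the AST, find decorated entry points, then BFS the call graph with a hop limit. This is not reachscan's actual code. It handles one file, only top-level functions, a tiny hardcoded detector table, and only two of the five states.

```python
import ast
from collections import deque

# Tiny illustrative detector table; the real category list is much larger.
DANGEROUS = {
    "exec": "DYNAMIC",
    "eval": "DYNAMIC",
    "subprocess.run": "EXECUTE",
    "os.system": "EXECUTE",
}

SOURCE = '''
@tool
def run_task(cmd):
    return helper(cmd)

def helper(cmd):
    import subprocess
    return subprocess.run(cmd, shell=True)

def unused():
    exec("print(1)")
'''


def dotted_name(node) -> str:
    """Turn a Name/Attribute chain into 'a.b.c'; return '' for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else node.attr
    return ""


def is_entry_point(func: ast.FunctionDef) -> bool:
    """True for functions decorated with @tool, @x.tool or @x.tool()."""
    for dec in func.decorator_list:
        target = dec.func if isinstance(dec, ast.Call) else dec
        if dotted_name(target).split(".")[-1] == "tool":
            return True
    return False


def analyze(source: str, max_hops: int = 8):
    tree = ast.parse(source)
    funcs = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    calls = {name: set() for name in funcs}      # local call graph
    findings = {name: [] for name in funcs}      # dangerous calls per function
    for name, func in funcs.items():
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                target = dotted_name(node.func)
                if target in DANGEROUS:
                    findings[name].append((DANGEROUS[target], target))
                elif target in funcs:
                    calls[name].add(target)
    # BFS from every entry point, stopping after max_hops edges.
    depth = {name: 0 for name, func in funcs.items() if is_entry_point(func)}
    queue = deque(depth)
    while queue:
        current = queue.popleft()
        if depth[current] >= max_hops:
            continue
        for callee in calls[current]:
            if callee not in depth:
                depth[callee] = depth[current] + 1
                queue.append(callee)
    return [
        (category, target, name, "reachable" if name in depth else "unreachable")
        for name, items in findings.items()
        for category, target in items
    ]


for finding in analyze(SOURCE):
    print(finding)
```

On the sample source, the subprocess.run() call in helper comes back reachable because run_task is a tool and calls it, while the exec() in unused comes back unreachable.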

The false positive rate is 0.47% across 1,912 labeled findings on 10 real-world repos. I care about this number a lot because a noisy scanner is a useless scanner.

Why I built it

The short version: I was evaluating third-party MCP servers and realized there was no npm audit equivalent for AI agent code. I could run pip-audit to check for known vulnerabilities in dependencies, but nothing told me "this package gives the LLM shell access on your machine."

The existing tools I found either:

Require API calls per scan (expensive, not offline)
Produce flat capability lists without reachability context (noisy)
Don't handle the MCP/agent-specific entry point patterns

So I built the tool I wanted.

What it found across 50 real MCP servers

I ran reachscan against 50 of the most popular MCP server repos:

1 in 3 has shell execution capability
1 in 3 has outbound network I/O
1 in 4 accesses credentials from environment variables
10 of 50 had 4+ capabilities active simultaneously

The highest-risk combination: credential access + network egress. That appeared in 8 of 50 repos. If the LLM can read your AWS keys AND make HTTP calls, that's an exfiltration path.

Not all of these are bugs. An AWS MCP server should talk to AWS. The question is whether the LLM can misuse those capabilities — and whether you know about them before you deploy.
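
The credential-plus-egress check is easy to express once you have capability sets per repo. A tiny sketch, with made-up repo names and capability sets standing in for real scan output:

```python
# Hypothetical scan summaries: repo name -> capability categories found reachable.
SCAN_RESULTS = {
    "example-aws-server": {"SECRETS", "SEND", "READ"},
    "example-notes-server": {"READ", "WRITE"},
    "example-shell-server": {"EXECUTE", "SECRETS"},
}

# Reading credentials and sending data out, together, form an exfiltration path.
EXFILTRATION_COMBO = {"SECRETS", "SEND"}


def has_exfiltration_path(capabilities) -> bool:
    """True when every capability in the combo is present."""
    return EXFILTRATION_COMBO <= set(capabilities)


flagged = sorted(name for name, caps in SCAN_RESULTS.items() if has_exfiltration_path(caps))
print(flagged)
```

Only the first example repo gets flagged. Whether a flagged repo is actually a problem still depends on what it's supposed to do.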

Try it

pip install reachscan

# Scan any GitHub repo
reachscan https://github.com/ahujasid/blender-mcp

# Scan a PyPI package before installing
reachscan pypi:openai-agents

# JSON output for CI
reachscan . --json --severity high

Apache 2.0, pure Python, runs offline. No API keys, no cloud service.

If something looks wrong — false positive, missed pattern, bad output — open an issue.

GitHub: vinmay/reachscan
PyPI: reachscan
Full scan results (50 repos): Medium writeup

I scanned 50 MCP servers to see what they can actually do — here's what I found

vinmay — Thu, 12 Mar 2026 03:35:44 +0000

One of the 50 MCP servers I scanned gives the LLM a full Python shell
on your machine. The tool is called execute_blender_code. The exec()
call has no builtin restriction. I verified it — imports, file reads,
subprocess execution all work.

That's what I built reachscan to find.

The problem

MCP servers aren't plugins in a sandboxed browser extension model. They
run as normal OS processes with your user permissions. If a server calls
subprocess.run(), the LLM can trigger shell commands. If it calls
exec() without restricting builtins, the LLM gets Python execution on
your machine.

Most people don't know which of the servers they're running fall into
which category.

What reachscan does

It's a static analysis CLI for Python and TypeScript/JavaScript agent
codebases. It maps seven capability categories:

Capability	What it means
`EXECUTE`	Shell execution via subprocess, os.system
`READ/WRITE`	Local filesystem access
`SEND`	Outbound HTTP, sockets
`SECRETS`	Env var credential access
`DYNAMIC`	exec(), eval(), importlib
`AUTONOMY`	Background threads, autonomous loops
Entry points	LLM-callable tool registrations

The key distinction from a linter: reachscan tracks reachability.
A subprocess.run() buried in dead code is a different risk than one
called directly from a registered MCP tool.

Install it:

pipx install reachscan
reachscan https://github.com/any-org/any-mcp-server

False positive rate: 0.47% across a labeled corpus of ~3,900 findings.

The headline finding: blender-mcp

blender-mcp has a registered
MCP tool called execute_blender_code:

@mcp.tool()
def execute_blender_code(ctx: Context, code: str) -> str:
    """Execute arbitrary Python code in Blender."""
    blender = get_blender_connection()
    result = blender.send_command("execute_code", {"code": code})

That string travels over a local TCP socket to the Blender addon:

def execute_code(self, code):
    namespace = {"bpy": bpy}
    exec(code, namespace)  # ← line 431

Why `namespace = {"bpy": bpy}` doesn't protect you

This looks like a sandbox. It isn't.

When you call exec(code, namespace) without setting
namespace["__builtins__"], Python automatically injects the full
builtins module. I verified this:

namespace = {"bpy": object()}

exec('import os; print(os.getcwd())', namespace)
# → /home/user/...  ✓

exec('print(open("/etc/hostname").read())', namespace)
# → my-machine  ✓

exec('import subprocess; r = subprocess.run(["id"], \
capture_output=True, text=True); print(r.stdout)', namespace)
# → uid=1000(user) gid=1000(user)  ✓

The LLM doesn't just get Blender API access. It gets full Python
execution on the host.

The same primitive, done right

Finding exec() in a scan result doesn't automatically mean critical.
Two projects in this dataset handle it correctly:

awslabs/mcp — strips builtins before exec():

namespace['__builtins__'] = _SAFE_BUILTINS
# Removes: __import__, exec, eval, compile, open, getattr...
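
Here's a minimal sketch of what that kind of restriction does. The allowlist below is a hypothetical one I wrote for illustration, not the list awslabs/mcp uses.

```python
# Hypothetical allowlist for illustration only.
SAFE_BUILTINS = {"len": len, "range": range, "min": min, "max": max, "print": print}


def run_restricted(code: str) -> dict:
    """exec() code with an explicit, reduced set of builtins."""
    namespace = {"__builtins__": SAFE_BUILTINS}
    exec(code, namespace)
    return namespace


def is_blocked(code: str) -> bool:
    """True if the code fails because a builtin it needs was removed."""
    try:
        run_restricted(code)
    except (ImportError, NameError):
        return True
    return False


print(is_blocked("import os"))               # needs __import__, which is gone
print(is_blocked("open('/etc/hostname')"))   # open is not in the allowlist
print(is_blocked("x = len(range(3))"))       # only uses allowed builtins
```

The first two are blocked and the third runs. Stripping builtins raises the bar a lot, but it isn't a complete sandbox by itself, since Python's object introspection can still reach a surprising amount from an ordinary value.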

StarRocks MCP — validates the AST before eval():

validate_plotly_expr(plotly_expr)  # must be exactly px.<method>(...)
fig = eval(plotly_expr, {"px": px}, local_vars)

Same pattern. Very different trust boundary.
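
The validate-then-eval approach can be sketched with the ast module. This is my own simplified validator, not StarRocks' validate_plotly_expr: it accepts only a single px.<method>(...) call whose arguments are literals, names, or lists, tuples, and dicts of those.

```python
import ast

# Node types allowed inside arguments: literals, bare names, simple containers.
ALLOWED_ARG_NODES = (ast.Constant, ast.Name, ast.Load, ast.List, ast.Tuple, ast.Dict)


def validate_px_call(expr: str) -> None:
    """Raise ValueError unless expr is exactly one call of the form px.<method>(...)."""
    call = ast.parse(expr, mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("expression must be a function call")
    func = call.func
    if not (
        isinstance(func, ast.Attribute)
        and isinstance(func.value, ast.Name)
        and func.value.id == "px"
    ):
        raise ValueError("call target must be px.<method>")
    if func.attr.startswith("_"):
        raise ValueError("private attributes are not allowed")
    arguments = [*call.args, *(kw.value for kw in call.keywords)]
    for arg in arguments:
        for node in ast.walk(arg):
            if not isinstance(node, ALLOWED_ARG_NODES):
                raise ValueError(f"disallowed syntax: {type(node).__name__}")


def is_valid(expr: str) -> bool:
    try:
        validate_px_call(expr)
    except (ValueError, SyntaxError):
        return False
    return True


print(is_valid('px.bar(df, x="a", y="b")'))
print(is_valid('__import__("os").system("id")'))
print(is_valid('px.bar(df, x=open("/etc/passwd").read())'))
```

Only the first expression passes. The validator is deliberately strict (it even rejects negative numbers, which parse as a unary operation), because with an allowlist, rejecting something harmless is a much cheaper mistake than accepting something harmful.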

Results across 50 repos

1,139 findings across 18 repos with non-zero activity
1 in 3 repos has shell execution capability
1 in 4 accesses credentials from environment variables
10 of 50 had 4+ capabilities active simultaneously

The TypeScript gap

22 repos showed 0 capability findings — many are TypeScript-only.
reachscan detects TypeScript entry points (452 found) but doesn't yet
analyze TypeScript function bodies. "Clean" for a TS-heavy repo means
"no Python capability findings" — not verified safe.

TypeScript capability analysis is next.

Main takeaway

Adoption is moving faster than visibility. Treat MCP servers like
privileged code, not plugins. Audit tool boundaries before you deploy.

Try it on any MCP server or agent repo:

pipx install reachscan
reachscan https://github.com/any-org/any-mcp-server

GitHub: github.com/vinmay/reachscan

Full writeup with all 50 results and methodology:
Medium article
