<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Elise Vance</title>
    <description>The latest articles on Forem by Elise Vance (@shecantcode).</description>
    <link>https://forem.com/shecantcode</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3878897%2F6ebcddbc-3637-4adf-a77d-741d02c2cd41.jpeg</url>
      <title>Forem: Elise Vance</title>
      <link>https://forem.com/shecantcode</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shecantcode"/>
    <language>en</language>
    <item>
      <title>I scanned every major vibe coding tool for security. None scored above 90.</title>
      <dc:creator>Elise Vance</dc:creator>
      <pubDate>Wed, 15 Apr 2026 21:51:29 +0000</pubDate>
      <link>https://forem.com/shecantcode/i-scanned-every-major-vibe-coding-tool-for-security-none-scored-above-90-3opg</link>
      <guid>https://forem.com/shecantcode/i-scanned-every-major-vibe-coding-tool-for-security-none-scored-above-90-3opg</guid>
      <description>&lt;p&gt;I'm a non-technical founder. I can't write code. I built two production apps entirely with AI.&lt;/p&gt;

&lt;p&gt;Last week I scanned my own app for security. It scored 20/100. The scan found 8 vulnerabilities, including a critical auth bypass where a missing config value silently allows all requests.&lt;/p&gt;

&lt;p&gt;So I built Vibe Check — an AI-powered security scanner for vibe-coded apps. Then I pointed it at the tools themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Scorecard
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Repo&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Critical Finding&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;open-lovable (Firecrawl)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0/100&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unauthenticated command execution across 3 API routes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Devika&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;40/100&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LLM responses executed via subprocess.run() without validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare VibeSDK&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;78/100&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;API tokens logged in plain text, missing OAuth CSRF protection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bolt.new&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;90/100&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Command injection via user-controlled shell action content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cline&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96/100&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Minor CI script command injection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100/100&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Clean&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The tools millions of people use to build apps have their own security issues. If their code has vulnerabilities, what about yours?&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Static Scanners Miss These
&lt;/h2&gt;

&lt;p&gt;Three days ago, VibeDoctor &lt;a href="https://dev.to/vibedoctor_io/i-scanned-the-most-famous-ai-coding-repos-on-github-heres-what-i-found-469l"&gt;launched a scanner&lt;/a&gt; and scanned some of these same repos. They run six tools in parallel, including SonarQube, Gitleaks, Trivy, and Lighthouse. They scored Devika 66/100.&lt;/p&gt;

&lt;p&gt;I scored it 40/100 and found a CRITICAL command injection they missed entirely.&lt;/p&gt;

&lt;p&gt;The difference is approach. Static scanners pattern-match for known vulnerabilities. They look for &lt;code&gt;eval(&lt;/code&gt;, hardcoded API keys, outdated dependencies. They're good at this.&lt;/p&gt;

&lt;p&gt;What they can't catch is &lt;strong&gt;intent&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Consider this real code from Devika:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# LLM generates a list of commands
&lt;/span&gt;&lt;span class="n"&gt;commands&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;commands&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No &lt;code&gt;eval()&lt;/code&gt;. No obviously dangerous function call. SonarQube sees &lt;code&gt;subprocess.run()&lt;/code&gt; with a list argument — that's the &lt;em&gt;safe&lt;/em&gt; way to call subprocess. Clean.&lt;/p&gt;

&lt;p&gt;But the commands come from an LLM response. Whatever the AI decides to generate gets executed on the server. That's not a syntax bug. That's an intent bug. The code does exactly what it was written to do — it just shouldn't have been written to do that.&lt;/p&gt;
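
&lt;p&gt;One hedged way to close that gap is to refuse any command whose binary isn't explicitly allowlisted before it ever reaches &lt;code&gt;subprocess&lt;/code&gt;. A minimal sketch; the allowlist and function names here are mine for illustration, not Devika's code:&lt;/p&gt;

```python
import shlex
import subprocess

# Hypothetical allowlist for LLM-generated commands (illustrative only).
ALLOWED_BINARIES = {"ls", "cat", "git", "python3", "pip"}

def run_llm_commands(response: str) -> list:
    """Execute only commands whose binary is explicitly allowlisted."""
    results = []
    for cmd in response.splitlines():
        cmd = cmd.strip()
        if not cmd:
            continue
        parts = shlex.split(cmd)
        if not parts or parts[0] not in ALLOWED_BINARIES:
            # Refuse anything the model invents outside the allowlist.
            results.append((cmd, "blocked"))
            continue
        subprocess.run(parts, capture_output=True, timeout=30)
        results.append((cmd, "ran"))
    return results
```

&lt;p&gt;An allowlist is still a blunt instrument (a model can misuse an allowed binary), but it turns "execute whatever the model says" into a reviewable policy.&lt;/p&gt;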

&lt;h2&gt;
  
  
  The Bug That Started Everything
&lt;/h2&gt;

&lt;p&gt;Here's the function from my own production app that Vibe Check caught:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_auth_ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;secret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WEBHOOK_SECRET&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# no secret set -&amp;gt; allow all
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every static tool I ran against this said it was clean. No hardcoded secrets. No SQL injection. No XSS. The code is syntactically perfect.&lt;/p&gt;

&lt;p&gt;The bug: when &lt;code&gt;WEBHOOK_SECRET&lt;/code&gt; isn't set in the environment, the function returns &lt;code&gt;True&lt;/code&gt; and every webhook request is authorized. In development, that variable is often unset. In production, one env var typo means your backend is wide open.&lt;/p&gt;

&lt;p&gt;This is a semantic bug. You only catch it by reading the code and asking "what happens when the inputs are missing?"&lt;/p&gt;

&lt;h2&gt;
  
  
  How Vibe Check Works
&lt;/h2&gt;

&lt;p&gt;Vibe Check sends your code to Claude (Anthropic's AI) with a security-focused prompt covering six categories: secrets, auth, injection, data exposure, dependencies, and config.&lt;/p&gt;

&lt;p&gt;The prompt has specific instructions for the bugs AI coding tools create:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Default-allow patterns.&lt;/strong&gt; Flag &lt;code&gt;if not secret: return True&lt;/code&gt; as critical.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM output as execution input.&lt;/strong&gt; Flag any path where model output reaches subprocess, eval, or file write without validation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing auth on action endpoints.&lt;/strong&gt; Flag routes that perform destructive actions without authentication checks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Output is structured via Anthropic's tool-use API. Every finding has a category, severity, file, line number, description, and suggested fix — all in plain English.&lt;/p&gt;
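
&lt;p&gt;To make that concrete, here is roughly what a tool-use input schema for one finding could look like. The field list comes from above, but the exact schema shape is my assumption, not Vibe Check's source:&lt;/p&gt;

```python
# Illustrative tool-use input schema for one structured finding.
FINDING_TOOL = {
    "name": "report_finding",
    "description": "Report one security finding in structured form.",
    "input_schema": {
        "type": "object",
        "properties": {
            "category": {
                "type": "string",
                "enum": ["secrets", "auth", "injection",
                         "data_exposure", "dependencies", "config"],
            },
            "severity": {
                "type": "string",
                "enum": ["low", "medium", "high", "critical"],
            },
            "file": {"type": "string"},
            "line": {"type": "integer"},
            "description": {"type": "string"},
            "suggested_fix": {"type": "string"},
        },
        "required": ["category", "severity", "file", "description"],
    },
}
```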

&lt;h2&gt;
  
  
  False Positives Are the Real Product
&lt;/h2&gt;

&lt;p&gt;LLMs are noisy security reviewers. During a day of self-scanning, Claude flagged:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"render.yaml contains &lt;code&gt;sync: false&lt;/code&gt; for secrets — could be misconfigured"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;(&lt;code&gt;sync: false&lt;/code&gt; is the correct Render.com setting for secrets.)&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"compare_digest is correctly implemented"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;(Flagged as a finding even though it's literally the fix.)&lt;/p&gt;

&lt;p&gt;The solution isn't more prompting — Claude ignores guardrails about a third of the time. The real fix is a &lt;strong&gt;hard post-parse filter&lt;/strong&gt; that drops findings matching known false-positive patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infrastructure config files at low/medium severity&lt;/li&gt;
&lt;li&gt;Template files for secrets-category findings&lt;/li&gt;
&lt;li&gt;Test files entirely&lt;/li&gt;
&lt;li&gt;Self-contradictory descriptions ("X is correct, but...")&lt;/li&gt;
&lt;li&gt;Speculative hedging ("could be a", "potentially allowing")&lt;/li&gt;
&lt;li&gt;Dev-environment-only warnings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Critical severity never gets filtered. We want false positives at critical to surface for human review.&lt;/p&gt;

&lt;p&gt;After this filter, Vibe Check's own repo scans as &lt;strong&gt;100/100 with zero findings&lt;/strong&gt;. Without the filter, it bounced between 79 and 94 with different findings each run.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers Behind the Problem
&lt;/h2&gt;

&lt;p&gt;This isn't theoretical:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;65%&lt;/strong&gt; of vibe-coded production apps have security issues (Escape.tech, 1,400 apps scanned)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;45%&lt;/strong&gt; of AI-generated code contains OWASP Top 10 vulnerabilities (Veracode, 100+ LLMs tested)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;35 CVEs&lt;/strong&gt; traced to AI-generated code in March 2026 alone (Georgia Tech Vibe Security Radar)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1.5 million&lt;/strong&gt; API keys leaked from one vibe-coded app within 3 days of launch (Moltbook)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;63%&lt;/strong&gt; of vibe coding users are not developers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last number is the one that matters most. Most vibe coders can't read the code they're shipping. They can't audit it. They can't tell if the login page can be bypassed.&lt;/p&gt;

&lt;p&gt;I'm one of them. I built two production apps without writing a line of code. I had no idea my code was vulnerable until I scanned it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Loop
&lt;/h2&gt;

&lt;p&gt;Here's what Vibe Check enables:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Scan&lt;/strong&gt; — paste a GitHub URL, get a score in 60 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Download&lt;/strong&gt; — get the findings as a markdown file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix&lt;/strong&gt; — paste the findings into your AI coding tool (Cursor, Claude Code, Lovable, whatever you used to build it)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-scan&lt;/strong&gt; — verify the fixes worked and your score went up&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;My own app started at 20/100, and I'm working through the fixes now. The scanner caught what I couldn't see. The AI coding tool fixed what the scanner found. The re-scan verified the fixes worked.&lt;/p&gt;

&lt;p&gt;That's the product. Scan. Fix. Verify.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;Vibe Check is free. No signup needed for your first scan. Your code is never stored — we scan it, report findings, and forget it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://chat-api-19ij.onrender.com" rel="noopener noreferrer"&gt;https://chat-api-19ij.onrender.com&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Code is open source at &lt;a href="https://github.com/evance1227/chat" rel="noopener noreferrer"&gt;evance1227/chat&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Scan your repo. You might be surprised what's hiding in code that works perfectly.&lt;/p&gt;

&lt;p&gt;— Elise Vance (&lt;a href="https://twitter.com/shecantcode" rel="noopener noreferrer"&gt;@shecantcode&lt;/a&gt;)&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Static vs Semantic: how a security scanner reads AI-generated code</title>
      <dc:creator>Elise Vance</dc:creator>
      <pubDate>Wed, 15 Apr 2026 01:52:21 +0000</pubDate>
      <link>https://forem.com/shecantcode/static-vs-semantic-how-a-security-scanner-reads-ai-generated-code-cj6</link>
      <guid>https://forem.com/shecantcode/static-vs-semantic-how-a-security-scanner-reads-ai-generated-code-cj6</guid>
      <description>&lt;p&gt;Three days ago &lt;a href="https://dev.to/vibedoctor_io/i-scanned-the-most-famous-ai-coding-repos-on-github-heres-what-i-found-469l"&gt;VibeDoctor launched a scanner&lt;/a&gt; for AI-generated apps. They scan by running six tools in parallel: SonarQube, Gitleaks, Trivy, Lighthouse, plus custom checks. They scanned open-lovable, devika, and bolt.new and found hundreds of issues. It's good work.&lt;/p&gt;

&lt;p&gt;But there's a whole class of bug their approach can't see. I built &lt;a href="https://chat-api-19ij.onrender.com" rel="noopener noreferrer"&gt;Vibe Check&lt;/a&gt; to catch that class. Here's what's different.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gap
&lt;/h2&gt;

&lt;p&gt;Static scanners pattern-match. They look for &lt;code&gt;eval(&lt;/code&gt;, &lt;code&gt;os.system(&lt;/code&gt;, hardcoded API keys, outdated dependencies. They're good at this, and they catch a lot.&lt;/p&gt;

&lt;p&gt;What they can't catch is &lt;strong&gt;intent&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Consider this real function from a repo I scanned this week:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_auth_ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Accept multiple common secret formats so GHL/curl both work.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;secret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WEBHOOK_SECRET&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# no secret set -&amp;gt; allow all
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Grep for known vuln patterns: nothing. SonarQube: clean. Gitleaks: no secrets here (that's the point). Trivy: no CVEs. Every static tool I threw at it said OK.&lt;/p&gt;

&lt;p&gt;The bug is that when &lt;code&gt;WEBHOOK_SECRET&lt;/code&gt; is unset in the environment, the function returns &lt;code&gt;True&lt;/code&gt; and the webhook is &lt;strong&gt;fully open&lt;/strong&gt;. In development &lt;code&gt;WEBHOOK_SECRET&lt;/code&gt; is often unset. In production, a simple env-var typo becomes an unauthenticated remote action vector.&lt;/p&gt;

&lt;p&gt;This is a semantic bug. You only catch it by reading the code and asking "what happens when the inputs are missing?" That's a human pen-tester mindset, not a regex.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Vibe Check reads code
&lt;/h2&gt;

&lt;p&gt;Vibe Check sends files to Claude via the Anthropic API with a custom prompt focused on six categories: secrets, auth, injection, data exposure, dependencies, config. The prompt has specific guardrails for LLM failure modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Default-allow patterns.&lt;/strong&gt; Explicit instructions to flag &lt;code&gt;if not secret: return True&lt;/code&gt; as critical.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic SQL in column names.&lt;/strong&gt; Parameterized queries are safe until the column name or &lt;code&gt;ORDER BY&lt;/code&gt; clause comes from an f-string. The prompt flags this explicitly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy invariant.&lt;/strong&gt; Claude is told to never echo actual secret values in findings. The scanner itself never persists source code, raw responses, or the secrets it finds.&lt;/li&gt;
&lt;/ul&gt;
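
&lt;p&gt;That second guardrail is easiest to see in code. In this sketch (mine, using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt;, not from any scanned repo), the value is bound safely but the &lt;code&gt;ORDER BY&lt;/code&gt; identifier is interpolated. That is exactly the shape a pattern-matcher passes and a semantic read catches:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, active INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "a", 1), (2, "b", 1), (3, "c", 0)])

def active_users_unsafe(sort_col: str):
    # The value is parameterized (the safe-looking part), but the
    # ORDER BY identifier comes straight from caller input.
    query = f"SELECT id, name FROM users WHERE active = ? ORDER BY {sort_col}"
    return conn.execute(query, (1,)).fetchall()

def active_users_safe(sort_col: str):
    # Identifiers can't be bound as parameters, so allowlist them.
    if sort_col not in {"id", "name"}:
        raise ValueError("unknown sort column")
    return active_users_unsafe(sort_col)  # now provably one of two columns
```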

&lt;p&gt;Output is structured via Anthropic's tool-use API. Every finding has category, severity, file, line, title, description, suggested_fix, and a verbatim &lt;code&gt;code_snippet&lt;/code&gt; field that Vibe Check uses to auto-correct line numbers post-hoc (Claude hallucinates line numbers; the snippet search fixes that).&lt;/p&gt;
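
&lt;p&gt;The snippet-based correction is simple enough to sketch (the function name is mine): search the file for the verbatim snippet and trust that hit over the reported line number:&lt;/p&gt;

```python
# Post-hoc line-number correction: trust the model's verbatim snippet,
# not its reported line (illustrative sketch).
def correct_line(source: str, reported_line: int, snippet: str) -> int:
    """Return the 1-based line where the snippet actually occurs,
    falling back to the reported line when it can't be found."""
    first = snippet.strip().splitlines()
    needle = first[0].strip() if first else ""
    if not needle:
        return reported_line
    for i, line in enumerate(source.splitlines(), start=1):
        if needle in line:
            return i
    return reported_line
```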

&lt;h2&gt;
  
  
  False positives are the hard part
&lt;/h2&gt;

&lt;p&gt;LLMs are noisy. Over a day of self-scanning my own repo, I watched Claude emit gems like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"render.yaml contains &lt;code&gt;sync: false&lt;/code&gt; for secrets — could be misconfigured"&lt;br&gt;
(&lt;code&gt;sync: false&lt;/code&gt; is the &lt;em&gt;correct&lt;/em&gt; Render.com setting)&lt;/p&gt;

&lt;p&gt;"compare_digest is correctly implemented"&lt;br&gt;
(flagged as a finding even though it's literally the fix)&lt;/p&gt;

&lt;p&gt;"SQL query logging during development could expose sensitive data"&lt;br&gt;
(it's development; that's the whole point of &lt;code&gt;echo=True&lt;/code&gt;)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The solution isn't more prompting. Claude ignores prompt guardrails about a third of the time. The real fix is a &lt;strong&gt;hard post-parse filter&lt;/strong&gt; that drops findings matching known false-positive patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infrastructure config files (&lt;code&gt;render.yaml&lt;/code&gt;, &lt;code&gt;.github/workflows/*&lt;/code&gt;, &lt;code&gt;Dockerfile&lt;/code&gt;, &lt;code&gt;*.tf&lt;/code&gt;) at low/medium severity&lt;/li&gt;
&lt;li&gt;Template files (&lt;code&gt;.env.example&lt;/code&gt;) for secrets-category findings&lt;/li&gt;
&lt;li&gt;Test files (&lt;code&gt;tests/&lt;/code&gt;, &lt;code&gt;conftest.py&lt;/code&gt;) entirely&lt;/li&gt;
&lt;li&gt;Self-contradictory descriptions ("X is correct, but...")&lt;/li&gt;
&lt;li&gt;Speculative hedging at low/medium ("could be a", "if this were", "potentially allowing")&lt;/li&gt;
&lt;li&gt;Dev-environment-only warnings ("during development", "in non-production")&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Critical severity &lt;strong&gt;never&lt;/strong&gt; gets filtered. We want false positives at critical to surface for human review, not be silently dropped.&lt;/p&gt;
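
&lt;p&gt;As a sketch, the filter is just a hard gate applied after parsing. The patterns below are abbreviated and illustrative; the real rule set is longer:&lt;/p&gt;

```python
from fnmatch import fnmatch

# Abbreviated, illustrative false-positive rules.
INFRA_GLOBS = ("render.yaml", ".github/workflows/*", "Dockerfile", "*.tf")
HEDGES = ("could be a", "if this were", "potentially allowing")
DEV_ONLY = ("during development", "in non-production")

def keep_finding(finding: dict) -> bool:
    sev = finding["severity"]
    path = finding["file"]
    desc = finding["description"].lower()
    if sev == "critical":
        return True  # critical is never filtered
    if path.startswith("tests/") or path.endswith("conftest.py"):
        return False  # test files dropped entirely
    if path.endswith(".env.example") and finding["category"] == "secrets":
        return False  # template files for secrets findings
    if sev in ("low", "medium"):
        if any(fnmatch(path, g) for g in INFRA_GLOBS):
            return False  # infra config at low/medium severity
        if any(h in desc for h in HEDGES):
            return False  # speculative hedging at low/medium
    if any(d in desc for d in DEV_ONLY):
        return False  # dev-environment-only warnings
    return True
```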

&lt;p&gt;After this filter, my own repo scans as &lt;strong&gt;100/100 with zero findings&lt;/strong&gt;. Without it, the score bounced between 79 and 94 across seven runs with completely different findings each time. &lt;em&gt;Filtering is the product.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I found in the wild
&lt;/h2&gt;

&lt;p&gt;I scanned a 24K-star AI-coding repo (responsible disclosure in flight; this post will be updated with the repo name after Thursday 2026-04-17 EOD UTC). Top findings:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Finding&lt;/th&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Why static scanners miss it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Unauthenticated command execution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;app/api/run-command-v2/route.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No auth gate before shelling out. No &lt;code&gt;eval()&lt;/code&gt; or &lt;code&gt;child_process.exec()&lt;/code&gt; with obviously-tainted input. Just a route handler that trusts any caller.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Arbitrary file write via AI-generated content&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;app/api/apply-ai-code-stream/route.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The content being written comes from AI output. Static tools see an ordinary &lt;code&gt;fs.writeFile&lt;/code&gt; call and don't flag it.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Missing auth on package installation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;app/api/install-packages-v2/route.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The action is "install arbitrary npm package". No auth.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API key leaked in error responses&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;app/api/search/route.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Regex scanners look for hardcoded keys in source. This one is &lt;em&gt;echoed&lt;/em&gt; back to the client on failure paths.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Combined, these form a complete "submit code → write it → execute it" chain. A separate scanner ran against this same repo three days ago and flagged hundreds of issues. None of these four. They require reading the route handlers end-to-end and asking what's actually authenticated.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;I'm a non-technical founder. I can't write code. I built two production apps using AI coding tools and realized I had no way to know if they were safe. 65% of vibe-coded apps have security vulnerabilities. 35 CVEs were traced to AI-generated code in March 2026 alone.&lt;/p&gt;

&lt;p&gt;So I built the tool I needed. Vibe Check uses Claude to understand what code is &lt;em&gt;trying&lt;/em&gt; to do, and catches when it silently fails. Built by someone who can't code, for everyone who can't code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Vibe Check is free, no signup required for basic scans. Sign in with GitHub for scan history. Your code is never stored.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://chat-api-19ij.onrender.com" rel="noopener noreferrer"&gt;https://chat-api-19ij.onrender.com&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Code is open at &lt;a href="https://github.com/evance1227/chat" rel="noopener noreferrer"&gt;evance1227/chat&lt;/a&gt;. Feedback welcome, especially on the prompt and the false-positive filter.&lt;/p&gt;

&lt;p&gt;— Elise Vance (&lt;a href="https://twitter.com/shecantcode" rel="noopener noreferrer"&gt;@shecantcode&lt;/a&gt;)&lt;/p&gt;

</description>
      <category>ai</category>
      <category>codequality</category>
      <category>security</category>
      <category>vibecoding</category>
    </item>
  </channel>
</rss>
