There’s a lot of noise right now around vibe coding.
People are generating entire apps off vibes, using LLMs to go from idea to code in minutes. And I’ll admit—it’s impressive. But having been in testing for over 20 years, I couldn’t help but notice something was missing.
The energy is real. The speed is addictive. But the stability? The reliability? Often nowhere in sight.
That’s where I come in.
Who I Am
I’ve been a tester most of my life. My job has always been to make sure products deliver on their promise—not just in ideal demos, but in real-world use.
I’ve seen every testing angle: automation, exploratory, performance, security, chaos. And when vibe coding entered the chat, I was curious.
I started building with it.
I started testing it.
And I started breaking it—easily.
The Problem with Raw Vibes
Vibe coding is missing a backbone. It’s missing intention. The AI can generate, but can it validate? Can it correct itself? Can it know when it’s broken?
Not unless we give it the tools to do so.
How I Made It Work
That’s where my tester mindset kicked in. I created a project called vibe-todo to prove that vibe coding can be viable, if it’s guided by structure and testing.
Here’s what made the difference:
- Hypothesis tests as self-correction checkpoints
- A .windsurfrules file that tells the AI how to behave
- Benchmarks with SLA enforcement (10 ms max per op; sketched right after this list)
- Logs that are structured for LLMs to observe and react to
- A Grammatical Evolution engine that generates edge cases dynamically
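To make the SLA bullet concrete, here is a minimal, dependency-free sketch of the benchmark check. The 10 ms budget is the real number; add_task and its signature are stand-ins for whatever operation you’re timing:

import time

SLA_MS = 10  # maximum milliseconds allowed per operation

def assert_sla(fn, *args, runs=1000):
    # Call fn repeatedly and fail if any single call blows the budget.
    worst = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        worst = max(worst, (time.perf_counter() - start) * 1000)
    assert worst <= SLA_MS, f"{fn.__name__}: {worst:.2f} ms > {SLA_MS} ms SLA"

def test_add_task_sla():
    assert_sla(add_task, "ship it")

Run it under pytest and a slow operation fails the build just like a broken one.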
Yes, I even wired in a GE-based evolutionary testing suite using DEAP to evolve inputs over time. This thing will literally mutate payloads to find failures your unit tests missed.
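A minimal sketch of that idea, assuming a hypothetical add_task from vibe-todo. DEAP supplies the evolutionary loop; note that a full Grammatical Evolution setup would decode the genome through a BNF grammar, whereas this stripped-down version decodes it straight into a text payload:

import random
from deap import algorithms, base, creator, tools

from vibe_todo import add_task  # hypothetical import; point at your own operation

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
# Genome: a list of byte values we decode into a task-title payload.
toolbox.register("byte", random.randint, 0, 255)
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.byte, n=16)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

def evaluate(individual):
    # Reward payloads that crash the system under test or corrupt its state.
    payload = bytes(individual).decode("utf-8", errors="replace")
    try:
        task = add_task(payload)
        return (0.5,) if task["title"] != payload else (0.0,)
    except ValueError:
        return (0.0,)  # expected rejection: not interesting
    except Exception:
        return (1.0,)  # unexpected crash: jackpot

toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=0, up=255, indpb=0.2)
toolbox.register("select", tools.selTournament, tournsize=3)

pop = toolbox.population(n=50)
algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.3, ngen=20, verbose=False)

Each generation, the payloads that came closest to breaking add_task get mutated and recombined, so the suite digs toward failures on its own.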
And just like that—LLM code went from vibes to validated.
Real Talk: What Happens Without Tests?
Let me give you an example.
The LLM once generated a beautiful CRUD API. Looked perfect.
Until I ran it.
POST requests with empty strings didn’t trigger validation. Worse—no unit test existed to catch it. I only saw the issue when Hypothesis tests started failing fast.
This wasn’t an edge case. This was a gap.
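Here’s roughly the shape of the test that caught it, assuming a hypothetical add_task that’s supposed to raise ValueError on blank titles:

import pytest
from hypothesis import given, strategies as st

from vibe_todo import add_task  # hypothetical import; point at your own code

@given(title=st.text(max_size=50))
def test_add_task_validates_titles(title):
    if not title.strip():
        # Blank and whitespace-only titles must be rejected, not stored.
        with pytest.raises(ValueError):
            add_task(title)
    else:
        assert add_task(title)["title"] == title

Once Hypothesis shrinks the failure down to the empty string, pin it as a plain regression test so it can never sneak back in.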
Without testing, LLMs hallucinate correctness.
Structured Logs = Observability for LLMs
I output structured JSON logs with every operation:
{
  "operation": "add_task",
  "duration_ms": 8,
  "sla_pass": true,
  "timestamp": 1713040000.01
}
This lets Windsurf agents watch the output, check SLA compliance, and even suggest fixes based on violation patterns.
It's like putting Grafana in your LLM’s head.
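One way to produce those lines is a small decorator; a sketch, with the SLA budget and add_task as stand-ins:

import functools, json, time

SLA_MS = 10  # per-operation budget from the benchmarks above

def logged(op_name):
    # Wrap an operation and print one structured JSON log line per call.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            duration_ms = (time.perf_counter() - start) * 1000
            print(json.dumps({
                "operation": op_name,
                "duration_ms": round(duration_ms),
                "sla_pass": duration_ms <= SLA_MS,
                "timestamp": time.time(),
            }))
            return result
        return wrapper
    return decorator

@logged("add_task")
def add_task(title):
    return {"title": title}  # stand-in body

Every call now emits one machine-readable line that an agent can tail and act on.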
What Every Vibe Coder Should Add
You don’t need 20 years of testing experience to strengthen your vibes. Start here:
Minimal Testing Manifesto:
- .windsurfrules (your AI coding contract; sample below)
- Hypothesis tests for behavior validation
- Regression tests for known fail states
- Benchmark tests with SLA enforcement
- Grammatical Evolution (GE) for fuzzing + edge discovery
- Structured logs for agent feedback
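For reference, a .windsurfrules file is just plain-language instructions the editor’s agent follows. A minimal sketch of the kind of rules that enforce this manifesto (write your own contract; these lines are illustrative):

Every new function gets a Hypothesis property test before it is considered done.
Every operation must complete in under 10 ms; prove it with a benchmark.
Emit one structured JSON log line per operation: operation, duration_ms, sla_pass, timestamp.
Never delete a failing regression test to make the build green; fix the code.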
If you’re doing even half of this—you’re ahead of the game.
My Ask to the Community
Don’t abandon vibe coding.
Just add intent.
Make your vibes testable. Add rules. Use benchmarks. Think like a tester.
And then? Let the LLM earn that green check.
Closing
I've tested software for decades. And I can tell you this:
Code that feels good isn’t always good.
But code that’s structured, tested, benchmarked—and still feels good?
That’s the future.
Vibe coding doesn’t need to be risky. It can be resilient.
It can be built with intent.
Built with vibes. Verified by tests. Ready for prod.
Top comments (4)
I think you are bringing up great ideas and improvements here. You're right: right now I have to validate the code manually, and even the generated (unit) tests. I don't trust the output, so most of the work now happens in QA.
So your manifesto is very helpful, insightful, and inspiring. I will definitely try to apply your ideas.
A little tip from a veteran vibe coder, from before it was even a thing: have your LLM create a list of issues in issue form. It's often difficult to get the model to be really granular with its planning, but a way to kind of trick it into looking closer and deeper is making it write thought-out issues by category.
It also helps the planning stage figure out the ordering. Here is a little peek at my Obsidian notebook showing a small section of what I'm talking about:
[screenshot of an Obsidian notebook excerpt]
This is a great post so thanks for sharing these valuable insights. I’m going to explore these in the coming weeks for sure
Great to hear from a QA perspective. Hope I can bring the same enthusiasm to vibe testing as you did to vibe coding. Will surely try. Thanks