Forem: Let's Automate 🛡️

How to Generate Cypress, Playwright, and WebdriverIO Tests From Natural Language Using AI

Let's Automate 🛡️ — Mon, 04 May 2026 23:01:02 +0000

A step-by-step breakdown of an open-source platform that converts plain English requirements into runnable E2E tests — no manual coding required

Writing end-to-end tests is one of those things every developer knows they should do well and almost nobody actually enjoys. You spend an hour getting a Playwright spec to click the right button, another hour figuring out why the selector breaks in CI, and by then the feature has already been redesigned anyway.

So when I came across a project that lets you describe what you want to test in plain English — and then generates the actual test code — I had to dig in.

The project is called AI Natural Language Tests , built under AI Quality Lab. It is open source on GitHub, has a published academic DOI on Zenodo, and as of this week just shipped v5.0.0. You can also try it right now in your browser on Hugging Face Spaces — no installation needed.

Here is what it does, how it works, and why it deserves a spot in your QA toolkit.

The Core Idea

Instead of writing:

cy.get('#username').type('admin')
cy.get('#password').type('secret')
cy.get('[type=submit]').click()
cy.contains('Dashboard').should('be.visible')

You just say:

"Test login with valid credentials"

The platform reads that sentence, visits the URL you point it at, analyzes the live HTML to find the actual form fields and selectors, then generates a complete runnable test — in Cypress, Playwright, or WebdriverIO, whichever you prefer.

That is the pitch. But the internals are more interesting than the pitch.

What Is Actually Happening Under the Hood

This is not a thin wrapper around a ChatGPT call. It runs a structured five-step workflow built with LangGraph — and each step has a clear purpose.

Step 1 — Understand the page. When you pass a --url, the system fetches the live HTML and extracts real selectors, form fields, and interactive elements. This is what prevents it from hallucinating IDs that do not exist on your page.

Step 2 — Check memory. The system keeps a vector database (FAISS + SQLite) of patterns from every test it has previously generated. Before writing anything new, it searches for similar past tests using semantic similarity. If it has seen a login flow before, it reuses what worked.

Step 3 — Generate with an LLM. The actual test code is produced by your choice of LLM — OpenAI, Anthropic Claude, or Google Gemini. LangChain handles prompt templating and output parsing, while LangGraph turns the multi-step flow into a repeatable, auditable pipeline rather than a single prompt-and-pray call.

Step 4 — Optional human review. There is a --approve flag that pauses execution before saving the generated test and asks a human to confirm. This Human-in-the-Loop gate is especially useful when running the tool against production-critical flows where you want a set of eyes before anything gets committed.

Step 5 — Run it. Pass --run and the tool immediately executes the generated test through the framework runner. If it fails, an AI-assisted failure analyzer categorizes the error and suggests a fix — more on that below.

Getting Started Takes About Five Minutes

git clone https://github.com/aiqualitylab/ai-natural-language-tests.git
cd ai-natural-language-tests
python -m venv .venv && source .venv/bin/activate # macOS/Linux
pip install -r requirements.txt
npm ci
npx playwright install chromium

Add your API key to a .env file:

OPENAI_API_KEY=your_key

Then generate and immediately run a test:

python qa_automation.py "Test login with valid credentials" \
  --url https://the-internet.herokuapp.com/login \
  --framework playwright \
  --run

That single command fetches the page, generates a .spec.ts file, and runs it through Playwright — without you writing a line of test code.

If you just want to see it work before installing anything, the live Hugging Face Spaces demo lets you paste in a requirement and watch the generation happen in real time.

Three Frameworks, One Workflow

The tool supports all three major E2E frameworks with the same natural language interface. You switch between them with a single flag:

3 Frameworks, One Workflow

The Cypress integration is worth noting specifically — it supports two distinct modes. The traditional mode generates standard Cypress code. The prompt-powered mode uses cy.prompt() to keep natural language embedded directly in the test, which is useful for teams exploring the newer AI-native Cypress APIs.

If your team is mid-migration from Cypress to Playwright, you can generate equivalent tests in both frameworks from the same requirement and compare them side by side.

Writing Prompts That Actually Work

The output quality depends heavily on how specific you are. A few patterns that work well:

Name the expected outcome. “Test login fails with wrong password and shows an error message” produces a far more precise test than “Test login.”

Chain multiple requirements. You can pass several prompts in one run: "Test login" "Test logout" --url — each gets its own generated file.

Always use --url. Giving the tool a real page means it reads actual HTML instead of guessing selector names. This is the single biggest factor in test quality, because the generator extracts real element IDs and attributes from the live DOM.

Some practical examples:

Usage : https://github.com/aiqualitylab/ai-natural-language-tests#usage

When Tests Fail: AI-Assisted Diagnosis

One of the more practical features is the failure analyzer. Instead of staring at a cryptic Cypress error, you pass it to the tool:

python qa_automation.py --analyze "CypressError: Element not found"

The analyzer categorizes the error into one of ten types — SELECTOR, TIMING, ASSERTION, NETWORK, STATE, NAVIGATION, INTERACTION, CONFIGURATION, ENVIRONMENT, or DYNAMIC_URL — then gives you a plain-English explanation of the root cause and a concrete suggestion for fixing it.

You can also pipe in a full log file: python qa_automation.py --analyze -f error.log

The Quality Evaluation Layer

This is the part most people skip over in the README, but it is arguably the most important piece for teams that care about reliability.

Generating test code is only valuable if the generated tests are actually correct. The project includes two evaluation scripts that measure whether the output is grounded in the real page content.

Offline evaluation (no API key needed). The ragas_nlp_evaluator.py script compares generated output against a reference dataset using ROUGE and string similarity metrics. It runs entirely offline, exits with a non-zero code if quality drops below a configurable threshold, and is designed to run as a fast CI gate.

LLM-based evaluation (requires OpenAI key). The ragas_evaluator.py script goes further. It fetches the live page HTML, uses GPT-4o-mini to answer the test requirement using that HTML, then scores the generated test on four dimensions: faithfulness to the page, relevance to the requirement, context precision, and context recall.

Both evaluators are wired into the GitHub Actions CI pipeline. The offline script runs first as a baseline check. If it passes, three parallel jobs spin up — one per framework — each generating tests, evaluating them with the LLM evaluator, and then executing them. If the score drops below threshold, the pipeline blocks before the tests even run.

You are not shipping generated tests blindly. You have a measurable, automated quality signal at every stage.

Docker and CI/CD

The project ships pre-built Docker images on GitHub Container Registry. You can skip the clone entirely:

docker pull ghcr.io/aiqualitylab/ai-natural-language-tests:latest

docker run --rm \
  -e OPENAI_API_KEY=your_key \
  ghcr.io/aiqualitylab/ai-natural-language-tests:latest \
  "Test login" --url [https://the-internet.herokuapp.com/login](https://the-internet.herokuapp.com/login)

For CI/CD, pin to a specific release tag (v5.0.0) rather than latest for reproducibility. The recommended pipeline stages cover dependency installation, NLP baseline evaluation, test generation, LLM evaluation, test execution, and optional telemetry export to Grafana Tempo and Loki.

Observability (Optional but Thoughtful)

If your team runs Grafana, the project has native OpenTelemetry integration that exports traces to Grafana Tempo and ships logs to Loki. This is entirely optional — leaving the relevant environment variables unset disables it completely. But for teams that already operate a Grafana stack, having AI test generation traces alongside your application traces is a genuinely useful debugging surface.

What It Does Not Do Yet

To be fair about the limits: the current CLI works through URL-driven generation. A --data flag for passing raw JSON specifications directly is not implemented yet. If your tests target APIs or non-rendered content, you will need to adapt. Given the active release cadence — nine releases with v5.0.0 landing this week — that gap may close soon.

Why This Matters Beyond the Tool Itself

The bottleneck in most QA pipelines is not running tests — it is writing them. Engineers skip test authoring because it is slow, tedious, and breaks constantly as UIs change. This tool makes the first draft essentially free, which lowers the activation energy enough that more tests actually get written.

The pattern memory design compounds the value over time. Every test the system generates gets stored as a vector embedding. Future generations for similar requirements pull from those patterns, so the output becomes more consistent and more project-specific as usage grows. It is not just generating tests in isolation — it is building institutional knowledge about how your application is structured.

The Ragas evaluation layer means you can measure whether that knowledge is accurate, and block on it in CI if it is not.

Try It

The project is open source at github.com/aiqualitylab/ai-natural-language-tests.

Want to experiment without installing anything? The live demo is on Hugging Face Spaces.

Hugging Face Space

Are you using AI-assisted test generation in your pipeline?

Share what has worked — and what has not.

QA Bug Triage Pipeline: From App Reviews to Searchable Bug Reports

Let's Automate 🛡️ — Tue, 28 Apr 2026 19:02:48 +0000

A simple Python project that turns messy user reviews into structured QA bug reports using an LLM and RAG.

📖 Full guide: blog.aiqualitylab.org

Why this project

Product teams get lots of feedback, but most of it is noisy and unstructured. This project helps QA teams convert that feedback into consistent bug records that are easy to search and summarize.

Photo by Guille B on Unsplash

What it does

Collects reviews from Google Play

Routes review text (bug report vs non-bug)

Generates structured JSON bug reports with an LLM

Stores bugs in ChromaDB for semantic retrieval

Adds BM25 keyword matching for hybrid search

Produces short AI summaries for triage

Lets you clear the stored bugs from the UI

API key

This app uses BYOK (Bring Your Own Key):

Paste your OpenAI API key in the UI

The key is masked

Do not commit keys to source control

Main files

app.py: Gradio app flows

collect.py: review collection

triage.py: routing and structured triage logic

rag.py: storage and hybrid retrieval

eval/eval.py: evaluation script

Evaluation sample

Answer Relevancy: 0.868

Faithfulness: 0.292

Context Precision: 0.020

Cost target

For a short demo session, the expected usage is typically under $0.50.

Tips:

Keep review count low (5 to 10)

Avoid repeated large collection runs

Use short test inputs when validating triage

Tech stack

Python

Gradio

OpenAI GPT-4o

ChromaDB

rank-bm25

RAGAS

google-play-scraper

This project is useful for QA teams that want a lightweight bug triage assistant with searchable bug intelligence and fast summaries.

Prompt Injection Attacks Are Breaking AI Products — Here’s How to Stop Them

Let's Automate 🛡️ — Sat, 25 Apr 2026 12:23:02 +0000

The Simple, Non-Technical Guide to Defensive Prompting: How to Protect Your LLM-Powered App Before Someone Exploits It

📖 Full guide: blog.aiqualitylab.org

Your AI is only as safe as the thought you put into protecting it. Prompts aren’t just instructions — they’re the rules your AI lives by. Protect them like you’d protect any critical part of your product.

Photo by Nik Shuliahin 💛💙 on Unsplash

The teams winning at AI aren’t just the ones moving fast. They’re the ones moving fast and thinking about this.

AI Is Normal Now. The Problems Aren’t.

GitHub Copilot CLI Remote: Control Your AI Coding Agent From Phone and Web

Let's Automate 🛡️ — Fri, 17 Apr 2026 17:10:11 +0000

New copilot --remote preview lets you steer Copilot CLI sessions from GitHub.com and GitHub Mobile — here's what it does and why it matters

📖 Full guide, team scenarios, and honest limitations: blog.aiqualitylab.org

💻 Source on GitHub: aiqualitylab/blog

🔗 Official GitHub changelog: Remote control CLI sessions on web and mobile

If you use AI coding tools in your terminal, you know the problem. You start a 20-minute task, step away, and come back to find the agent stalled — waiting for you to approve something ten minutes ago.

On April 13, GitHub shipped a fix: copilot --remote.

GitHub Copilot CLI Remote: Control Your AI Coding Agent From Phone and Web

What it does

Turn on remote mode and your CLI session streams to GitHub in real time. Your terminal shows a link and a QR code. Open it on any phone or browser, and you get a live, two-way view. You can send messages, approve permissions, switch modes, and stop the session — all from your phone.

How to turn it on

copilot --remote

You need to be in a GitHub repo.

Copilot Business and Enterprise users need an admin to enable the policy first.

AI-Assisted Testing vs AI Agents vs AI Agent Skills: A Practical Journey Through All Three

Let's Automate 🛡️ — Sat, 07 Mar 2026 13:08:54 +0000

Most teams are only using one layer of AI in testing. Here is what the full picture looks like — and how I built across all three.

Photo by Possessed Photography on Unsplash

Before any of this made sense, I had to answer a more basic question: what does AI QA Engineering actually mean?

What is AI QA Engineering — and Why QAEs, SDETs, and QA Automation Engineers Should Pay Attention

And before touching AI at all — the foundations still matter. Clean BDD tests. Reports that stakeholders can read.

How to Add Beautiful BDD Test Reports to Your Reqnroll Project Using Expressium LivingDoc

Before you automate smarter, you have to know what good looks like.

Layer 1 — AI-Assisted Testing

AI speeds you up. You are still driving.

This is where most teams start — and where most teams stay.

You write a prompt, get a test, review it, ship it. AI is a productivity multiplier. GitHub Copilot suggests the next line. ChatGPT drafts your test cases. Claude rewrites a flaky selector. You are in control at every step.

The catch? A bad prompt gives you a bad test — and it will look convincing. Garbage in, confident garbage out.

Crafting Effective Prompts for GenAI in Software Testing

I built ai-natural-language-tests at this layer. Give it a plain English requirement, and it generates Cypress or Playwright tests using GPT-4, LangChain, and LangGraph. Every output still needs your eyes on it — but the heavy lifting is done.

Same idea with JIRA-QA-Automation-with-AI : feed it a JIRA story with acceptance criteria, and BDD test scripts come out the other side. Human judgment still required at the end. You own every decision.

That last part is the definition of this layer.

Layer 2 — AI Agents for Testing

You give the goal. The agent executes, adapts, and decides.

At this layer, you stop steering and start delegating.

You set the objective. The agent figures out how to get there — and when something breaks mid-run, it handles that too. No human in the loop for every step.

selenium-selfhealing-mcp is a good example of what this looks like in practice. A UI change breaks a Selenium locator mid-execution. The agent inspects the DOM, finds the updated element, and keeps going — without stopping to ask you what to do. I submitted this to the Docker MCP Registry, and watching it recover from failures on its own still feels like a step-change from Layer 1.

For .NET teams, SeleniumSelfHealing.Reqnroll does the same with C#, NUnit, Reqnroll, and Semantic Kernel. And IntelliTest takes it further — write your assertions in plain English, and the agent decides whether the application behaviour actually matches the intent.

But there is a trap at this layer. Agents move fast and look thorough. It is easy to trust the output and skip the checks. Coverage looks complete — but the agent may have tested the wrong thing entirely.

The AI QA Engineer’s Decision Framework: When NOT to Use AI in Testing

And if you are using AI agents to run tests, a harder question follows: how do you know the agent’s output is correct? That is the LLM evaluation problem, and it turns out to be one of the most interesting unsolved problems in this space.

LLM Evaluation Explained: How to Know If Your AI Is Actually Working

Layer 3 — AI Agent Skills

Not a tool. Not an agent. Expertise that travels.

Layer 3 is the one most people have not thought about yet.

Here is the pattern I kept running into: every new agent project started from scratch. New codebase, new prompts, same underlying knowledge — how to read a requirement, what makes a test meaningful, when to flag a risk. The expertise was always being rebuilt. That seemed wrong.

A skill is a portable, encoded unit of expertise. It is not tied to one agent or one project. Any compatible agent can load it and apply it — without rebuilding the logic again. You build it once, and it travels.

GitHub Copilot Agent Skills: Teaching AI Your Repository Patterns

vibe-coding-checklist applies the same idea to AI code review — a shared quality framework that any team or any agent can use consistently.

The shift in thinking is subtle but significant. At Layer 1, you build prompts and tools. At Layer 2, you build goals and trust boundaries. At Layer 3, you build expertise itself — in a form that outlasts any single project or team.

The Difference That Matters

AI-Assisted Testing vs AI Agents vs AI Agent Skills

Three layers. All called AI testing. Now you know which one you are actually in.

All repos → github.com/aiqualitylab

More writing → aiqualityengineer.com

The GitHub Copilot Features That Are Quietly Draining Your Premium Requests

Let's Automate 🛡️ — Thu, 19 Feb 2026 17:19:23 +0000

10 optimisations most developers miss — including why the Copilot Coding Agent beats Agent Mode Chat every time

Most developers hit their monthly limit in the first week. Here’s what’s actually happening under the hood — and how to work smarter before it happens to you.

Photo by Resume Genius on Unsplash

Before diving in, it helps to understand what GitHub Copilot actually counts as a premium request, because most developers don’t find out until it’s too late.

Inline code completions on paid plans are unlimited and cost nothing. What drains your monthly allowance is everything else — Copilot Chat, Agent Mode, Copilot Code Review, Copilot CLI, and the Copilot Coding Agent.

Each model also carries a multiplier. Some models are included free on paid plans. Once your allowance is gone, premium features are locked for the rest of the billing cycle.

Knowing that, here’s how to make every request count.

1. Name your functions like they’re instructions

Inline autocomplete is unlimited on paid plans and costs nothing from your premium allowance. The more precisely you name a function, the more accurately Copilot completes the body without any Chat involved. This is your primary tool, not a fallback.

2. Write your intent as a comment above the cursor

A detailed comment placed directly before your cursor is treated by Copilot as an instruction. You get the same outcome as a Chat message at zero premium cost. Use this for any logic you would otherwise describe to Copilot in conversation.

3. Cycle through alternatives with Alt+] before opening Chat

When the first inline suggestion misses, most developers immediately reach for Chat. Before doing that, cycle through alternative suggestions. The second or third option is often exactly what’s needed — and one saved Chat message multiplies across a full day of work.

4. Disable Agent Mode when you’re not actively using it

Agent Mode runs in the background and silently runs even when you’re not directing it. GitHub’s official documentation explicitly flags this as a common cause of unexpected quota drain. Disable it in your repository settings when it isn’t part of your current workflow.

5. Use the Copilot Coding Agent for complex tasks instead of Agent Mode Chat

This is one of the least-known optimisations available. The Copilot Coding Agent — the one that creates and modifies pull requests asynchronously — counts as one premium request per full session regardless of how much work it does. Agent Mode Chat charges one premium request per message, multiplied by the model rate. For any task involving multiple files or significant implementation work, the Coding Agent is dramatically more efficient.

6. Start a new Chat thread when switching topics

As a conversation grows, all prior messages remain in context and contribute to token consumption. GitHub’s documentation specifically calls this out as a driver of elevated usage. When you move to a new task or a different area of your codebase, start a fresh thread rather than continuing an existing one.

7. Understand the model multiplier before choosing one

Before switching to a powerful model, weigh whether the capability gain justifies the cost. For most day-to-day work, it doesn’t.

8. Use auto model selection for a built-in discount

When you enable auto model selection in Copilot Chat in VS Code, GitHub applies a 10% multiplier discount across all premium model usage. It requires no change to your workflow and the saving compounds quietly across a full month.

9. Use #file references instead of @workspace

@workspace scans your entire codebase on every message, consuming more than most questions require. Using #file:yourfile.ts targets exactly the context Copilot needs, which produces more focused answers with less back-and-forth and fewer requests spent getting there.

10. Set a budget alert before your allowance runs out

GitHub lets you configure alerts at 75%, 90%, and 100% of any spending threshold you define. Setting a low or zero spending budget with alerts enabled means you get notified well before premium features are cut off — without risking unexpected charges. Check your current usage anytime at github.com/settings/billing or through the Copilot icon in your IDE status bar.

The Principle Underneath All of It

Every tip here points back to the same question worth asking before you open Chat: is there a way to get this through autocomplete instead?

Reference — https://docs.github.com/en/copilot

Most of the time, there is. And building that habit is what separates developers who hit the wall in week one from those who reach month end with room to spare.

AI Natural Language Tests — Dual Framework Test Automation with Cypress & Playwright

Let's Automate 🛡️ — Sun, 01 Feb 2026 16:55:23 +0000

AI Natural Language Tests — Dual Framework Test Automation with Cypress & Playwright

Open-source AI test automation framework with natural language test generation, self-healing, and dual framework support

Writing end-to-end tests is one of those things every team knows they should do, but nobody really enjoys doing. You stare at a login page, figure out the selectors, write the steps, handle the waits, and repeat this for every feature. I kept thinking — what if I could just say what I want to test, and let AI handle the rest?

That’s exactly what I built.

Architecture

What Is It?

ai-natural-language-tests is an open-source tool that takes a plain English description of a test scenario and generates a fully working Cypress or Playwright test file. No templates. No copy-pasting. You describe the test, point it at a URL, and it writes the code.

Here’s what a typical command looks like:

python qa_automation.py "Test login with valid credentials" --url https://the-internet.herokuapp.com/login

That single line does everything — fetches the page, reads the HTML, picks up the right selectors, and generates a complete test file you can run immediately.

Want Playwright instead of Cypress? Just add a flag:

python qa_automation.py "Test login with valid credentials" --url https://the-internet.herokuapp.com/login --framework playwright

How It Actually Works

Under the hood, the tool runs a 5-step workflow built with LangGraph:

Complete Workflow

Step 1 — It sets up a vector store. Think of this as a memory bank for test patterns.

Step 2 — It fetches the target URL, pulls the HTML, and extracts useful selectors like input fields, buttons, and links.

Step 3 — It searches the vector store for similar tests it has generated before. If you tested a login page last week, it remembers the patterns.

Step 4 — It sends everything to GPT-4 along with a carefully crafted prompt — the description, the selectors, and any matching patterns from history. The AI generates the actual test code.

Step 5 — Optionally, it runs the test right away using Cypress or Playwright.

The interesting part is Step 3. Every test the tool generates gets saved as a pattern. Over time, it builds a library of patterns and uses them to write better tests. The first test for a login page might be decent. The tenth one will be much better because it has learned from all the previous ones.

Why Two Frameworks?

I started with Cypress because it’s what most teams I’ve worked with use. But Playwright has been gaining serious traction — especially for teams that need multi-browser testing or prefer TypeScript.

So in v3.1, I added full Playwright support. The tool uses different prompts for each framework. The Cypress prompt focuses on chaining commands and cy.get() patterns. The Playwright prompt covers locators, async/await, network interception, multi-tab handling, and all the TypeScript-specific patterns.

You pick the framework. The AI adapts.

The Part I Didn’t Expect — Failure Analysis

While building this, I realized that generating tests is only half the problem. Tests fail. And reading Cypress or Playwright error logs can be painful, especially for someone newer to the frameworks.

So I added an AI-powered failure analyzer:

python qa_automation.py --analyze "CypressError: Timed out retrying after 4000ms"

It reads the error, explains what went wrong in plain language, and suggests a fix. You can also point it at a log file. It’s a small feature but it has saved me a surprising amount of time.

Running It in CI/CD

The tool comes with a GitHub Actions workflow out of the box. You can trigger it manually from the Actions tab — type your test description, provide a URL, pick Cypress or Playwright, and it runs the full pipeline. Generate, execute, and get results — all inside your CI.

CI/CD PIPELINE

This makes it practical for teams that want to try AI-generated tests without changing their existing setup. Just add the workflow and trigger it when you need a new test.

What I Learned Building This

A few things surprised me along the way:

Prompts matter more than the model. I spent more time refining the system prompts than on any other part of the codebase. A well-structured prompt with clear constraints produces dramatically better test code than a vague one, regardless of which GPT model you use.

Pattern learning is underrated. The vector store approach turned out to be more useful than I expected. When the tool has seen similar pages before, the generated tests are noticeably more accurate. It picks up things like common selector patterns and assertion styles from its history.

Keeping frameworks separate is important. Early on, I tried using a single generic prompt for both Cypress and Playwright. The results were mediocre for both. Dedicated prompts for each framework made a huge difference in output quality.

Try It Out

The project is open source and ready to use:

GitHub: github.com/aiqualitylab/ai-natural-language-tests

First Release — https://github.com/aiqualitylab/ai-natural-language-tests/releases/tag/v2026.02.01

Setup takes about five minutes — clone the repo, install dependencies, add your OpenAI API key, and you’re generating tests.

If you work in QA or test automation and you’ve been curious about how AI fits into your workflow, give it a try. I’d love to hear what you think.

Exploring how AI can make quality engineering more practical and less tedious. I write about this stuff regularly at AI Quality Engineer .

The AI QA Engineer’s Decision Framework: When NOT to Use AI in Testing

Let's Automate 🛡️ — Sun, 25 Jan 2026 10:47:51 +0000

A Practical Guide for Quality Engineers Who Want Results, Not Hype

When NOT to Use AI in Testing: A Simple Guide

Stop. Think. Then Decide.

The Big Question

Everyone talks about using AI in testing. But nobody talks about when to SKIP it.

This guide helps you decide: AI or no AI?

Why This Matters

AI testing sounds cool. But it comes with baggage:

It costs money — AI tools need servers, licenses, and API calls.

It needs babysitting — Models drift. Prompts need tuning. Things break in weird ways.

It’s hard to debug — When AI tests fail, figuring out WHY is painful.

Your team might forget basics — If AI does everything, manual debugging skills fade.

AI isn’t bad. But it’s not always the answer.

7 Times to Skip AI (Use Traditional Testing Instead)

1. Math and Calculations

Example: Tax calculators, loan interest, pricing formulas.

Why skip AI? The answer is either right or wrong. No guessing needed. No patterns to learn.

Do this instead: Simple data-driven tests. Input goes in. Expected output comes out. Done.

2. Audit and Compliance Systems

Example: Banking apps, healthcare records, legal documents.

Why skip AI? Auditors want proof. They want to see EXACTLY what you tested. AI is unpredictable — same prompt, different results.

Do this instead: Scripted tests with detailed logs. Every step recorded. Every result traceable.

3. Speed and Load Testing

Example: Can your app handle 10,000 users at once?

Why skip AI? You’re measuring app speed. AI adds its own delay. You’d be measuring AI, not your app.

Do this instead: Use tools built for this — JMeter, k6, Gatling. They’re fast and focused.

4. Basic CRUD Operations

Example: Create user. Read user. Update user. Delete user.

Why skip AI? It’s simple. AI is overkill. Like using a rocket to go to the grocery store.

Do this instead: Write one test template. Copy it for each operation. Fast and easy.

5. Screens That Never Change

Example: Internal admin panels. Old systems nobody touches.

Why skip AI? AI shines when things CHANGE. Self-healing locators fix moving targets. No movement? No need.

Do this instead: Regular automation. Page Object Model. Set it and forget it.

6. Security Testing

Example: Finding SQL injection, XSS attacks, login bypasses.

Why skip AI? Security needs creative thinking. Breaking things in new ways. AI follows patterns — hackers don’t.

Do this instead: Security tools (OWASP ZAP, Burp Suite) plus human testers who think like attackers.

7. Physical Device Testing

Example: Barcode scanners, payment terminals, IoT sensors.

Why skip AI? AI lives in software. It can’t press physical buttons or read blinking lights.

Do this instead: Hardware test rigs. Human testers. Real-world verification.

The Quick Decision Guide

Ask yourself these 4 questions:

DECISION TABLE FRAMEWORK

Before You Buy Any AI Tool, Answer These:

What exact problem am I solving? (Not “we want AI” — a real problem)

Can a simple script fix this? (Seriously, can it?)

How will I know if it worked? (What number goes up or down?)

Who will maintain it? (AI tools need constant care)

Can I explain it to my boss? (If you can’t explain it, don’t buy it)

The Simple Truth

AI is a tool. Not a magic wand.

Good testers know WHEN to use each tool:

USAGE CHECKLIST

One Page Summary

USE AI FOR:

Generating test ideas from requirements

Handling UI changes automatically

Analyzing why tests keep failing

Creating test data variations

Exploring edge cases

SKIP AI FOR:

Exact calculations (math, money, dates)

Compliance and audit trails

Performance/load measurements

Simple CRUD operations

Stable, unchanging systems

Security penetration testing

Physical hardware testing

Final Word

The smartest move isn’t always the newest tool.

Sometimes a simple script beats a fancy AI.

Know when to use AI. Know when to skip it. That’s real skill.

Machine Learning Pipelines Made Easy for Quality Assurance Professionals

Let's Automate 🛡️ — Sat, 10 Jan 2026 19:45:18 +0000

A very simple guide to how machine learning works

Machine learning looks hard. But it is not.

If you know QA, you already know the basics.

ML systems have three parts. We call them FTI:

F = Feature (clean the data)

T = Training (teach the model)

I = Inference (use the model)

Let me explain each one.

Part 1: Feature Pipeline

What does it do?

It cleans dirty data.

Simple example:

You have messy data. Names are written in different ways. Dates are in wrong formats. Numbers have errors.

This pipeline fixes all that. It makes data clean and ready.

Feature Pipeline Detail

In QA words:

You never test with bad data. You clean it first. This pipeline does the same thing.

The clean data goes to a Feature Store.

Part 2: Training Pipeline

What does it do?

It teaches the model.

Simple example:

You show the model 1000 pictures of cats. You tell it “this is a cat” each time. The model learns what a cat looks like.

In QA words:

You learn from requirements. Then you write test cases. The model learns from data. Then it can make predictions.

Picture:

The smart model goes to a Model Registry.

Training Pipeline Detail

Part 3: Inference Pipeline

What does it do?

It uses the model to answer questions.

Simple example:

Someone shows a new picture. The model says “this is a cat” or “this is not a cat.”

In QA words:

This is like running tests in production. The model is working and giving answers.

Inference Pipeline Detail

Two Important Storage Places

Feature Store

Keeps clean data

Saves old versions

Everyone uses same data

Model Registry

Keeps trained models

Saves old versions

You know which model is in production

The Full Picture

Full FTI Pipeline Overview

Why This is Easy for QA

You already know:

✓ How to check data quality → Test Feature Pipeline

✓ How to compare old vs new → Test Training Pipeline

✓ How to test in production → Test Inference Pipeline

Five Things to Remember

Three parts. Feature, Training, Inference. That’s it.

Clean data is key. Bad data = bad model.

Save everything. Keep old data. Keep old models. You can go back if needed.

Test each part. Don’t test everything together. Test one part at a time.

Your skills work here. QA testing skills work for ML testing too.

Last Words

ML is just software with a learning step.

You already know how to test software. Now you can test ML too.

Start simple. Ask: “Show me the three pipelines.”

Then test each one.

You can do this.

I Built an AI-Powered Test Data Generator That Analyzes Any URL and Creates Test Data JSON

Let's Automate 🛡️ — Wed, 31 Dec 2025 19:12:47 +0000

I got tired of manually inspecting HTML to find selectors. So I taught my framework to do it instead.

Architecture flow

Here’s a question that kept me up at night:

Why am I spending more time finding selectors than writing actual tests?

I watched myself burn 30 minutes on a simple login test — not writing the test itself, but hunting through DevTools for the right selectors, creating fixture files, and crafting test data that would actually work.

What if the framework could just… look at the page and figure it out?

The Problem Nobody Talks About

Here’s the dirty secret of test automation: writing the actual test is the easy part.

The hard part? Finding #username vs input[name="user"] vs .login-field. Creating realistic test data. Building fixture files that match the actual form structure.

Every new page means:

Open DevTools

Inspect elements

Copy selectors

Hope they’re stable

Create JSON fixtures

Hope nothing changes tomorrow

Most “AI-powered” testing tools focus on running tests or analyzing failures. But what about the beginning — the tedious setup that drains your time before you write a single assertion?

The Experiment: Teaching AI to See

The idea was simple but audacious: give the AI a URL and let it figure out everything else.

Not mock data. Not hardcoded selectors. Real selectors from real HTML.

Here’s what I wanted:

python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login

And the framework should:

Fetch the actual page

Analyze the HTML structure

Extract real, working selectors

Generate meaningful test cases

Save everything as a Cypress fixture

Then generate tests that use that data

Sounds impossible? I thought so too.

How It Actually Works

The magic happens in about 50 lines of Python:

def generate_test_data_from_url(url: str, requirements: list) -> tuple:
    # Step 1: Fetch the real page
    resp = requests.get(url, timeout=10, headers={'User-Agent': 'Mozilla/5.0'})
    html = resp.text[:5000] # First 5KB is usually enough

    # Step 2: Ask AI to analyze it
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    prompt = f"""Analyze this HTML and generate test data.

    URL: {url}
    HTML: {html}

    Return JSON with:
    - Real selectors from the HTML
    - Valid test case with working data
    - Invalid test case for error handling
    """

    # Step 3: Parse and save as fixture
    test_data = json.loads(llm.invoke(prompt).content)

    with open("cypress/fixtures/url_test_data.json", 'w') as f:
        json.dump(test_data, f, indent=2)

    return test_data

The AI doesn’t guess. It reads the actual HTML and extracts what’s really there.

Complete Workflow

What The AI Sees vs What It Returns

When I point it at a login page, here’s the actual flow:

Input: Just a URL

--url https://the-internet.herokuapp.com/login

What the AI analyzes:

<input type="text" id="username" name="username">
<input type="password" id="password" name="password">
<button type="submit" class="radius">Login</button>

What it generates:

{
  "url": "https://the-internet.herokuapp.com/login",
  "selectors": {
    "username": "#username",
    "password": "#password",
    "submit": "button[type='submit']"
  },
  "test_cases": [
    {
      "name": "valid_test",
      "username": "tomsmith",
      "password": "SuperSecretPassword!",
      "expected": "success"
    },
    {
      "name": "invalid_test", 
      "username": "wronguser",
      "password": "badpassword",
      "expected": "error"
    }
  ]
}

Real selectors. Actual test data. Zero manual inspection.

The Generated Test Uses It All

The framework then generates a Cypress test that consumes this fixture:

describe('Login Tests', function () {
    beforeEach(function () {
        cy.fixture('url_test_data').then((data) => {
            this.testData = data;
        });
    });

it('should login with valid credentials', function () {
        cy.visit(this.testData.url);
        const valid = this.testData.test_cases.find(tc => tc.name === 'valid_test');

        cy.get(this.testData.selectors.username).type(valid.username);
        cy.get(this.testData.selectors.password).type(valid.password);
        cy.get(this.testData.selectors.submit).click();

        cy.url().should('include', '/secure');
    });
    it('should show error with invalid credentials', function () {
        cy.visit(this.testData.url);
        const invalid = this.testData.test_cases.find(tc => tc.name === 'invalid_test');

        cy.get(this.testData.selectors.username).type(invalid.username);
        cy.get(this.testData.selectors.password).type(invalid.password);
        cy.get(this.testData.selectors.submit).click();

        cy.get('#flash').should('contain', 'invalid');
    });
});

Notice something? The selectors come from the fixture, not hardcoded in the test.

If the page changes, update the fixture. Tests stay clean.

Two Ways to Feed Data

Sometimes you already have test data. Maybe from a previous run. Maybe from your team’s shared fixtures.

So I added a second option:

# Option 1: AI analyzes live URL
python qa_automation.py "Test login" --url https://example.com/login

# Option 2: Use existing JSON file
python qa_automation.py "Test login" --data cypress/fixtures/my_data.json

Same test generation. Different data sources. Your choice.

The Part That Surprised Me

I expected the AI to find basic selectors. What I didn’t expect was how well it understood context.

When analyzing a registration form, it didn’t just find #email — it generated test data like:

Valid: testuser@example.com

Invalid: not-an-email

For password fields:

Valid: SecurePass123!

Invalid: 123 (too short)

The AI understood what kind of data each field expected. Not because I told it — because it read the HTML attributes, labels, and validation patterns.

The Gotcha: Fixtures Need function() Syntax

One thing tripped me up for hours. Cypress fixtures with this.testData require a specific pattern:

// WRONG - arrow functions don't have 'this'
describe('Test', () => {
    beforeEach(() => {
        cy.fixture('data').then((d) => { this.testData = d; }); // undefined!
    });
});

// RIGHT - function() preserves 'this'
describe('Test', function () {
    beforeEach(function () {
        cy.fixture('data').then((data) => { this.testData = data; });
    });

    it('works', function () {
        console.log(this.testData); // actual data!
    });
});

The framework now enforces this pattern in generated tests. Lesson learned the hard way.

What This Means For Your Workflow

Before:

Open page in browser

Inspect elements manually

Copy selectors to notepad

Create fixture JSON by hand

Write test using those selectors

Fix typos in selectors

Run test

Debug why selectors don’t work

After:

Run one command with URL

Framework handles the rest

That’s not an exaggeration. The 30-minute login test? Under 2 minutes now.

Try It Yourself

The framework is open source:

git clone https://github.com/user/cypress-natural-language-tests
cd cypress-natural-language-tests
pip install -r requirements.txt

Set your API key:

export OPENAI_API_KEY=your_key_here
export OPENROUTER_API_KEY=your_openrouter_api_key_here

Generate tests from any URL:

python qa_automation.py "Test the login form" --url https://the-internet.herokuapp.com/login

Check what it created:

cat cypress/fixtures/url_test_data.json
cat cypress/e2e/generated/*.cy.js

The Bigger Picture

We’re at an interesting moment in test automation. The tooling is getting smarter, but the real breakthrough isn’t replacing testers — it’s eliminating the tedious parts.

Finding selectors is tedious. Creating fixture files is tedious. Debugging why #submit-btn worked yesterday but not today is tedious.

Let AI handle tedious. Let humans handle important.

That’s the framework I’m building.

Follow for more AI + QA experiments:

GitHub: https://github.com/aiqualitylab/cypress-natural-language-tests.git

I Built an AI-Powered Cypress Framework That Analyses Test Failures for Free

Let's Automate 🛡️ — Sun, 28 Dec 2025 14:03:59 +0000

Cypress test debugging is painful. This free AI-powered framework analyses failures instantly and tells you exactly what went wrong.

AI-Powered Cypress Framework That Analyses Test Failures for Free

Ever stared at a cryptic Cypress error message wondering what broke? 😩 We’ve all been there. That’s why I built something that changed my debugging workflow forever.

Introducing v2.1 of my Cypress Natural Language Test Framework — now featuring 🔍 AI Failure Analysis that costs you absolutely nothing.

😤 The Problem Every QA Engineer Knows

Picture this: Your CI pipeline fails and error be like:

CypressError: Timed out retrying after 4000ms: Expected to find element: '#submit-btn', but never found it.

Now you’re left guessing:

🤔 Did the selector change?

⏳ Is the page loading too slowly?

✏️ Did someone rename the button?

⚡ Is it a timing issue?

You spend the next hour digging through logs, comparing commits, and testing locally. Sound familiar?

💡 The Solution: AI That Debugs For You

With v2.1, debugging becomes a one-liner:

python qa_automation.py --analyze "CypressError: Timed out retrying: Expected to find element: #submit-btn"

Output:

🔍 Analyzing...
REASON: Element #submit-btn not found - selector likely changed during recent UI update
FIX: Use cy.get('[data-testid="submit"]') or add cy.wait() before the click action

✅ Two lines. Problem identified. Solution provided. Done.

🏗️ System Architecture

Here’s how the entire framework fits together:

⚙️ How It Works Under The Hood

The implementation is surprisingly simple. Here’s the core function:

def analyze_failure(log: str) -> str:
    response = requests.post(
        url="https://openrouter.ai/api/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY')}",
            "Content-Type": "application/json"
        },
        json={
            "model": "deepseek/deepseek-r1-0528:free",
            "messages": [{"role": "user", "content": f"Analyze this Cypress test failure. Reply ONLY:\nREASON: (one line)\nFIX: (one line)\n\n{log}"}],
            "max_tokens": 150
        }
    )
    return response.json()["choices"][0]["message"]["content"]

That’s it. About 15 lines of code that leverage OpenRouter’s free tier with DeepSeek R1. 🆓

🛠️ Three Ways To Use It

1️⃣ Direct from command line:

python qa_automation.py --analyze "Your error message here"

2️⃣ From a log file:

python qa_automation.py --analyze -f cypress-output.log

3️⃣ Piped from another command:

cat error.log | python qa_automation.py --analyze

🔄 CI/CD Integration

The real power comes when you integrate this into your pipeline. Here’s how the updated GitHub Actions workflow looks:

- name: Run Cypress tests
  id: tests
  continue-on-error: true
  run: |
    npx cypress run --spec "cypress/e2e/generated/**/*.cy.js" 2>&1 | tee test-output.log- name: AI Failure Analysis
  if: steps.tests.outcome == 'failure'
  env:
    OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
  run: |
    echo "Analyzing failures with AI..."
    python qa_automation.py --analyze -f test-output.log

When tests fail, your CI logs now include actionable insights instead of just error dumps. 📋

🚀 Setting It Up

Step 1: Get your free API key from openrouter.ai 🔑

Step 2: Add to your .env:

OPENROUTER_API_KEY=your_key_here

Step 3: Add requests to requirements.txt (if not already there) 📦

Step 4: Start analyzing 🎉

That’s the entire setup. No complex configurations. No paid subscriptions.

🖥️ Local Development Flow

For local development, the flow is just as smooth:

📦 What’s In v2.1

Here’s everything new in this release:

Feature Description

🔍 AI Failure Analyzer Instant debugging with free LLM

🌐 OpenRouter Integration Uses DeepSeek R1 at zero cost

💻 CLI Flag Simple --analyze command

📁 File Input Analyze entire log files with -f

⚙️ CI/CD Ready Updated GitHub Actions workflow

Combined with v2.0 features:

🤖 Natural language test generation

🔄 cy.prompt() self-healing tests

📊 LangGraph workflow orchestration

📚 Vector store documentation context

🌍 Real World Example

Old approach: Manual Investigation 😓

New approach:

python qa_automation.py --analyze -f nightly-run.log

Result:

REASON: Login button selector changed from #login-btn to .auth-button
FIX: Update selector to cy.get('.auth-button') or use data-testid

REASON: API response timeout - server took 6s, test timeout was 4s
FIX: Increase timeout with cy.request({timeout: 10000}) or add retry logic

REASON: Element detached from DOM after React re-render
FIX: Add cy.wait() after state change or use {force: true} option

🔗 Try It Yourself

The framework is open source and available now:

🔗 GitHub: github.com/aiqualitylab/cypress-natural-language-tests

Clone it, set up your API keys, and start generating tests and debugging failures with AI.

💭 Final Thoughts

AI shouldn’t just generate code. It should help maintain it too. This failure analyzer is my attempt at closing that loop — from requirements to tests to debugging, all AI-assisted.

The best part? It’s completely free to use. 🆓

Give it a try and let me know how much time it saves you! 💬

If this helped you, consider ⭐ starring the repo. It helps others discover it.

AI-Powered Cypress Test Generation from Natural Language v2.0 — Now with cy.prompt() Self-Healing

Let's Automate 🛡️ — Sat, 27 Dec 2025 11:46:37 +0000

AI-Powered Cypress Test Generation from Natural Language — Now with cy.prompt() Self-Healing

Transform plain English requirements into production-ready Cypress tests using GPT-4, LangChain, and LangGraph — run locally or in CI/CD

My Open-source project: github.com/aiqualitylab/cypress-natural-language-tests, which utilizes Cypress’s official AI-powered cy.prompt() command introduced at CypressConf 2025.

AI-Powered Cypress Test Generation from Natural Language v2.0 — Now with cy.prompt() Self-Healing

Introduction

Testing shouldn’t be complicated. You know what your application should do — why spend hours writing boilerplate test code?

I built cypress-natural-language-tests to bridge the gap between your test ideas and working Cypress code. Just describe your test in plain English:

python qa_automation.py "Test user login with valid credentials" --run

Result: A complete .cy.js file generated and executed automatically!

And now, with the latest update, the framework also supports Cypress’s new cy.prompt() command for self-healing, AI-powered test execution.

What’s New: cy.prompt() Integration

Cypress recently launched cy.prompt() — their official AI command that converts natural language into test steps at runtime. My framework now supports both approaches:

Mode Description Best For Generate Mode Creates complete .cy.js test files Version control, CI/CD pipelines cy.prompt() Mode Generates tests using cy.prompt() syntax Self-healing tests, rapid prototyping

You choose what works best for your workflow!

How It Works

👆 The complete workflow — from requirements to executed tests

The framework supports two execution paths :

🖥️ Local Machine Flow v/s ⚙️ GitHub Actions CI Flow

🖥️ Local Machine Flow v/s ⚙️ GitHub Actions CI Flow

Two Powerful Modes

Mode 1: Traditional Test Generation

Generate standard Cypress test files that you own and version control:

python qa_automation.py "Test user login with valid credentials"

Output: 01_test-user-login_20241223_102030.cy.js

describe('User Login', () => {
  it('should login successfully with valid credentials', () => {
    cy.visit('https://the-internet.herokuapp.com/login');
    cy.get('#username').type('tomsmith');
    cy.get('#password').type('SuperSecretPassword!');
    cy.get('button[type="submit"]').click();
    cy.get('.flash.success').should('be.visible');
  });
});

Mode 2: cy.prompt() Generation

Generate tests using Cypress’s new AI-powered cy.prompt() command for self-healing capabilities:

python qa_automation.py "Test user login" --use-cyprompt

Output: 01_test-user-login_20241223_102030.cy.js

describe('User Login', () => {
  it('should login successfully with valid credentials', () => {
    cy.prompt([
      'Visit the login page at https://the-internet.herokuapp.com/login',
      'Type "tomsmith" in the username field',
      'Type "SuperSecretPassword!" in the password field',
      'Click the login button',
      'Verify the success message is visible'
    ]);
  });
});

Why cy.prompt()?

🔄 Self-healing : Tests adapt when UI changes

📝 Readable : Natural language steps in your test files

🛡️ Resilient : Less maintenance when selectors change

Quick Start

Installation

# Clone the repository
git clone https://github.com/aiqualitylab/cypress-natural-language-tests.git
cd cypress-natural-language-tests

# Set up Python environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure OpenAI API key
echo "OPENAI_API_KEY=your_key_here" > .env

# Initialize Cypress
npm install cypress --save-dev
npx cypress open

Generate Your First Test

# Standard Cypress test
python qa_automation.py "Test user registration flow"

# With cy.prompt() syntax
python qa_automation.py "Test user registration flow" --use -cyprompt

# Generate and run immediately
python qa_automation.py "Test homepage loads correctly" --run

Practical Examples

Example 1: Multiple Test Requirements

python qa_automation.py \
  "Test successful login with valid credentials" \
  "Test login fails with wrong password" \
  "Test login form shows validation errors for empty fields"

Creates three separate test files — one for each requirement.

Example 2: With Documentation Context (RAG)

Supercharge test generation with your own documentation:

python qa_automation.py \
  "Test checkout API according to specifications" \
  --docs ./api-documentation \
  --persist-vstore

The framework indexes your docs into ChromaDB and uses them as context for more accurate test generation.

Example 3: Generate and Execute Locally

python qa_automation.py "Test user profile update" --run

Generates the test AND runs Cypress immediately. View results in your terminal.

Example 4: CI/CD Integration

Trigger via GitHub Actions to generate tests in your pipeline:

- name: Generate Tests
  run: python qa_automation.py "${{ github.event.inputs.requirement }}"

- name: Run Cypress
  run: npx cypress run

- name: Upload Artifacts
  uses: actions/upload-artifact@v3
  with:
    name: cypress-results
    path: |
      cypress/videos
      cypress/screenshots

Why Choose This Framework?

Feature Benefit Dual Mode Support Standard Cypress OR cy.prompt() — your choice Complete Test Files Version control your generated tests Documentation-Aware RAG integration for accurate, context-rich tests Local & CI Ready Works on your machine and in GitHub Actions Model Flexibility Use GPT-4, GPT-4o-mini, or GPT-3.5-turbo Open Source Full control, no vendor lock-in

Configuration

Change AI Model

In qa_automation.py:

llm = ChatOpenAI(
    model="gpt-4o-mini", # Options: gpt-4, gpt-4o, gpt-3.5-turbo
    temperature=0
)

Set Your Application URL

Update the prompt template to target your application:

CY_PROMPT_TEMPLATE = """
...
- Use `cy.visit('https://your-app-url.com')` as the base URL.
...
"""

Get Started Now

🔗 github.com/aiqualitylab/cypress-natural-language-tests

git clone https://github.com/aiqualitylab/cypress-natural-language-tests.git

⭐ Star the repo if you find it useful!

Conclusion

Natural language test generation is here to stay. With cypress-natural-language-tests , you get:

Two modes — Traditional Cypress or cy.prompt()

Full ownership — Complete test files you control

CI/CD ready — Works locally and in GitHub Actions

Documentation-aware — RAG for accurate test generation

Open source — No vendor lock-in

Stop writing boilerplate. Start describing tests in plain English.

What’s your experience with AI-powered test generation? Drop a comment below!