Forem: Tomer Lihovetsky

We stopped leaving GitHub to debug test failures. Here's how.

Tomer Lihovetsky — Fri, 01 May 2026 12:32:23 +0000

CI is red. You open the PR. Now what?

You click the failing workflow. You read the logs. You open the trace viewer in a separate tab. You cross-reference the error with the code. You search Slack to see if this happened before. You go back to GitHub to leave a comment.

Every time. For every failure.

The problem isn't that debugging is hard. The problem is that you keep leaving GitHub to do it — even though GitHub is where you make the merge decision.

We built QAI Agent to fix that. This post is about two things that changed how we work: asking QAI questions directly from a PR comment, and getting the exact code fix inline on the PR.

The problem with CI feedback

Your tests fail. The PR comment tells you:

8 failures
3 unique clusters
Risk: High

Useful. But it doesn't answer the question every developer actually asks: "Is this worse than last week?"

That question requires history. It requires context across runs. It requires something that remembers.

Ask QAI from the PR

Once QAI is connected to your repo, you can mention @qai-agent in any PR comment and ask anything about your test failures.
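
To make the trigger concrete, here is a minimal sketch of how a question could be pulled out of a PR comment. The helper name `extractQuestion` and the matching rules are assumptions for illustration, not QAI's actual parser.

```typescript
// Hypothetical helper: returns the question that follows an @qai-agent
// mention, or null when the comment does not address the agent.
function extractQuestion(commentBody: string): string | null {
  // Find the mention at the start of the text or after whitespace,
  // then capture the rest of that line.
  const match = commentBody.match(/(?:^|\s)@qai-agent\b[ \t]*(.*)/i);
  if (!match) return null;
  const question = match[1].trim();
  return question.length > 0 ? question : null;
}

console.log(extractQuestion("@qai-agent why is this test failing?"));
console.log(extractQuestion("Looks good to me, merging."));
```

The first call prints the question text and the second prints null, so ordinary review comments are ignored.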

Real example — I typed this on a PR with 18 failing tests across 4 frameworks:

@qai-agent What are the key fixes that would resolve roughly 80% of the test failures across all suites?

The response came back instantly:

QAI Agent 🪄

Key Fixes for ~80% of Failures

Fix 1: Login flow missing load state wait (covers ~50% of failures)

All checkout and user-account beforeEach timeouts across all 4 suites trace to the same root cause: login completes but the page isn't fully loaded before navigation/assertions proceed.

Playwright (login() helper):

async function login(page: Page) {
  await page.goto('/login');
  await page.fill('[name="username"]', 'demo');
  await page.fill('[name="password"]', 'demo');
  await page.click('[type="submit"]');
  await page.waitForLoadState('networkidle'); // ADD THIS
}

Selenium Java (@BeforeEach setup):

new WebDriverWait(driver, Duration.ofSeconds(60))
  .until(ExpectedConditions.visibilityOfElementLocated(By.name("username")));
new WebDriverWait(driver, Duration.ofSeconds(60))
  .until(webDriver -> ((JavascriptExecutor) webDriver)
    .executeScript("return document.readyState").equals("complete"));

Selenium Python (setup_checkout fixture):

wait = WebDriverWait(driver, 60)  # increase from 30
wait.until(EC.visibility_of_element_located((By.NAME, "username")))

Fix 2: Inverted/stale assertions in search & cart tests (covers ~15% of failures)

const fullCount = await cards.count();
await page.fill('[data-testid="search"]', 'bamboo');
const filteredCount = await cards.count();
expect(filteredCount).toBeGreaterThan(0);
expect(fullCount).toBeGreaterThan(filteredCount); // was backwards

Fix 3: Empty cart text locator mismatch (covers ~2% of failures)

// Change from:
await $('//*[contains(text(),"Your cart is empty")]');
// To:
await $('//*[contains(text(),"Your cart is empty.")]');

Fix	Tests resolved	Suites
Login load state + timeout	~13 tests	All 4
Inverted/stale assertions	~4 tests	Playwright, WebdriverIO
Cart empty text locator	1 test	WebdriverIO

Three fixes. One question. ~18 tests resolved.

It didn't just list what broke — it told us which fix to do first, how many tests each one resolves, and gave the exact code per framework. If you're running Playwright, Selenium Java, Selenium Python, and WebdriverIO on the same PR, the fix isn't the same in each one. QAI knows which frameworks are affected and generates the right code for each.

Ask about history — the question no test runner can answer

The most powerful use case isn't "why did this fail." It's "is this getting worse?"

@qai-agent is there an increase in failures compared to previous PRs?

Yes, there's a dramatic increase in failures compared to previous PRs.
Previous PRs: 7 occurrences of these patterns. This PR: 20 occurrences each — nearly 3x increase.

Root causes:
→ UI_CHANGED failures (4 tests) — 85% confidence
→ TIMING_FLAKE failures (4 tests) — 70% confidence
→ 0% flaky score — consistent, reproducible failures

Verdict: This PR introduced systematic failures. Block merge until UI locator issue and timing problems are resolved.

That's not a test runner. That's a senior QA engineer reviewing your PR.

A single failure is noise. A 3x increase in failures across PRs is a signal. QAI can answer that in seconds because it has the history. Your team doesn't.
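
The "nearly 3x" figure is just a ratio of occurrence counts between runs. A rough sketch of that arithmetic, using the 7 and 20 from the reply above (the helper name `failureIncrease` is hypothetical, not part of QAI):

```typescript
// Hypothetical helper: ratio of failure-pattern occurrences in the current
// PR to the count seen in previous PRs.
function failureIncrease(previous: number, current: number): number {
  // With no prior occurrences, any failure is treated as an unbounded increase.
  if (previous <= 0) return current > 0 ? Infinity : 1;
  return current / previous;
}

const increase = failureIncrease(7, 20);
console.log(increase.toFixed(2)); // "2.86", i.e. nearly 3x
```

The hard part is not the division but having the previous count at all, which is why this needs stored history.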

Some other questions you can ask:

@qai-agent why is this test failing?
@qai-agent is this flaky or a real regression?
@qai-agent how long has this been broken?
@qai-agent what's the fastest fix for the cart failures?
@qai-agent is this the same failure we saw last week?

Each answer includes historical context, severity classification, confidence score, and a fix suggestion.

The code fix — already on the PR, without asking

The second feature shows up automatically. When QAI analyzes a PR, the comment includes an inline code fix for high-confidence failures. You don't need to ask. It's already there.

For a TEST BUG cluster at 70% confidence:

The test "search narrows results to matching products" has inverted logic on line 23. [View fix →]

test('search narrows results', async ({ page }) => {
  const cards = page.locator('a[href^="/products/"]');
  const initialCount = await cards.count();

  await page.getByPlaceholder(/search/i).fill('bamboo');
  await expect(page.getByText(/bamboo/i).first()).toBeVisible({ timeout: 10_000 });

  // Searching should never show more products than the unfiltered list
  const filteredCount = await cards.count();
  expect(filteredCount).toBeLessThanOrEqual(initialCount);

  // Clearing the search should restore the full list
  await page.getByPlaceholder(/search/i).fill('');
  expect(await cards.count()).toBeGreaterThanOrEqual(initialCount);
});

Ready to copy and apply. No dashboard. No trace viewer. No tab switching.

The PR comment also breaks results down by suite:

Suite	✅ Pass	❌ Fail	Total	Pass rate
Selenium Python	10	4	14	71%
Selenium Java	9	4	13	69%
WebdriverIO	4	1	5	80%
Total	23	9	32	72%
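
For reference, the pass rates in that table can be reproduced from the pass and fail counts alone. This is a small sketch of the arithmetic, not QAI's code; the `passRate` helper and the rounding to whole percentages are assumptions.

```typescript
// Pass/fail counts copied from the table above.
interface SuiteResult {
  suite: string;
  pass: number;
  fail: number;
}

const suites: SuiteResult[] = [
  { suite: "Selenium Python", pass: 10, fail: 4 },
  { suite: "Selenium Java", pass: 9, fail: 4 },
  { suite: "WebdriverIO", pass: 4, fail: 1 },
];

// Pass rate as a whole-number percentage (assumed rounding).
function passRate(pass: number, fail: number): number {
  const total = pass + fail;
  return total === 0 ? 0 : Math.round((pass / total) * 100);
}

for (const s of suites) {
  console.log(`${s.suite}: ${passRate(s.pass, s.fail)}%`); // 71%, 69%, 80%
}

const totalPass = suites.reduce((sum, s) => sum + s.pass, 0);
const totalFail = suites.reduce((sum, s) => sum + s.fail, 0);
console.log(`Total: ${passRate(totalPass, totalFail)}%`); // 72%
```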

And at the bottom of every comment:

💬 Ask QAI anything about this PR:
Comment @qai-agent <your question> — examples:
• @qai-agent why is this failing?
• @qai-agent is this flaky or a real regression?
• @qai-agent what's the fastest fix?

Why this matters

Most test tools are read-only. You look at them. They don't talk back.

Ask QAI flips this. Instead of navigating to a dashboard, opening a report, filtering by date, comparing runs manually — you just ask. In the same place you're already working. The context stays in the PR. The team sees the answer.

The PR is where you decide whether to merge. That's where the analysis should live.

Setup — two steps

Step 1 — Add the Action to your workflow:

- name: QAI Agent
  uses: useqai/qai-agent@v1
  if: always()
  with:
    junit-path: 'test-results/results.xml'
    qai-url: https://ingest.useqai.dev
    qai-api-key: ${{ secrets.QAI_API_KEY }}

Step 2 — Install the QAI GitHub App on your repo (required for @qai-agent replies)

Get your free API key at useqai.dev — 30 seconds, no credit card.

Try it before connecting anything

Zero setup: Paste your JUnit XML at useqai.dev/try — no account, no GitHub, no secrets. See exactly what QAI posts on a PR in 30 seconds.

Fork and see: Fork useqai/demo-shop — QAI is already wired up across 4 frameworks. Open a PR, comment @qai-agent, and see it respond.

🔧 GitHub Action: useqai/qai-agent
📦 Source: github.com/useqai/qai-agent
📊 Dashboard + Ask QAI: useqai.dev

If you try it and hit any edge cases — unusual JUnit variants, frameworks not listed — open an issue or drop a comment here.

From CI Failure to Fix in Under a Minute — QAI Agent Now Closes the Full Loop

Tomer Lihovetsky — Tue, 07 Apr 2026 13:54:33 +0000

QAI Agent now does more than cluster failures and score PR risk. It alerts your team in Slack, explains why tests failed with AI-powered RCA, and generates the fix — all without leaving the PR.

A few weeks ago I wrote about QAI Agent — a GitHub Action that clusters test failures and scores PR risk.

The feedback was clear: clustering is useful. But developers wanted more. Not just what broke — why it broke, and how to fix it.

So we closed the loop.

Here's what happens now when your CI fails.

Step 1 — Slack finds you

You don't check CI. CI comes to you.
The moment a high-risk PR is detected, your team gets a Slack alert:

🔴 High Risk PR #28 — Do not merge
Risk: 0.60 · 8 test failures · 8 clusters
Recommendation: investigate failures first

No polling. No tab switching. It just appears in your team channel.

Step 2 — The PR comment tells you everything

You click "View in QAI Platform." The PR already has a full breakdown:

8 failures listed with exact errors
8 unique failure clusters identified
RCA Analysis table — cause, confidence score, suggestion
💡 AI fix suggestions available

The RCA table is new. For each Playwright trace, QAI detects the failure category and confidence.

Rule-based detection, no cloud required. It runs locally on the GitHub Actions runner.

Step 3 — AI generates the fix

Click "View in QAI" → open the failing test → hit "Suggest fix."

The AI explains exactly what went wrong:

The price locator matches multiple elements (paragraph and span both showing $54.95), causing a strict mode violation that prevents the visibility check.

Then generates the fix:

await expect(page.locator('span').filter({ hasText: /\$\d+/ })).toBeVisible();

One click. Ready to apply.

The full loop

CI fails
→ Slack alert fires to your team channel
→ PR comment posts: clusters + RCA + confidence scores
→ AI fix suggestion on demand
→ Merge verdict: go or no-go

Before QAI: open CI → read logs → guess → fix → repeat.
After QAI: open Slack → click link → apply fix → merge.

Setup — still one step

Nothing changed on the setup side. Add one step after your tests run:

- name: QAI Agent
  uses: useqai/qai-agent@v1
  if: always()
  with:
    junit-path: 'test-results/results.xml'
    trace-path: 'test-results/**/*.zip'   # optional, enables RCA

For Slack alerts and AI fix suggestions, connect the cloud platform with two more lines:

    qai-url: https://ingest.useqai.dev
    qai-api-key: ${{ secrets.QAI_API_KEY }}

Get your free API key at useqai.dev — 30 seconds, no credit card.

GitHub Action — fully open source:

PR comment with risk score ✅
Failure clustering ✅
Playwright trace RCA (rule-based, runs locally) ✅
Block merges on high risk ✅

Cloud platform (useqai.dev):

AI fix suggestions (LLM-powered) ✅
Slack alerts for high-risk PRs ✅
Historical trends + flakiness tracking ✅
Cross-repo visibility ✅

Try it

🔧 GitHub Action: useqai/qai-agent on the Marketplace
📦 Source: github.com/useqai/qai-agent
📊 Dashboard: useqai.dev
💬 Live PR comment demo: PR #2

If you try it and hit any issues — open an issue or drop a comment here. Especially interested in non-Playwright frameworks and edge cases in JUnit parsing.

Stop Drowning in CI Noise: QAI Agent Clusters Your Test Failures and Tells You What Actually Broke

Tomer Lihovetsky — Mon, 16 Mar 2026 13:16:17 +0000

You open a PR. CI is red. There are 47 failed tests.

Now what?

You scroll through a wall of test names. Some look related. Some look flaky. Some are probably the same root cause repeated across 20 test cases. You don't know which to fix first, or whether it's even safe to merge.

This is CI noise — and it's eating engineering time every single day.

What QAI Agent does

QAI Agent is a GitHub Action that runs after your tests and posts an intelligent summary directly on the pull request.

It does three things:

1. Clusters failures by root cause

Instead of showing you 47 test names, it groups tests that failed for the same underlying reason. If 30 tests all hit the same null pointer, that's one cluster — one thing to fix.

It works by normalizing error messages: stripping timestamps, line numbers, UUIDs, memory addresses, file paths, and variable values, then hashing the result. Tests with the same normalized signature are the same failure.
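
Here is a minimal sketch of that normalize-then-hash idea. It is not the agent's actual code: the regex patterns, the placeholder tokens, the choice of a 32-bit FNV-1a hash, and the names `normalizeError`, `fnv1a`, and `clusterFailures` are all assumptions made for illustration.

```typescript
// Replace run-specific details with placeholders so messages with the same
// shape collapse to the same signature. Patterns are illustrative only.
function normalizeError(message: string): string {
  return message
    .replace(/\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?/g, "<TS>")
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "<UUID>")
    .replace(/0x[0-9a-f]+/gi, "<ADDR>")
    .replace(/(?:\/[\w.-]+)+(?::\d+)*/g, "<PATH>") // file paths with optional :line:col
    .replace(/\d+/g, "<N>")                        // remaining numeric values
    .replace(/\s+/g, " ")
    .trim();
}

// 32-bit FNV-1a hash, returned as 8 hex characters (assumed hash choice).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, modulo 2^32.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Group test names by the hash of their normalized error message.
function clusterFailures(failures: { test: string; error: string }[]): Record<string, string[]> {
  const clusters: Record<string, string[]> = {};
  for (const f of failures) {
    const key = fnv1a(normalizeError(f.error));
    (clusters[key] = clusters[key] || []).push(f.test);
  }
  return clusters;
}

const demoClusters = clusterFailures([
  { test: "checkout A", error: "TimeoutError at /src/login.ts:42 after 30000ms" },
  { test: "checkout B", error: "TimeoutError at /src/login.ts:57 after 30012ms" },
  { test: "cart empty", error: "Expected text 'Your cart is empty.'" },
]);
console.log(demoClusters);
```

Both checkout failures normalize to "TimeoutError at <PATH> after <N>ms", so they land in one cluster even though the line numbers and durations differ.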

2. Scores PR risk

Based on the fail rate and number of unique failure patterns, it outputs a risk level: low, medium, or high. You can use this to automatically block merges on high-risk PRs.
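
To show the general shape of such a score, here is a hedged sketch. The post only states the inputs (fail rate and number of unique failure patterns) and the three output levels. The weights, the cap of 5 clusters, the thresholds, and the name `scoreRisk` are invented for illustration and will not match the agent's real numbers.

```typescript
type RiskLevel = "low" | "medium" | "high";

// Hypothetical scoring: every constant below is an illustrative assumption.
function scoreRisk(
  failed: number,
  total: number,
  clusterCount: number
): { score: number; level: RiskLevel } {
  const failRate = total === 0 ? 0 : failed / total;
  // More distinct failure patterns suggests broader breakage; cap at 5.
  const patternFactor = Math.min(clusterCount, 5) / 5;
  const score = 0.7 * failRate + 0.3 * patternFactor;
  const level: RiskLevel = score >= 0.5 ? "high" : score >= 0.2 ? "medium" : "low";
  return { score, level };
}

console.log(scoreRisk(0, 32, 0).level);  // "low"
console.log(scoreRisk(9, 32, 3).level);  // "medium"
console.log(scoreRisk(32, 32, 5).level); // "high"
```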

3. Analyzes Playwright traces (optional)

If you're using Playwright and save traces on failure, QAI Agent will unzip and analyze them locally — no cloud required. It detects five failure categories:

Cause	How it's detected
UI Changed	Locator not found, strict mode violation
Backend Error	HTTP 5xx response during test
Test Bug	Assertion errors in console logs
Timing / Flaky	Timeout on step
Environment Failure	Network failures, ECONNREFUSED
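
A rule-based classifier along the lines of that table could look roughly like this. It is a sketch under assumptions: the `TraceSignals` shape, the exact regexes, the order in which rules are checked, and the category identifiers other than UI_CHANGED and TIMING_FLAKE are hypothetical.

```typescript
type Cause =
  | "UI_CHANGED"
  | "BACKEND_ERROR"
  | "TEST_BUG"
  | "TIMING_FLAKE"
  | "ENVIRONMENT_FAILURE"
  | "UNKNOWN";

// Hypothetical summary of what was extracted from one trace.
interface TraceSignals {
  errorMessage: string;   // error text of the failed step
  httpStatuses: number[]; // response codes seen during the test
  consoleLogs: string[];  // browser console output
}

// Check the rules in a fixed order; the first match wins (assumed ordering).
function classifyFailure(signals: TraceSignals): Cause {
  const msg = signals.errorMessage;
  if (/ECONNREFUSED|net::ERR_/i.test(msg)) return "ENVIRONMENT_FAILURE";
  if (signals.httpStatuses.some((s) => s >= 500 && s <= 599)) return "BACKEND_ERROR";
  if (/strict mode violation|locator.*not found/i.test(msg)) return "UI_CHANGED";
  if (signals.consoleLogs.some((line) => /assert/i.test(line))) return "TEST_BUG";
  if (/timeout/i.test(msg)) return "TIMING_FLAKE";
  return "UNKNOWN";
}

console.log(
  classifyFailure({
    errorMessage: "strict mode violation: locator resolved to 2 elements",
    httpStatuses: [200],
    consoleLogs: [],
  })
); // "UI_CHANGED"
```

Rules like these are cheap to run on the Actions runner, but they only name a category. They cannot point at the line of code that caused it.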

Setup in 60 seconds

Add one step to your existing workflow, after your tests run:

- name: QAI Agent
  uses: useqai/qai-agent@v1
  if: always()
  with:
    junit-path: 'test-results/results.xml'

Your workflow needs pull-requests: write permission:

jobs:
  test:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4
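      - name: Run tests
        # The JUnit reporter prints to stdout unless an output file is set
        run: npx playwright test --reporter=junit
        env:
          PLAYWRIGHT_JUNIT_OUTPUT_NAME: test-results/results.xml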
      - name: QAI Agent
        uses: useqai/qai-agent@v1
        if: always()
        with:
          junit-path: 'test-results/results.xml'
          trace-path: 'test-results/**/*.zip'   # optional, for RCA

That's it. No account. No API key. No configuration.

The PR comment it generates

Every PR gets a comment that shows:

Risk level and merge recommendation
Failed tests with their error messages
Failure clusters (grouped by root cause)
RCA analysis from Playwright traces (if provided)

The comment is upserted — it updates in place when you push new commits, so it doesn't spam your PR timeline.
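
One common way to get this update-in-place behaviour is to tag the comment with a hidden marker and look for it before posting. The sketch below uses an in-memory stand-in instead of the GitHub API. The marker string and the `CommentThread` class are hypothetical, and the agent may implement this differently.

```typescript
interface PrComment {
  id: number;
  body: string;
}

// Hypothetical hidden marker used to recognize the agent's own comment.
const MARKER = "<!-- qai-agent-summary -->";

// Minimal in-memory stand-in for a PR's comment list.
class CommentThread {
  comments: PrComment[] = [];
  private nextId = 1;

  // Update the existing marked comment, or create one if none exists yet.
  upsert(summary: string): PrComment {
    const body = MARKER + "\n" + summary;
    for (const c of this.comments) {
      if (c.body.indexOf(MARKER) === 0) {
        c.body = body; // update in place, keeping the same comment id
        return c;
      }
    }
    const created: PrComment = { id: this.nextId++, body };
    this.comments.push(created);
    return created;
  }
}

const thread = new CommentThread();
thread.upsert("Risk: High, 8 failures");
thread.upsert("Risk: Low, 0 failures");
console.log(thread.comments.length); // 1
```

Two pushes produce one comment with the latest summary, which is what keeps the PR timeline quiet.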

Block merges on high risk

QAI Agent exposes outputs you can use in subsequent steps:

- name: QAI Agent
  id: qai
  uses: useqai/qai-agent@v1
  if: always()
  with:
    junit-path: 'test-results/results.xml'

- name: Block merge on high risk
  if: always() && steps.qai.outputs.risk-level == 'high'
  run: |
    echo "High risk — investigate failures before merging"
    exit 1

Available outputs: risk-level, risk-score, failed-tests, total-tests, cluster-count.

Works with any JUnit-compatible framework

Framework	How to get JUnit output
Playwright	`--reporter=junit`
Jest	`--reporters=jest-junit`
Vitest	`--reporter=junit`
pytest	`--junitxml=results.xml`
Maven/JUnit	built-in
Go (gotestsum)	`--junitfile results.xml`

What it doesn't do without the cloud

No historical context — without connecting a cloud backend, QAI Agent only sees the current run. It can't tell you "this failure has been flaky for 3 weeks."
No LLM explanations — the RCA is rule-based, not AI-generated. It detects categories of failure, not the specific cause in your code.
Playwright traces only — the RCA analysis only works with Playwright trace zip files, not other test frameworks.

The cloud platform at useqai.dev adds historical trend charts across all runs, flakiness tracking, cross-repo visibility, and LLM-powered root cause analysis.

Try it

GitHub Action: useqai/qai-agent on the Marketplace
Source: github.com/useqai/qai-agent
Live PR comment demo: https://github.com/useqai/qai-agent/pull/2
Dashboard: https://useqai.dev

If you try it, open an issue or leave a comment here — especially if you run into a framework or JUnit variant that doesn't parse correctly. Happy to fix it.