<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Mark Lapierre</title>
    <description>The latest articles on Forem by Mark Lapierre (@mlapierre).</description>
    <link>https://forem.com/mlapierre</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F18369%2Fba302c5b-ccae-4ff7-b0b1-692e24a75b69.gif</url>
      <title>Forem: Mark Lapierre</title>
      <link>https://forem.com/mlapierre</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mlapierre"/>
    <language>en</language>
    <item>
      <title>Pros and Cons of Quarantined Tests</title>
      <dc:creator>Mark Lapierre</dc:creator>
      <pubDate>Wed, 06 Jun 2018 20:10:00 +0000</pubDate>
      <link>https://forem.com/mlapierre/pros-and-cons-of-quarantined-tests-2emj</link>
      <guid>https://forem.com/mlapierre/pros-and-cons-of-quarantined-tests-2emj</guid>
      <description>&lt;p&gt;Flaky tests, i.e., those that only fail sometimes, are the bane of any end-to-end automated test suite.&lt;/p&gt;

&lt;p&gt;Another type of problem test is one that fails every time, but tests something deemed not important enough to fix right now. If you have to ignore some failed tests, sooner or later you’re going to ignore one that you should have paid attention to. Or worse, you might decide to ignore them all, because clearly no-one is fixing the bugs.&lt;/p&gt;

&lt;p&gt;If a test is broken, fixing it should always be the first course of action, if possible. But what if some other task has a higher priority? If you’re confident that the problem is the test and not the software being tested, it might be reasonable to allow the test to keep failing, at least temporarily.&lt;/p&gt;

&lt;p&gt;When you frequently ignore some failing tests, the whole suite is at risk of being seen as unreliable. A common way to prevent that is to quarantine the flaky/failing tests. Quarantine in this context refers to isolating the troublesome tests from the rest of the test suite. Not for fear of contagion, except in the sense of the negative impact they can have on the perception of the rest of the tests.&lt;/p&gt;

&lt;p&gt;I think I first came across the concept in &lt;a href="https://martinfowler.com/articles/nonDeterminism.html"&gt;an article by Martin Fowler&lt;/a&gt;. It’s a great read on the topic of flaky tests and how to identify and resolve the causes of their flakiness. This post isn’t about how to fix them so check out that article if you’re after that kind of info.&lt;/p&gt;

&lt;p&gt;More recently, &lt;a href="https://testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html"&gt;an article on the Google Testing Blog&lt;/a&gt; mentioned the same technique for dealing with the same types of troublesome tests.&lt;/p&gt;

&lt;p&gt;Even though quarantining tests can be a good temporary solution, if you don’t fix the tests (or the bugs) you can end up in the situation I mentioned before; a few failing tests create the impression that the entire suite is unreliable, enough so that you might consider them &lt;a href="https://testbitsblog.wordpress.com/2018/04/04/quarantining-failing-tests-is-a-death-sentence/"&gt;a death sentence&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;My team and I try to avoid that death sentence in a few ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Report quarantined test results separately from the rest of the test suite.&lt;/p&gt;

&lt;p&gt;That way everyone can see the results of the reliable tests and know that a failure there is something that should be looked at immediately. We don’t have to try to identify the “true” failures among the flaky ones.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Tag quarantined tests with a reason they’re quarantined.&lt;/p&gt;

&lt;p&gt;So flaky tests get tagged as such. Failing tests that aren’t going to get fixed for a while get reported and tagged with the issue number. Comments can be added if the tag isn’t sufficient. This isn’t enough to rescue a quarantined test from oblivion, but it can help avoid the potential problem of losing track of why a test was quarantined.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Schedule a regular review of quarantined tests.&lt;/p&gt;

&lt;p&gt;If it’s not scheduled it’s not likely to happen. Failing tests can be assigned to someone to fix if priorities change, and time can be invested in fixing a flaky test if we decide it’s more important than we first thought.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Delete the test.&lt;/p&gt;

&lt;p&gt;If a test stays in quarantine for a long time, it’s worth rethinking the value it provides. Maybe it turns out that unit tests, or even exploratory tests, provide enough coverage. Or the test might cover a part of the software that rarely changes, or which doesn’t get much use; in that case a regression isn’t a big deal. We might &lt;code&gt;@Ignore&lt;/code&gt; the test and leave a comment explaining why, instead of deleting it, if it seems likely someone might decide to write the test again.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
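
&lt;p&gt;As a rough sketch of the first two points (the test names and issue number are made up for illustration; your framework will have its own mechanism), here’s how quarantined tests could be tagged with a reason and reported separately from the reliable results, using Python’s built-in &lt;code&gt;unittest&lt;/code&gt;:&lt;/p&gt;

```python
import unittest

# Analogous to JUnit's @Ignore: unittest.skip records a reason alongside
# each quarantined test, so "why is this skipped?" is never lost.
QUARANTINE_FLAKY = unittest.skip("quarantined: flaky, see issue #123")


class CheckoutTests(unittest.TestCase):
    def test_add_to_cart(self):
        self.assertEqual(1 + 1, 2)  # stands in for a reliable test

    @QUARANTINE_FLAKY
    def test_payment_redirect(self):
        self.fail("intermittent timeout")  # stands in for a flaky test


# Run the suite; quarantined (skipped) tests are reported separately,
# so a failure in the reliable set is unambiguous.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CheckoutTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("reliable failures:", len(result.failures))
for test, reason in result.skipped:
    print("quarantined:", test.id().split(".")[-1], "-", reason)
```

&lt;p&gt;The same idea applies in any framework: a tag or decorator carries the quarantine reason, and the runner splits the report so the reliable results stay trustworthy.&lt;/p&gt;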

&lt;p&gt;How do you deal with flaky or failing tests that don’t get fixed quickly?&lt;/p&gt;

</description>
      <category>testing</category>
      <category>automation</category>
      <category>qa</category>
    </item>
    <item>
      <title>Inattentional Blindness and Scripted Tests</title>
      <dc:creator>Mark Lapierre</dc:creator>
      <pubDate>Wed, 30 May 2018 09:56:20 +0000</pubDate>
      <link>https://forem.com/mlapierre/inattentional-blindness-and-scripted-tests-2m24</link>
      <guid>https://forem.com/mlapierre/inattentional-blindness-and-scripted-tests-2m24</guid>
      <description>&lt;p&gt;My previous workplace was a large organisation in which many testers were employed to evaluate the quality of the software we developed and maintained. We had a collection of scripted test cases that testers would follow step-by-step. That worked reasonably well, although I was employed to help automate our processes, including testing, which contributed significantly to the quality of our software.&lt;/p&gt;

&lt;p&gt;When I started working at my current workplace I found similarly detailed scripted test cases. Part of my responsibilities included manual testing, so I thought I could test the way I was familiar with, and how the rest of my colleagues tested—just follow the test cases. It hasn't worked well. We find bugs, for sure, but as I've grown in experience I've found more and more problems with the software that had been there for a long time, through many versions of the software and through many executions of test cases that should have revealed them.&lt;/p&gt;

&lt;p&gt;There are at least a few things that I think explain why we, including my past self, failed to identify problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Out-of-date test cases.&lt;/strong&gt; Change happens constantly and we have too many tests with too much detail for our small QA team to keep up with.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Treating test cases like an instruction manual.&lt;/strong&gt; It's relatively easy for an experienced tester to follow the steps of a scripted test case to the letter, assuming the steps are accurate. That was our standard practice. But it's even easier to miss out on opportunities to reveal bugs if you do that.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overwhelming detail.&lt;/strong&gt; Many test cases are so long, verbose, and complicated that it's very easy to miss important details in the steps and expected results, especially when you're under pressure to get the job done quickly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unnecessarily specific detail.&lt;/strong&gt; Often a test case instructs the tester to use a particular element of the UI in a particular way. E.g., "enter a value in the Account text field and click the Validate button at the bottom of the panel." That sort of specificity means the other fields are likely to be ignored, as well as ignoring all the other ways that validation could be triggered. And that problem is in addition to making it hard to keep the test case up to date (because sooner or later, that Validate button is going to move, or be removed entirely).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The last three points have something in particular in common. They all trigger a cognitive phenomenon called &lt;a href="https://en.wikipedia.org/wiki/Inattentional_blindness"&gt;inattentional blindness&lt;/a&gt;. It's something that we all experience, whether we're aware of it or not&lt;sup id="mlfnr1"&gt;1&lt;/sup&gt;. A well-known demonstration of the phenomenon comes from a psychological study, and you can perform the experiment yourself by watching the video and following the instructions at the start:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/vJG698U2Mvo"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you haven't done so already, I strongly recommend you do the experiment before you read on; this is something you really only get to experience once, though there are &lt;a href="https://www.youtube.com/watch?v=FWSxSQsspiQ"&gt;variations&lt;/a&gt; of it. It's also something you'll likely experience again and again in real life.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://doi.org/10.1068/p281059"&gt;The study&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Inattentional_blindness#Experiments"&gt;others like it&lt;/a&gt; find that half of the time on average, people fail to notice the unexpected element. They're asked to perform a task and they're focused so intently on it that they fail to perceive something &lt;em&gt;they're looking right at&lt;/em&gt;. It's one of the reasons using your phone while driving is so dangerous, even hands-free; if your attention is on the text/app/call it's not on the road.&lt;/p&gt;

&lt;p&gt;That kind of inattentional blindness is exactly what can happen when you follow a scripted test. You focus on the steps you have to follow and the results you have to check for and you fail to notice anything else. The software can behave in unexpected ways, but you might miss it if you're only paying attention to what the test case says should happen. Even if you're looking right at the problem. Missing the unexpected becomes more likely the longer you're doing the task and the more anxious you are about completing it quickly.&lt;/p&gt;

&lt;p&gt;I'm certainly not the first to draw the link between inattentional blindness and scripted tests. Michael Bolton, for one, has &lt;a href="http://www.developsense.com/blog/index.php?s=inattentional+blindness"&gt;mentioned it a few times on his blog&lt;/a&gt;. But it's particularly relevant to me now as I try to improve our testing practices. Whatever changes we make, we need to be aware of the potential for inattentional blindness.&lt;/p&gt;

&lt;p&gt;For now it's clear to me that the scripted, highly-detailed test cases we're used to at my workplace are getting in the way of us improving the quality of our software. Part of the solution is a more exploratory approach to testing. &lt;a href="http://www.developsense.com/blog/2018/04/very-short-blog-posts-34-checking-inside-exploration/#comment-294393"&gt;One suggestion&lt;/a&gt; is that instead of following a test case step-by-step, you could:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;glance over the test case; try to discern the task that’s being modeled or the information that’s being sought; then try to fulfill the task without referring to the test case. That puts the tester in command to try the task and stumble over problems in getting to the goal. Since the end user is not going to be following the test case either, those problems are likely to be bugs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For those interested, there is more information about &lt;a href="http://www.scholarpedia.org/article/Inattentional_blindness"&gt;inattentional blindness on Scholarpedia&lt;/a&gt;, including references to the original research publications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you avoid inattentional blindness while you're testing, especially if you have to follow scripted tests?&lt;/strong&gt;&lt;/p&gt;


&lt;p id="mlfn1"&gt;1. Technically, we're never aware of it. That's the &lt;em&gt;inattentional&lt;/em&gt; part.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://marklapierre.net/inattentional-blindness-scripted-tests/"&gt;&lt;em&gt;Originally published on my blog&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>qa</category>
      <category>psychology</category>
    </item>
  </channel>
</rss>
