Fix those flaky tests. Now!


In one of the more mature repositories I work with, we have a test suite that's been around for a while. It boasts over 2000 tests and has lived through numerous changes. Over time, a tiny fraction of these tests have become flaky—failing randomly, sometimes passing and sometimes not. The trouble with flaky tests is that they're often unrelated to the code you're working on and far too easy to ignore. With looming deadlines and a focus on shipping code quickly, a simple re-run of the Continuous Integration (CI) pipeline usually does the trick. So how can you justify investing time and energy into fixing something seemingly so disconnected from your immediate goals?

It's fair to consider flaky tests a form of technical debt. And as with all tech debt, it's crucial to crunch the numbers and understand the costs involved. This gives us a clear sense of urgency to fix it. So, let's break it down:

Imagine you’ve got a big test suite with hundreds or even thousands of tests. If just 10 of them are moderately flaky, failing once every 20 runs, the effect on your CI pipeline is devastating: each flaky test has a 95% chance of passing (19⁄20), but with 10 of them, the odds that they all pass at once drop to just 60% (0.95^10 ≈ 0.60). That means roughly 40% of your CI runs are going to fail for no good reason.
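A quick back-of-the-envelope check of those numbers (a minimal sketch in plain JavaScript, using the figures from above):

// Probability that a CI run is green when 10 tests each pass 95% of the time
const passRate = 0.95;
const flakyTests = 10;
const allPass = passRate ** flakyTests; // ≈ 0.599
console.log(`Green runs: ${(allPass * 100).toFixed(0)}%`); // ~60%
console.log(`Red runs:   ${((1 - allPass) * 100).toFixed(0)}%`); // ~40%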

So, what do you do when a CI run fails because of a flaky test? You rerun it, of course. But here’s the catch: you don’t know up front whether it was a flaky test or an actual bug, so you have to check the results, figure out if it’s safe to ignore, and rerun everything, hoping it works this time (good luck 🤞).

This isn’t just annoying—it’s a productivity killer. Every time you push code and the CI fails, you have to stop what you’re doing, check the results, and rerun the tests. Let’s be moderate and say that takes 10 minutes including the costly context switch. If you push code five times a day and two runs fail, that’s 20 minutes gone. If one rerun fails again, add another 5 minutes. That’s 25 minutes wasted every day, per person.

Multiply that by everyone on your team, and suddenly a flaky test suite is costing you hours of work every day.

But it's not just about time—it's also about trust. When the test suite constantly fails, people become numb to those failures, and your tests lose credibility. This "alert fatigue" leads to more problems: with a large test suite, you typically don't run all tests locally every time. Instead, you run the few dozen tests related to your changes and push to remote often, triggering the CI pipeline. This practice helps you catch failures at the global scope early, which is crucial for maintaining a tight feedback loop between what you changed and its effects. The tighter this feedback loop, the quicker you can connect a failure to its culprit and fix the problem. Ignoring CI alerts because you assume they're just flaky tests robs you of that quick feedback, and it leads to longer debugging sessions when a real issue does show up in your code.

Clearly, there are enough economic and psychological reasons to take action. The good news is that fixing a flaky test is usually straightforward: in my experience, most can be resolved within minutes. So instead of wasting 25 minutes of each developer's time every day and crushing their souls with annoying interruptions, it's far more efficient to fix these tests as soon as possible.

A Simple Way to Fix It

The best way to handle flaky tests? Fix them right now. Don’t put it off or create a ticket for later—just get it done. You’ve got two options:

  1. Create a quick PR to fix the flaky test. Most flaky issues take only a few lines of code to fix, so it’s not a big lift to get this reviewed and merged.
  2. Fix it directly in your current branch if your current work is almost done. This means less overhead since you’re not making an extra PR, and you get the benefits of a stable suite right away. The downside is that others won’t benefit from the fix until you merge, unless they cherry-pick the changes.

Which option is best? It depends on your team. For smaller teams with fewer changes, fixing it directly usually works fine. For bigger teams, a separate PR might be a more suitable workflow.

How to Fix Flaky Tests

1. Catch Them in the Act

Not sure why, but I had somehow convinced myself that fixing flaky tests was harder than it actually is. They’re those unpredictably behaving weirdos, and we all know the pain of dealing with seemingly random problems in computing, right? To fix a flaky test, you need to catch it in the act of failing. If a test only fails sometimes, it can be elusive. But remember, it’s not truly random—it’s stochastic. Your flaky test fails with a certain, predictable probability. Simply run the test enough times to catch it failing. No need to mash your keyboard like a maniac—automate the reruns so it keeps going until the failure shows up.

// Register the test 100 times so a rare failure eventually shows up
Array(100)
  .fill(null)
  .forEach((_, i) => {
    test(`flaky test (run ${i + 1})`, () => {
      // your test code here
    });
  });

Make sure you log everything you need so you can diagnose the issue when it finally fails.

2. Look for Common Causes

In my experience, flaky tests usually fall into one of these buckets: timing issues, random inputs, leaky test isolation, and external dependencies.

Timing Issues

These occur when tests fail to properly handle asynchronous operations, leading to race conditions. Use appropriate waits or mocks to control timing in your tests. Ensure there are no un-awaited side effects in your code that need to be asserted in the test.
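For example, Jest-style fake timers let you make time deterministic instead of sleeping and hoping. A minimal sketch, assuming a Jest (or compatible) runner; sendReminder is a hypothetical function that schedules work with setTimeout:

// Make time deterministic instead of waiting for real delays
jest.useFakeTimers();

test("sends a reminder after 5 seconds", () => {
  const callback = jest.fn();
  sendReminder(callback); // hypothetical: schedules callback via setTimeout(..., 5000)

  jest.advanceTimersByTime(5000); // fast-forward the clock, no real waiting
  expect(callback).toHaveBeenCalledTimes(1);
});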

HTTP endpoints can sometimes have side effects that aren't awaited before the response is sent (e.g., email delivery). Testing these side effects can become flaky due to timing issues. One solution is to use a queue to manage side effects. Adding a job to a queue is quick, can be awaited before returning the response, and can be asserted without race conditions afterward.
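Here's a sketch of that idea, assuming a supertest-style request helper and a queue object exposed to the tests; app, request, and emailQueue are placeholders for whatever your setup provides:

// Assert that a job was enqueued instead of racing against email delivery
test("signup enqueues a welcome email", async () => {
  await request(app).post("/signup").send({ email: "jane@example.com" });

  // Enqueueing is awaited before the response is sent, so there's no race here
  const jobs = await emailQueue.getJobs(); // placeholder for your queue's inspection API
  expect(jobs).toHaveLength(1);
  expect(jobs[0].data.to).toBe("jane@example.com");
});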

Random Inputs

Tests using random data can lead to inconsistent results. While randomized data for test fixtures can be valuable, and there are fabulous libraries for this purpose, it's crucial to ensure reliability. Use fixed seeds for reproducible results, or opt for fixed values to make your tests more stable and predictable.
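With @faker-js/faker, for example, a fixed seed makes the "random" data reproducible (a small sketch; formatDisplayName is a hypothetical function under test, and the exact faker API varies a bit between versions):

import { faker } from "@faker-js/faker";

// A fixed seed makes every run produce the same "random" fixtures
faker.seed(42);

test("formats the user's display name", () => {
  const user = {
    firstName: faker.person.firstName(),
    lastName: faker.person.lastName(),
  };
  // The same input shows up on every run, so the assertion is stable
  expect(formatDisplayName(user)).toBe(`${user.firstName} ${user.lastName}`); // hypothetical function
});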

Leaking Test Isolation

If the flaky test runs fine in isolation but fails on CI, you're likely dealing with a test isolation issue. Run it alongside surrounding tests and carefully verify that mocks, databases, and other test fixtures are properly reset before each test.
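A typical pattern is to reset shared state before every test. A sketch, assuming Jest mocks and a hypothetical resetTestDatabase helper for whatever database setup you use:

// Reset shared state so no test depends on whatever ran before it
beforeEach(async () => {
  jest.resetAllMocks();      // clear mock implementations and recorded calls
  await resetTestDatabase(); // hypothetical: truncate tables / reload fixtures
});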

External Dependencies

Network requests, APIs, or other external services can cause test variability. Use mocks or stubs to isolate your tests from these dependencies. Mock Service Worker and Nock are great libraries for that purpose.
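With Nock, for instance, the test intercepts the outgoing request and replies with canned data (a minimal sketch; the endpoint and fetchUserProfile are made up for illustration):

import nock from "nock";

test("loads the user profile", async () => {
  // Intercept the HTTP call and return a deterministic response
  nock("https://api.example.com")
    .get("/users/1")
    .reply(200, { id: 1, name: "Jane" });

  const profile = await fetchUserProfile(1); // hypothetical function that calls the API
  expect(profile.name).toBe("Jane");
});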

Wrap-Up

Flaky tests aren’t just an occasional annoyance—they’re a drain on your team’s productivity and trust in your CI process. Fix them right away, and you’ll save time, reduce noise, and keep your test suite in good shape. The effort to fix them is small compared to the ongoing cost of doing nothing. So, don’t wait—fix those flaky tests now!