The Lurker

What to do when unit tests hurt

posted by ajf on 2019-02-09 at 10:21 pm

In this post about TDD Rhyno van der Sluijs describes a system of two major components, and contrasts the experience of their parallel development:

I split this system into two major components: the rules engine, which performed the calculations, and the export engine, which exported the results to the payroll team. The rules engine was a hit. It was so fluid and fully unit tested. However, the export system barely held together despite also being fully unit tested.

This sounds like a kind of natural experiment that we rarely get the opportunity to think about. But the two components lent themselves to very different testing approaches:

[T]he successful and fluid rules engine had unit tests only testing the main public method. All the hundreds of tests in this area would interact with this single entry method despite their being many other helper methods in the process. However, the export engine was strictly run TDD on every single public method including smaller helper methods in different classes.

A while back, I worked on a system that had a problem a little like this: a query returns two sets of results (exact matches and approximate matches). The core of the task was to take those two lists and concatenate them to produce the final result list.

interface Flattener {
  List<Result> flatten(List<Result> exactMatches, List<Result> approximateMatches);
}

The first unit test for this would be very simple. We can create two input lists, and check that the output contains all the items from the first list in order, followed by all of the items in the second list. If we're feeling particularly conscientious, we can add a few more tests:

If the two input lists are empty, the output is empty.
If the first input list is empty, the output is equal to the second input list.
If the second input list is empty, the output is equal to the first input list.

So far, our tests look a little like the rules engine tests from the blog post: they all just call one method on the Flattener and check for the expected output.

But we had other content that needed to be inserted between the results. Advertisement cells would be placed at regular intervals. We introduced the idea of an "item" here. So far, we had Result Items and Ad Items.

We could have very easily added code to the Flattener implementation, and updated the interface to work with Items. But another developer, recognising that there were several more rules like this still to come, had a better idea. We introduced the idea of an "Injection", which is an object that knows what positions in the result list it would like to place items. As the Flattener inserts each Result Item into the output, it invokes each Injection, providing the number of Result Items inserted so far.

interface Injection {
  List<Item> injectAfterIndex(int index);
}

interface Flattener {
  List<Item> flatten(List<Item> exactMatches, List<Item> approximateMatches);
}

class InjectingFlattener {
  private List<Injection> injections;

  InjectingFlattener(List<Injection> injections) {
    this.injections = injections;
  }

  public List<Item> flatten(List<Item> exactMatches, List<Item> approximateMatches) {
    int count = 0;

    List<Item> output = new ArrayList<>();

    for (Item item: exactMatches) {
      output.add(item);
      count++;
      injectAfter(output, count);
    }

    for (Item item: approximateMatches) {
      output.add(item);
      count++;
      injectAfter(output, count);
    }

    return output;
  }

  private void injectAfter(List<Item> output, int index) {
    for (Injection injection: injections) {
      output.addAll(injection.injectAfterIndex(index));
    }
  }
}

Now at this point, we can insert Ad Items at regular intervals between Result Items, by implementing the Injection interface. But how do we test it?

We don't want to end up in the same place as this project:

The payroll export unit tests on the other hand, were written very strict TDD on ALL public helper methods that were only used by higher public methods. As requirements changed or became clearer, it was cumbersome to change the internal implementation as we would break hundreds of units tests, which meant time and effort to fix. We were reluctant to refactor and clean up our code, leading to an accumulation of technical debt that slowed down development in this area.

The blog post goes on to explain that the Time Calculation process relied on Holiday Calculation, Weekend Calculation, and Overtime Calculation components. A change to the Holiday Calculation logic would break its own unit test, which is good—but it would break Time Calculation tests too.

How did we avoid falling into this trap?

Our InjectingFlattener does not directly depend on the AdInjection class, nor will it need to change when we add new Injections. So our unit test for InjectingFlattener won't break if AdInjection's behaviour changes.

In each InjectingFlattener test, we can supply Injections with whatever behaviour we need to create the scenario we want to check. It's a little fiddly, though, because we have to create a list of injections in each test.

We can see that InjectingFlattener's injectAfter private method has the same signature as Injection's method. We can extract it (and the List<Injection> it uses) into a new Injection implementation, CompositeInjection (named for the Gang of Four design pattern). Now our InjectingFlattener's constructor can receive a single Injection. That makes it that little bit easier to construct an InjectingFlattener instance in each test method, and what were complications in that unit test are untangled to become straightforward test cases in CompositeInjection's tests.

Trying to write a test has driven us towards a better design. (Sorry.)

As we tackled successive requirements, we sometimes added a new implementation of Injection, but other times we needed to make breaking changes to the interfaces.

We needed to inject an Item between the exact matches and the approximate matches—but not if the latter list was empty. We needed to inject a "no results" Item if both lists were empty. Here I proposed that we remove awareness of "exact" and "approximate" from the Flattener interface, and take a List of "Result Tiers" instead. (A Result Tier is itself just a list of Result Items.) Then we created a TierInjection interface:

interface TierInjection {
  List<Item> inject(List<Item> tier, int precedingResultCount);
}

In the end, we ended up with three layers of these injections:

Item injections (described earlier);
Tier injections (one implementation ended up taking over the code from the original flatten method above, while another inserted an item at the start of a tier, since the Item injections always inject after at least one result); and
Query injections (one handling no results, another applying each Tier injection to the corresponding tier of results).

Breaking it down into these pieces meant we had a lot of very small components, and it was very easy to write unit tests for them. But there's no free lunch. At each layer above the Item injections, we ensured there was no concrete knowledge of the layer below. But the concrete stuff has to happen somewhere.

The good news was that we had reduced this to "just wiring": it was a simple matter of constructing Item injections, aggregating them into Tier injections, and building those up through Query injections. But none of our unit tests covered any of the business rules. Those business rules have tricky corner cases, and it would be crazy not to have tests in place to catch regressions.

Enter the InjectionsBusinessRulesTest. (The overall design was a collaborative effort between maybe half a dozen developers, but that appalling test name is entirely my fault, and I accept full responsibility.) It works just like the "rules engine" tests: all of the tests invoked the top-level injection method in the same way, and this test didn't need to change when we changed details of how one injection layer interacted with the next.

Having this test meant that some changes would break two tests: if you change the behaviour of one of the low-level injections, you need to change its test, and you're changing the expected results in InjectionsBusinessRulesTest too. We did find that a little disappointing, but I think the trade off was worth it: most mistakes were caught by (and easily identified due to) the very simple unit tests, and we always found out when a change had unforeseen consequences.

We still had cases where test failures could "overlap", but it could only occur between InjectionsBusinessRulesTest and a unit test.

The conclusion that follows goes against the current norm: unit tests written on helper functions could be seen as a code smell. Do not write unit tests on helper methods as this ties you to an implementation.

I feel like this is the wrong conclusion to draw: it's not the unit test that "ties you to an implementation"—the pain happens because the class itself is tied to its helper methods' implementation.

I don't want to draw an overly-broad conclusion like "helper methods are an anti-pattern", but if a colleague came to me with the Time Calculation problem, I would suggest trying one or both of these ideas (and both of them would eliminate the idea of "helper methods" from this code):

Dependency inversion: instead of Time Calculation knowing how to construct (or find) a Holiday Calculation and a Weekend Calcuation and an Overtime Calculation, provide them to its constructor.
Common interface: like Injection above, find a way to make Time Calculation work with three homogenous Calculations. Perhaps Holiday and Weekend and Overtime Calculations stay as they are, and each is paired with an adapter that pulls out just the data it needs.

When changing your tests hurts, you don't just need to change how you write tests. You need to change the code you're testing.

Related topics: Rants Mindless Link Propagation

All timestamps are Melbourne time.