Fewer Mocks Better Tests

13 May 2022

[testing]

Fewer mocks usually indicate a higher-value test.

What is a mock?

In testing, a mock is a stand-in for real code, faked out so that the test can control it precisely. The tester writes into the test what a certain piece of code should do. Its behavior is predetermined, a placeholder. When the test runs and invokes the source code, whatever is mocked is not real. It is faked out.
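To make that concrete, here is a minimal sketch using Python's standard `unittest.mock`. The `order_total` function and the price service it calls are hypothetical, invented only for illustration. The test decides ahead of time what `get_price` returns, so the real service is never contacted.

```python
from unittest.mock import Mock


def order_total(price_service, sku, quantity):
    """Hypothetical source code: look up a price and multiply by quantity."""
    return price_service.get_price(sku) * quantity


# The mock replaces the (hypothetical) real price service.
# Its behavior is predetermined by the test: get_price always returns 100.
fake_service = Mock()
fake_service.get_price.return_value = 100

result = order_total(fake_service, "sku-123", 3)
print(result)  # 300

# The mock also records how it was called, so a test can inspect that too.
fake_service.get_price.assert_called_once_with("sku-123")
```

Nothing about the real price service (its network calls, its errors, its data) takes part in this test.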

What is real?

Mocks are very powerful. They allow testing things that would otherwise be difficult to test. But if we attempt a more real-world test, we end up with a more valuable test. It is more valuable because it tests more things in a real-world way. That way, when we deploy code into the real world, we have a better chance of the real-world outcomes matching the outcomes in the test environment.

Tests without mocks are tests that really test.

Can we still get value out of mocks? Sure. And we try...

Attempts to make a unit independent

A unit test identifies a unit of abstraction and tests up to that unit's boundary and not beyond. A function or class is a good example of such a boundary.

But what if that function calls out to another module? What if that class has another module as a member? The unit has a dependency. But the unit test doesn't seek to test the other module. I just want to test my unit. That other unit is tested elsewhere.

I see (and have made) the argument.

But maybe there are other ways to look at it. If our unit depends on another unit, perhaps that other unit isn't separate, but inside our unit. Mocking out the dependency shows that our unit isn't independent. It relies on another unit. And that relationship is concrete. There is code that ties the two together. That sub-unit does valuable work that is required for this unit to work correctly. The context for the usage of the sub-unit matters. What is its response? What exceptions can it raise? What configuration is required? What side effects are made?

The units work together. They are integrated. Instead of mocking, create an integration test.
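As a sketch of the difference, assume a hypothetical `register_user` unit that depends on a hypothetical `UserRepository` sub-unit backed by SQLite. The mocked test only checks that our unit called its dependency. The integration test uses the real repository with an in-memory database (Python's standard `sqlite3`) and checks the outcome we actually care about.

```python
import sqlite3
from unittest.mock import Mock


class UserRepository:
    """Hypothetical sub-unit: persists users in a SQLite database."""

    def __init__(self, conn):
        self.conn = conn
        # Configuration the sub-unit requires: the table must exist.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (email TEXT PRIMARY KEY)"
        )

    def add(self, email):
        self.conn.execute("INSERT INTO users (email) VALUES (?)", (email,))

    def exists(self, email):
        row = self.conn.execute(
            "SELECT 1 FROM users WHERE email = ?", (email,)
        ).fetchone()
        return row is not None


def register_user(repo, email):
    """Hypothetical unit under test: normalize an email and store it."""
    normalized = email.strip().lower()
    repo.add(normalized)
    return normalized


def test_register_user_mocked():
    # Unit test with a mock: only verifies that repo.add was called.
    repo = Mock()
    register_user(repo, "  Ada@Example.com ")
    repo.add.assert_called_once_with("ada@example.com")


def test_register_user_integrated():
    # Integration test: real repository, real (in-memory) database.
    repo = UserRepository(sqlite3.connect(":memory:"))
    register_user(repo, "  Ada@Example.com ")
    assert repo.exists("ada@example.com")


test_register_user_mocked()
test_register_user_integrated()
```

The integrated version also exposes the sub-unit's context that the mock hides: registering the same email twice raises `sqlite3.IntegrityError` here, because of the primary key, while the mocked test would happily pass.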

In search of 100% code coverage

Sometimes we get sidetracked. Instead of a valuable test mentality of "Am I verifying the outcomes that are important?", we can end up asking "Is this code covered?" We chase the wrong thing. Covered code can indicate good things, but it can also mean little. A strong 100%-coverage culture can potentially encourage good habits. But that same strength can easily lead us to hold coverage numbers above other, better things.

If we have 100% code coverage and most of our test suite is made of mocks, are we 100% safe? No. But then, of course, we're never 100% safe: we don't know what we don't know about the bugs in our code and the entropy in our runtime environment. Nevertheless, our code safety has an inverse relationship to the number of mocks. As mocks go up, our ability to confidently ship verified code goes down. This is because our test outcomes are based on artificial laboratory conditions that can easily differ from the real world.

But we want coverage. And some tests are hard. In a real-world test, it's difficult to exercise every branch of the code. For instance, how do we create error states? This is where a mock can be really convenient. Mock out the database. Tell that write command to raise an exception. It's easy; it's an available tool; the technology allows it, so let's mock. Exception handling is about the best case for mocking that I can think of.
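Here is roughly what that looks like with Python's `unittest.mock`: setting `side_effect` to an exception makes the mocked call raise it. The `save_order` function and the `DatabaseError` type are hypothetical stand-ins for a real database client and its error.

```python
from unittest.mock import Mock


class DatabaseError(Exception):
    """Hypothetical error type raised by a database client."""


def save_order(db, order):
    """Hypothetical source code: report a failed write instead of crashing."""
    try:
        db.write(order)
        return "saved"
    except DatabaseError:
        return "retry later"


def test_save_order_handles_write_failure():
    db = Mock()
    # Tell the mocked write command to raise, forcing the error branch.
    db.write.side_effect = DatabaseError("disk full")
    assert save_order(db, {"id": 1}) == "retry later"


test_save_order_handles_write_failure()
```

Producing a real "disk full" from a real database on demand would be much harder, which is why this is such a tempting use of the tool.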

But once the tool is there, it's easy to overuse. This is especially true if we're good at writing mocked tests. Using these mock libraries is a skill in itself. Once we have that skill, every testing problem looks like a nail for the mock hammer. Mock counts will increase unless culled. Test suites already high in mock counts will get higher, because new tests follow existing patterns.

We need to get our integration test suites in place and be equally good at writing that kind of test.

Code sclerosis

I've written several 100%-covered codebases, and for me the worst part of mocking always came when it was time to change code -- and that happens all the time! You write it once. You refactor it forever until it's deleted.

Mocked code is more resistant to change than unmocked code. This is because a mock exposes the implementation of the source code to the test code. When the source changes, the test must necessarily change. This is white-box testing. And there's always more test code, often with multiple test cases for each line under test. A single change to a function can require changing n test cases that invoke that function.
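A small sketch of that coupling, with every name hypothetical: version 1 looks users up one at a time, version 2 is a refactor that does one batched lookup with the same result. The mocked test encodes how v1 talks to its store, so it breaks on the refactor. The outcome test runs against a simple in-memory store (assumed here to be the real store, supporting both lookup styles) and keeps passing.

```python
from unittest.mock import Mock


def active_emails_v1(store, ids):
    """Version 1: fetch each user individually."""
    users = [store.get(i) for i in ids]
    return [u["email"] for u in users if u["active"]]


def active_emails_v2(store, ids):
    """Version 2, after a refactor: one batched fetch, same behavior."""
    users = store.get_many(ids)
    return [u["email"] for u in users if u["active"]]


class InMemoryStore:
    """Hypothetical store, used for real rather than mocked."""

    def __init__(self, users):
        self.users = users

    def get(self, user_id):
        return self.users[user_id]

    def get_many(self, ids):
        return [self.users[i] for i in ids]


def mocked_test(fn):
    # White-box: the mock scripts exactly the calls v1 makes.
    store = Mock()
    store.get.side_effect = lambda i: {"email": f"u{i}@x.io", "active": i != 2}
    assert fn(store, [1, 2]) == ["u1@x.io"]
    assert store.get.call_count == 2


def outcome_test(fn):
    # Black-box: only the result matters.
    store = InMemoryStore({
        1: {"email": "u1@x.io", "active": True},
        2: {"email": "u2@x.io", "active": False},
    })
    assert fn(store, [1, 2]) == ["u1@x.io"]


def passes(test, fn):
    """Run one test against one implementation and report the result."""
    try:
        test(fn)
        return True
    except (AssertionError, TypeError):
        return False


print(passes(mocked_test, active_emails_v1))   # True
print(passes(mocked_test, active_emails_v2))   # False: refactor broke it
print(passes(outcome_test, active_emails_v1))  # True
print(passes(outcome_test, active_emails_v2))  # True
```

In the mocked test for v2, `store.get_many` returns another `Mock`, which is not iterable, so the test fails with a `TypeError` even though the behavior is unchanged. Multiply that by every test case that scripts `store.get`.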

This takes away the incentive to refactor. The code becomes ossified.

Mocks also discourage test-driven development. The red-green-refactor cycle isn't going to work, because a refactor that changes how a unit calls its dependencies turns the mocked tests red again. Of course, if we test-drive, maybe we avoid entangled dependencies and can avoid more mocks. But, again, we need a good integration test suite ready to lead with.

Mock-heavy tests increase the long-term maintenance cost of the entire codebase. Every mocked line, and every line that depends on it, creates a multiplicity of required updates every time it's touched -- forever -- until those mocks are refactored away. Imagine, instead, refactoring the source and only re-running the tests.

When I write mock-heavy code, I feel like I'm writing it twice: once in source, and once again in test, using the special language of the mocking library. It feels silly. Look at them side by side, and they'll look very similar, except that the mock setup takes even more code per line and is repeated for every test case. It's a near-worthless test.

To mock?

Sure, keep mocks as a tool. Become more attuned to mocks as a probable code smell. Avoid them when we can -- not at all costs, but push for avoidance and gain skills and create test environments that enable this.

If we mock too much, we'll use up our energy getting that mock library to work, finally get to the end, and sigh. But we'll look at our coverage and see 100%, which is nice. We'll be out of time and out of energy. We'll ship it and hope that the 100% means something valuable.

If we mock much less, we'll have more confidence that our code works in the real world. We'll be able to focus on asserting important outcomes instead of focusing on coverage. We'll maintain a more pleasant and flexible codebase into the future.

What do you think the right places to use mocks are? How do you keep them from sprouting like weeds?