Clean Unit Testing
Clean Unit Testing
In this article, I outline several tips that aim to improve the readability, maintainability and the quality of your unit tests.
Join the DZone community and get the full member experience.Join For Free
It's easy to write "unit tests" that use JUnit and some mocking library. They may produce code coverage that keep some stakeholders happy, even though the tests may not even be unit tests and the test may provide questionable value. It can also be very easy to write unit tests that are (in theory) units test but are far more complex than the underlying code and hence just add to the total software entropy.
This particular type of software entropy has the unpleasant characteristic of making it even harder for the underlying software to be restructured or to surface new requirements. It is like the test has a negative value.
Doing unit testing properly is a lot harder than people think. In this article, I outline several tips that aim to improve the readability, maintainability and the quality of your unit tests.
Note: for the code snippets, Spock is used. For those who don't know Spock, consider it a very powerful DSL around JUnit which adds some nice features and cuts down on verbosity.
Reason for Failure
The Unit Test should only fail if there is a problem with the code under test. A unit test for the class DBService should only fail if there is a bug with DBService not if there is a bug with any other class it depends on. Thus, in the unit test for DBService, the only instantiated object should be DBService. Every other object that DBService depends on should be stubbed or mocked.
Otherwise, you are testing code beyond DBService. While you might (incorrectly) think this is more bang for the buck, locating the root cause of problems will just take longer. If the test fails, it could be because there is a problem with any one of multiple classes but you don't know which one. Whereas, if it can only fail because the code under tested is wrong, then you know you will be closer to location of problem much more quickly.
Furthermore, thinking this way will improve the Object Orientated nature of your code. When a test only fails because the code under test is broken, it means the test is only testing the responsibilities of the class and thus makes them clear. If the class's responsibilities aren't clear, or it can't do anything without another class, or the class is so trivial the test is pointless, it prompts the question that there something wrong with the class in terms of the generality of its responsibilities.
The only exception to not mocking or stubbing a dependent class is if you are using a well-known class from the Java library e.g. String. There is not much point stubbing or mocking that. Or, the dependency class is just so simple, for example an immutable POJO, there is also not much value to stubbing or mocking it.
Stubbing and Mocking
The terms mocking and stubbing can often be used interchangeably as if there were the same thing. They are not the same thing. So what's the difference? In summary, if your code under test has a dependency to an object for which it never invokes a method on that object that has side effects, that object should be stubbed.
Whereas, if it has a dependency to an object for which it does invoke methods that have side effects then that should be mocked. Why is this important? Because your test should be checking for different things depending on the types of relationships it has with its dependencies.
Let's say your object under test is BusinessDelegate. BusinessDelegate receives requests to edit BusinessEntities. It performs some simple business logic and then invokes methods on a DBFacade (a facade class in front of a Database). So, the code under test looks like this:
Regarding the BusinessDelegate class, we can see two relationships.
- A read-only relationship with BusinessEntity. The BusinessDelegate calls a few getters() on it and never changes its state or invokes any methods that have side effects.
- A relationship with DBFacade where it asks DBFacade to do something that we assume will have side effects. It is not the responsibility of BusinessDelegate to ensure the update happens, that is DBFacade's job. The responsibility of BusinessDelegate is to ensure the update method is invoked with the correct parameters — only.
So the unit test for BusinessDelegate, BusinessEntity should be stubbed and DBFacade should be mocked. If we were using the Spock testing framework we could see this very clearly
Having a good understanding of stub / mock differentiation improves OO quality dramatically. Instead of just thinking about what the object does, the relationships and dependencies between them get much more focus. It is now possible for unit tests to help enforce design principles that would otherwise just get lost.
Stub and Mock in the Right Place
The curious among you, might be wondering why in the above code sample dbFacade was declared at class level, while businessEntity was declared at method level? Well, the answer is, unit test code is much more readable when it mirrors the code under test. In the actual BusinessDelegate class, the dependency on dbFacade is at the class level and the dependency on BusinessEntity is at the method level.
In the real world when a BusinessDelegate is instantiated a DBFacade dependency will exist, anytime BusinessDelegate is instantiated for a unit test it is therefore ok to have the DBFacade dependency also existing.
Sound reasonable? Hope so. There are two further advantages of doing this:
- A reduction in code verbosity. Even using Spock, unit tests can become verbose. If you move class-level dependencies out of the unit test, you will reduce test code verbosity. If your class has a dependency on four other classes at the class level that minimum four lines of code out of each test.
- Consistency. Developers tend to write unit tests different ways. Fine if they are the only people reading their code; but this is rarely the case. Therefore, the more consistency we have across the tests the easier they are to maintain. So if you read a test you have never read before and at least see variables being stubbed and mocked in specific places for specific reasons, you will find the unit test code easier to read.
Variable Declaration Order
This is a follow on from the last point. Declaring the variables in the right place is a great start, the next thing is to do in the same order they appear in the code. So, if we have something like below.
It is much easier to read the test code if they stubs and mocks are defined in the same order as they appear in the class that is being tested.
And if you thought the last point was pedantic, you'll be glad to know this one also is. The variable names used to represent the stubs and mocks should be the same names that are used in the actual code.
Even better, if you can name the variable the same as the type in the code under test and not lose any business meaning then do that. In the last code sample, the parameter variables are named requestInfo and key and the corresponding stubs have the same names. This is much easier to follow than doing something like this:
Avoid Over Stubbing
Too much stubbing (or mocking) usually means something has gone wrong. Let's consider the Law of Demeter. Imagine some telescopic method call...
It is not enough to stub queryService. Now whatever is returned by getParamResolver() has to be stubbed and that stub has to have getParamMapper() stubbed which then has to have getParamComoparator() stubbed and then compareParams() stubbed. Even with a nice framework like Spock which minimizes verbosity, you will have several lines of test code just stubbing for what is just one line of Java code that is under test!!!
Yuck! Look at what that one line of Java does to our unit test. It gets even worse if you are not using something like Spock. The solution is to avoid telescopic method calling and try to just use direct dependencies.
In this case, just inject the paramComparator directly into our class. Then the code becomes...
and the test code becomes
All of the sudden people should be thanking you for feeling less dizzy.
Bad unit tests have horrible things like asserts all over the place. It can very quickly get nauseating.
Schematic things are always easier to follow. That is the real advantage of the Gherkin syntax. The scenario is set up in the given: always. The when is the scenario and then is where we test what we expect.
Using something like Spock means you have a nice, neat DSL so that the given:, when:,, and then: can all be co-located in the one test method.
Narrow When Wide Then
If a unit test is testing four methods, it is a unit test? Consider the below test:
First up, if Jenkins tells you this failed, you are going to have to root around and figure out what part of the class is wrong. Because the test doesn't focus on a specific method you don't know immediately which method is failing. Second up, if the getName() that is failing, how do you know if getDateOfBirth() and getCountry() are working?
The test stops on the first failure. So when the test fails, you don't even know if you have one method not working or three methods not working. You can go around telling everyone you have 99% code coverage and one test failing but it's not clear what you have missed!
Furthermore, what's is usually easier to fix? A small test failing or a long test failing? No prizes for getting that correct!
A test should check a single interaction with the class you are testing. Now, this doesn't mean you can only have one assert, it means you should have a narrow when and a wide then.
So let's take the narrow when first. Ideally, one line of code only. The one line of code matches the method you are unit testing.
Now we could have had the exact same code coverage, if getName() fails but getCountry() and getDateOfBirth() pass. But, there is a problem with getName()and not getCountry()and getDateOfBirth(). Getting the granularity of a test right is important. It should be ideally one unit test minimum for every non-private method and more when you factor in negative tests etc.
Now, it is perfectly fine to have multiple asserts in a unit test. For example, suppose we had a method that delegated onto other classes.
Consider a method resynceCache() which in its implementation calls two other methods on a cacheService object, clear() and reload().
In this scenario, it would not make sense to have two separate tests. The "when" is the same and if either fails, you know immediately which method you have to look at. Having two separate tests just means twice the effort with little benefit. The subtle thing to get right here is to ensure your asserts are in the right order. They should be in the same order as code execution.
So, clear() is invoked before reload(). If the test fails at clear(), there is not much point going on to check to reload() anyway as the method is broken. If you don't follow the assertion order tip, and assert on reload() first and that is reported as failing, you won't know if clear() which is supposed to happen first even happened. Thinking this way will make help you become a Test Ninja!
The ordering tip for mocking and stubbing, the same applies to assert. Assert in chronological order. It's pedantic but it will make test code much more maintainable.
The parameterization is a very powerful capability that can greatly reduce test code verbosity and rapidly increase branch coverage in code paths. The Unit Test Ninja should be always able to spot when to use it!
An obvious indication that a number of tests could be grouped into one test and parameterized is that they have the same when blocks, except for different input parameters.
For example, consider the below.
As we can see here the when is the same except the input parameters. This is a no-brainer for parameterization.
Immediately we get a 50% reduction in code. We also have made it much easier to add further permutations by just adding another row to the where table.
So, while it may seem very obvious that these two tests should have been the one parameterized test, it is only obvious if the maxim of having a narrow when is adhered to. The narrow "when" coding style makes the exact scenario being tested much easier to see. If a wide when is used with lots of things happening it is not and therefore spotting tests to parameterize is harder.
Usually, the only time not to parameterize a test that has the same syntactic where: code block is when the expectations are completely different structures. For example, expecting an exception in one scenario and an int is another means to two different structures are expected. In such scenarios, it is better not to parameterize. An example of this anti-pattern is mixing a positive and negative test into the one test, you should never do this.
Suppose our addNumbers() method will throw an exception if it receives a float, that's a negative test and should be kept separate. When developers parameterise when they should not the code smell of an if statement will appear in the test.
A when: code block or a then: code block should never contain an if statement. It is a sign that the test permutations have too many logical differences and should be separate tests.
Clean unit testing is an essential part of achieving a maintainable code base. It is paramount if you want to be able to release regularly and quickly. It also helps you enjoy your Software Engineering more!
Opinions expressed by DZone contributors are their own.