Effective Code Coverage (and Metrics in General)
This is the best definition of code metrics I have found so far¹:
Code metrics is a set of software measures that provide developers better insight into the code they are developing. By taking advantage of code metrics, developers can understand which types and/or methods should be reworked or more thoroughly tested. Development teams can identify potential risks, understand the current state of a project, and track progress during software development.
The key here is, in my opinion, that the purpose of code metrics is to help developers understand where their code needs to be improved. Unfortunately, code metrics, like any tool, can be easily misused, doing more harm than good. Let’s take my favorite code metric, code coverage, as an example.
As I mentioned many times in this blog, I use code coverage tools to help me discover which parts of my code need more testing. I don’t care if my code coverage is 50, 80 or 90%. What I care about is that the most complex areas of my code are properly tested. As useful as code coverage is, it can also cause problems if misused or misunderstood. Here are some examples:
Management mandates X% code coverage.
This is actually not uncommon, based on stories from friends and my own experience years ago in consulting. Upper management, without a clue about the health of the codebase, mandates that developers reach X% code coverage. Period. (It gets even better when there is a deadline for reaching that number!)
In many cases, when the codebase was not designed with testability in mind, I've seen developers write a lot of test cases without any assertions! All the tests pass, and the code coverage goal is met. The result?
- developers wasted time and effort without adding any improvement to the codebase
- management got a false impression that the codebase is in a somewhat healthy state
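To make this failure mode concrete, here is a minimal sketch (the function and test are hypothetical, not taken from any real codebase): the test exercises every line of the code, so a coverage tool reports it as fully covered, yet it asserts nothing, so an obvious bug sails through.

```python
import unittest

def apply_discount(price, rate):
    """Intended to apply a fractional discount (10% off 100.0 -> 90.0),
    but buggy: it subtracts the raw rate instead of price * rate."""
    return price - rate

class DiscountTest(unittest.TestCase):
    def test_apply_discount(self):
        # Every line of apply_discount executes, so a coverage tool
        # reports it as 100% covered -- but with no assertion, the
        # wrong result (99.9 instead of 90.0) goes unnoticed.
        apply_discount(100.0, 0.1)
```

A single `self.assertEqual(90.0, apply_discount(100.0, 0.1))` would have caught the bug immediately, without changing the coverage number at all.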
Developers follow numbers blindly, without thinking about their meaning.
It is too easy (and, in many cases, comfortable) for developers (including QA) to fall into this trap (I'm guilty of this too!). Common examples include:
- Test suites containing only unit tests. We developers tend to avoid functional and integration tests because they usually run slower and take more work to write and set up. Unfortunately, unit tests only verify the developer's assumptions; they do not test that the actual application, as a whole, works as expected for the user.
- The wrong perception that “more is better.” This is probably normal human behavior: 50% code coverage must be better than 10%, or even 0%. Actually, it all depends on the quality and types of tests. For example, 50% code coverage from tests without any assertions is worse than 0%, because it creates the illusion of a healthier codebase. Another good example is my recent experience of having 100% code coverage…and faulty software!
- A single metric to rule them all. Having a high code coverage number does not imply that our test suite exercises all the possible testing scenarios: potential race conditions and all UI-interaction scenarios are good examples. Working towards achieving 100% code coverage can prevent us from seeing important areas that need to be tested. In addition, we should not focus on a single tool to determine the state of our code. Code coverage is just one of the many code metrics that we can (or must?) use to determine the health of our codebases.
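The race-condition blind spot above can be sketched in a few lines (the class and test here are hypothetical): a single-threaded test executes every line of this counter, yielding 100% line coverage, while never exercising the thread interleavings where its non-atomic read-modify-write loses updates.

```python
class Counter:
    """A counter whose increment is a non-atomic read-modify-write."""
    def __init__(self):
        self.value = 0

    def increment(self):
        # Two steps: a concurrent thread can increment between the
        # read and the write, and that update is then silently lost.
        current = self.value
        self.value = current + 1

def single_threaded_test():
    # This executes every line of Counter -- 100% line coverage --
    # yet no thread interleaving is ever exercised, so the race
    # remains completely untested.
    c = Counter()
    for _ in range(3):
        c.increment()
    assert c.value == 3
```

Only a dedicated concurrency test or a data-race detector would expose the problem; the coverage number alone cannot.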
Useless metrics reports.
Even though we have excellent code coverage tools like Cobertura, EMMA (both open source) and Clover (commercial), I have seen custom code coverage tools built in-house. There are many reasons for this, like NIH syndrome or lack of support for a specific language (e.g. Scala) in existing tools. What surprises me the most is the amount of time and effort put into building a custom code coverage tool that, in the end, does not allow developers to drill down to the most basic unit: a single line of code.
A code coverage tool that reports that method “X” in class “Y” has 10% code coverage, without providing a way to go one or more levels deeper, makes it impossible to figure out where we should add or improve our tests. In my opinion, this is not only useless, but harmful as well: pointing out a problem without offering a way to find a solution is not constructive, and it can start a blame game within an organization.
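To illustrate what that “most basic unit” looks like, here is a rough sketch of per-line coverage (not how Cobertura or EMMA actually work internally) using Python's standard `sys.settrace` hook to record exactly which lines of a function ran; the `classify` function is a made-up example:

```python
import sys

def line_coverage(func, *args, **kwargs):
    """Run func and return the set of its executed line numbers,
    relative to the 'def' line (offset 0). A real tool maps these
    back to source so developers can drill down line by line."""
    code = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is code:
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(None)
    return executed

def classify(n):               # relative line 0
    if n > 0:                  # relative line 1
        return "positive"      # relative line 2
    return "non-positive"      # relative line 3
```

Calling `line_coverage(classify, 5)` reports that relative lines 1 and 2 ran while line 3 never did, pointing directly at the branch that still needs a test. That is the drill-down a useful report must offer.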
In conclusion, code coverage (and metrics in general) must not only provide accurate measurements to be useful; it must also empower developers to analyze those results down to the finest detail, helping them decide where and how a codebase can be improved. At the same time, developers need to understand that blindly targeting a fixed number is not only a waste of time and effort; it can also create unrealistic perceptions of how healthy a codebase is.
What do you think? Feedback is always welcome :)
¹ MSDN. Code Metrics Values.