Code Coverage and the Pitfalls of Automated Unit Test Generation

Let's take some lessons from a group who has worked through some of the potential pitfalls arising during unit testing and see what we can learn from them.


I recently wrote about how easy it is to fall into the trap of chasing code coverage percentages, which led to a good amount of discussion, so I thought I would take a deeper dive into code coverage problems and solutions: specifically, the coverage number itself, the value of auto-generated unit tests, how to identify unit tests that have problems, and how to keep improving.

Counting Coverage

Let's start with the coverage metric itself. Code coverage numbers are often meaningless, or at best misleading. Even if you do happen to have 100% code coverage, what does that mean? How did you measure it?

There are lots of different ways to measure coverage.

One way to measure code coverage is from a requirements perspective. Do you have a test for each and every requirement? This is a reasonable start... but it doesn't mean that all of the code was tested.

Another way to measure code coverage (don't laugh, I actually hear this in the real world) is by the number of passing tests. Really, I mean it! This is a pretty awful metric and obviously meaningless. Is it worse or better than simply counting how many tests you have? I couldn't say.

Then we come to trying to determine what code was executed. Common coverage metrics include statement coverage, line coverage, branch coverage, decision coverage, multiple condition coverage, and the more comprehensive MC/DC (Modified Condition/Decision Coverage).

The simplest method, of course, is line coverage, but as you have probably seen, tools measure this differently, so the coverage will be different. And executing a line of code doesn't mean you've checked all the different things that can happen in that line of code. That's why safety-critical standards like ISO 26262 for automotive functional safety and DO-178B/C for airborne systems require MC/DC.

Here's a simple code example, assuming x, y, and z are booleans:

if ((x || y) && z) { doSomethingGood(); } else { doSomethingElse(); }

In this case, no matter what my values are, the line has been "covered." Admittedly, putting everything on one line like this is sloppy coding, but it makes the problem easy to see, and people actually do write code this way. Let's clean it up a bit:

if ((x || y) && z) {
    doSomethingGood();
} else {
    doSomethingElse(); /* because code should never doSomethingBad() */
}


A quick glance might lead me to the conclusion that I just need two tests: one that evaluates the entire expression to TRUE and executes doSomethingGood() (x=true, y=true, z=true), and another that evaluates to FALSE and executes doSomethingElse() (x=false, y=false, z=false). Line coverage says we're good to go: "Everything was tested."

But wait a minute, there are different ways the main expression can be tested:

Value of x | Value of y | Value of z | Value of decision
-----------|------------|------------|------------------
TRUE       | FALSE      | TRUE       | TRUE
FALSE      | TRUE       | TRUE       | TRUE
FALSE      | FALSE      | TRUE       | FALSE
TRUE       | FALSE      | FALSE      | FALSE
This is a simple example, but it illustrates the point. I need 4 tests here to really cover the code properly, at least if I care about MC/DC coverage. Line coverage would have said 100% when I was half done. I'll leave the longer explanation about the value of MC/DC for another time. The point here is that no matter what method you use to measure coverage, it's important that what you're validating through assertions is meaningful.
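The four cases can be checked mechanically. Here is a minimal sketch showing one valid MC/DC test set for this expression (other minimal sets exist; the class and method names are my own for illustration):

```java
public class McdcDemo {
    // The decision under test: (x || y) && z
    static boolean decision(boolean x, boolean y, boolean z) {
        return (x || y) && z;
    }

    public static void main(String[] args) {
        // One valid MC/DC set: each condition is shown to independently
        // flip the decision while the other conditions are held fixed.
        boolean[][] tests = {
            // x,    y,     z,     expected decision
            { true,  false, true,  true  },
            { false, true,  true,  true  },
            { false, false, true,  false },
            { true,  false, false, false },
        };
        for (boolean[] t : tests) {
            boolean actual = decision(t[0], t[1], t[2]);
            System.out.println((actual == t[3]) ? "pass" : "FAIL");
        }
    }
}
```

Note that two line-coverage tests (all-true, all-false) would exercise every line, yet never demonstrate that x, y, and z each independently affect the outcome.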

Meaningless Auto-Generation

Another trap many fall into is to use an unsophisticated tool to automatically generate unit tests.

Simple test-generation tools create tests that execute code without any assertions. This keeps the tests from being noisy, but all it really means is that your application doesn't crash. Unfortunately, this doesn't tell you if the application is doing what it's supposed to, which is very important.
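To make that concrete, here is a sketch of what such an assertion-free generated test typically looks like (Account is a stand-in class invented for this example, not from any real codebase):

```java
// Account is a minimal stand-in class invented for this sketch.
class Account {
    private int balance;
    void deposit(int amount) { balance += amount; }
    int getBalance() { return balance; }
}

public class GeneratedAccountTest {
    // Typical shape of an assertion-free generated test: it executes the
    // code (so line coverage goes up) but can only fail via an exception.
    public static void main(String[] args) {
        Account account = new Account();
        account.deposit(100);
        // No assertion here: a deposit() that silently did nothing
        // would still "pass" this test.
        System.out.println("balance = " + account.getBalance());
    }
}
```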

The next generation of tools works by creating assertions based on whatever values they can automatically capture. But if the auto-generation creates a ton of assertions, you end up with a ton of noise. There is no middle ground here: you either have something that is easy to maintain but meaningless, or a maintenance nightmare of questionable value.

Many tools that do auto-generation of unit tests look valuable at first, because your coverage goes up very quickly. It's in the maintenance that the real problems occur. Often developers will put in extra effort to fine-tune the auto-generated assertions to create what they think is a clean test suite. However, the assertions are brittle and do not adapt as the code changes. This means that developers must perform much of the "auto" generation over again the next time they release. Test suites are meant to be re-used. If you can't re-use them, you're doing something wrong.

There's also the scarier problem that even in that first run, when coverage looks high, the assertions in the tests are less meaningful than they should be. Just because something can be asserted doesn't mean it should be, or that it's even the right thing to check. Consider this snippet:

public class ListTest {
    private List<String> list = new ArrayList<>();

    @Test
    public void testAdd() {
        list.add("Foo");
        // Auto-generated assertion: passes even when add() is broken.
        assertNotNull(list);
    }
}
Ideally, the assertion is checking that the code is working properly, and the assertion will fail when the code is working improperly. It's really easy to have a bunch of assertions that do neither, which we'll explore below.
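To illustrate, here is a small stand-alone sketch (plain Java, no test framework) contrasting a check that can never fail in practice with one that actually pins down behavior:

```java
import java.util.ArrayList;
import java.util.List;

public class AssertionQuality {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("Foo");

        // Weak check: the reference is never null here, so this
        // "assertion" passes even if add() silently drops the element.
        boolean weak = (list != null);

        // Meaningful check: fails precisely when add() misbehaves.
        boolean meaningful = (list.size() == 1 && "Foo".equals(list.get(0)));

        System.out.println("weak=" + weak + " meaningful=" + meaningful);
    }
}
```

Both checks pass against correct code; only the second one fails against broken code, which is the property that makes an assertion worth maintaining.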

Raw Coverage vs. Meaningful Tests

If you're shooting for a high-coverage number at the expense of a solid, meaningful, clean test suite, you lose value. A well-maintained suite of tests gives you confidence in your code and is even the basis for quickly and safely refactoring. Noisy and/or meaningless tests mean that you can't rely on your test suite, not for refactoring, and not even for release.

What happens when people measure their code, especially against strict standards, is that they find out their coverage is lower than they want it to be, and often they end up chasing the number. Let's get the coverage up! Now you're in dangerous territory: either an unreasonable belief that auto-generation has created meaningful tests, or hand-written unit tests that have little meaning and are expensive to maintain.

In the real world, the ongoing cost of maintaining a test suite far outweighs the cost of creating unit tests, so it's important to create good, clean unit tests from the beginning. You'll know you have, because you'll be able to run the tests all the time as part of your continuous integration (CI) process. If you only run the tests at release time, it's a sign that they're noisier than they should be. And ironically, this makes the tests even worse, because they're not being maintained.

Automation isn't bad—in fact, it's necessary, with the complexity and time pressures that are common today. But auto-generation of values is usually more hassle than it's worth. Automation based on expanding values, monitoring real systems, and creating complex frameworks, mocks, and stubs provides more value than mindless creation of assertions.

What Can You Do?


The first step is to measure your current coverage; otherwise, you won't know where you are or whether you're getting better. It's important to measure all testing activities when doing this, including unit, functional, manual, etc., and aggregate the coverage properly. This way, you'll be putting your effort where it has the most value: on code that isn't tested at all, rather than code that is covered by your end-to-end testing but doesn't happen to have a unit test. Parasoft can accurately aggregate code coverage from multiple runs and multiple types of tests to give you an accurate measure of where you stand. For more on this, check out our whitepaper: Comprehensive Code Coverage: Aggregate Coverage Across Testing Practices.


Tools that create unit test skeletons for you are a good way to start. Make sure those tools connect to common mocking frameworks like Mockito and PowerMock, because real code is complicated and requires stubbing and mocking. But that's not enough—you need to be able to:

  • Create meaningful mocks
  • Expand simple tests with bigger, broader data
  • Monitor a running application
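To see why mocking matters for real code, here is a hand-rolled stub illustrating the mechanics that frameworks like Mockito automate (PriceService and all values here are invented for illustration, not part of any real API):

```java
// PriceService is a stand-in interface invented for this sketch.
interface PriceService {
    long priceCentsOf(String sku);
}

public class StubSketch {
    // A manual stub: a canned answer plus a call counter -- the two
    // things a mocking framework gives you via when(...) and verify(...).
    static class StubPriceService implements PriceService {
        int calls = 0;
        public long priceCentsOf(String sku) {
            calls++;
            return 999; // canned value, like when(...).thenReturn(999L)
        }
    }

    public static void main(String[] args) {
        StubPriceService prices = new StubPriceService();
        long totalCents = prices.priceCentsOf("ABC") * 2;
        System.out.println("total cents = " + totalCents
                + ", calls = " + prices.calls);
    }
}
```

Writing stubs like this by hand for every collaborator is exactly the tedium a mocking framework removes, which is why tool support for those frameworks matters.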

Intelligent Assistance

You can do all of these things manually, but it takes too much time and effort. This is an excellent place to leverage automation. For example, the new Unit Test Assistant available in Parasoft Jtest guides you through the unit testing process, leveraging existing open source frameworks (JUnit, Mockito, PowerMock, etc.) to help you create, scale, and maintain your unit test suite and achieve broader coverage.

Years ago, we dabbled in the full auto-generation space, and we found that the ROI just isn't as good as it should be, so we've doubled down on guided unit test creation assistance that still leverages your brain to make sure the tests have meaningful values, while doing the rest for you. To learn more about Jtest's Unit Test Assistant, see our recent blog: Why People Hate Unit Testing and How to Bring Back the Love.


If coverage is an issue for you, make sure you're measuring it correctly, and measuring ALL of it, from all the tests you run. And as you start expanding your coverage with unit tests, you can leverage guided test creation to quickly create and expand your tests into meaningful, maintainable code coverage. Parasoft Jtest's Unit Test Assistant will create tests that remain maintainable as your code grows and changes, so you're not doing the same work over and over again.


Published at DZone with permission of Arthur Hicken , DZone MVB. See the original article here.

