What your test suite won't catch

· Agile Zone ·
Automated test suites are a wonderful tool: they allow incremental development by catching regressions in functionality between one user story implementation and the next. They remove fear from deployments because the code is first exercised on development and CI machines, keeping bugs out of production.

However, there is a widespread misconception that we can write a test for every property of the code we want to guarantee. While this is possible for some non-functional requirements, automated testing is still in its infancy (or may not even be possible) for others.


Performance testing is one of the areas where automating the load-generating process is very useful, provided the tests are executed with a few adjustments:

  • The environment the tests hit should have the same resources as production, or a vertical subset of it (e.g. at least a separate web server and database, as in production). This may be costly if you only have a few servers; if you already run dozens, adding a couple shouldn't be a problem.
  • The load must be generated from a separate machine.
  • The databases in that environment should be comparable in size to production, scaled to the reduced capacity of these machines.

Judging the results of performance tests is not easy, but you can set up a graph of the last runs and a threshold you do not want to exceed (such as 10 seconds to perform a payment or to open a new thread in a forum).
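
As a minimal sketch of the threshold idea, the snippet below times an operation and judges the most recent runs against a limit. The names `time_operation` and `check_runs`, the window of five runs, and the choice of failing on the worst recent run are assumptions made for illustration; only the 10-second limit comes from the example above.

```python
import time
from statistics import median

THRESHOLD_SECONDS = 10.0  # example limit taken from the text


def time_operation(operation):
    """Return the wall-clock seconds taken by one call of `operation`."""
    start = time.perf_counter()
    operation()
    return time.perf_counter() - start


def check_runs(durations, threshold=THRESHOLD_SECONDS, window=5):
    """Summarize the last `window` runs and flag any that exceed the threshold."""
    recent = durations[-window:]
    if not recent:
        raise ValueError("no runs recorded yet")
    return {
        "recent": recent,              # the points you would plot
        "median": median(recent),
        "worst": max(recent),
        "ok": max(recent) <= threshold,
    }
```

In a real setup, `durations` would be loaded from wherever the CI server stores the history of previous runs, and the `recent` values are what you would put on the graph.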

Maintainability (and extensibility)

Automatically testing a design's maintainability (against what changes, by the way?) is already more difficult than testing its performance. By definition, maintenance is a future cost, so it cannot be measured today; we can only hope to find some numbers or smells that, in our experience, correlate with high or low maintenance cost.

Metrics generation is a solved problem, but as Gojko would say, coverage, indexes, duplication measurements and the like are negative indicators: when they are red, they tell you something is wrong; when they are green, they cannot tell you anything about how good your code is.

Take unit tests, for example: they are known to give developers better feedback about code quality than end-to-end tests do. Being able to write a unit test in a few lines of code for every part of the application means that you're avoiding singletons, global state and hidden dependencies, that the API of each object you have produced can be used in isolation, and that objects do not chat with each other too much.
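
A small sketch of what such a few-line unit test can look like when the dependency is passed in rather than reached through a singleton. `Checkout`, `FakeGateway` and their methods are hypothetical names invented for this illustration, not part of any real library.

```python
class FakeGateway:
    """Hypothetical stand-in for a payment gateway; records what was charged."""

    def __init__(self):
        self.charged = []

    def charge(self, amount):
        self.charged.append(amount)
        return True


class Checkout:
    """Hypothetical object under test; its only collaborator is injected."""

    def __init__(self, gateway):
        self.gateway = gateway

    def pay(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        return self.gateway.charge(amount)


def test_pay_charges_gateway_once():
    gateway = FakeGateway()
    assert Checkout(gateway).pay(25) is True
    assert gateway.charged == [25]
```

The test needs no setup beyond constructing two objects, which is the feedback the paragraph above describes: if `Checkout` fetched its gateway from global state, the test would need far more scaffolding.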

However, the presence of a battery of unit tests just means that you have avoided these problems, not that there aren't others. It is still possible to produce a perfectly unit-tested design made of 100-line classes that is then torn apart when the time comes to implement the next requirement. Domain knowledge, such as how likely each kind of change is, and the reflection of real-world concepts in the code that is typical of DDD, are king here.


While performance can be tested and unit tests can act as a first line of defense against poor maintainability, we are at a loss when it comes to guaranteeing security requirements.

How do you test that multiple payments cannot be performed with the same money? Or that HTML code cannot be injected into your forms? As with functionality-oriented tests, a particular test can only guarantee that a certain attack does not work in a specific form. It cannot guarantee that there is no similar attack, and the set of attacks you can test is limited to the ones known to you. The difference from functional tests is that functionality issues manifest as bugs that can be fixed, while security issues can cost you much more.
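
To make the limitation concrete, here is a sketch of an injection test. `render_comment` is a hypothetical helper assumed for the example, and the payload list is deliberately short: the test passes, yet it says nothing about payloads or output contexts that are not on the list.

```python
import html


def render_comment(text):
    """Hypothetical helper: escape user text before embedding it in HTML."""
    return "<p>" + html.escape(text) + "</p>"


# Only the attacks we already know about can appear here.
KNOWN_PAYLOADS = [
    "<script>alert(1)</script>",
    '<img src=x onerror="alert(1)">',
]


def test_known_payloads_are_escaped():
    for payload in KNOWN_PAYLOADS:
        rendered = render_comment(payload)
        assert "<script" not in rendered
        assert "<img" not in rendered
```

A green run proves that these two strings are neutralized in this one place; it does not prove that the same text is safe when placed inside an attribute, a URL or a script block.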

The only two ways I know of to improve an application's security are to study common attacks and the patterns used to defeat them, and to have an audit performed by external security experts who devote their careers to the matter.


Concurrency bugs are similar to security issues in how difficult they are to find. They manifest themselves when many different processes operate on the same data. A successful run of your test suite, even one performing multiple transactions at the same time, does not guarantee that bugs won't manifest in production, where there is much more data to deal with.

If you have a race condition in your code, at best the test suite will fail intermittently, instilling doubt and lengthening the time to production. At worst, the tests won't expose the problem at all and will just keep passing, because the probability of the race condition manifesting with just a few hundred transactions is too low, or because the transactions are too spread out in time to constitute a significant load. In fact, almost all of the work in resolving concurrency bugs is in reproducing them.
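
The sketch below shows one way to reproduce a lost update on purpose rather than waiting for it to appear by chance. The `Account` class and `run_two_deposits` helper are hypothetical; the idea, assumed here for illustration, is to place a `threading.Barrier` between the read and the write so that both threads are forced to read the balance before either one writes it.

```python
import threading


class Account:
    """Hypothetical account with a read-modify-write deposit."""

    def __init__(self):
        self.balance = 0
        self.lock = threading.Lock()

    def deposit_unsafe(self, amount, pause):
        current = self.balance           # read
        pause()                          # window where another thread can interleave
        self.balance = current + amount  # write based on a possibly stale read

    def deposit_safe(self, amount, pause):
        with self.lock:                  # the whole read-modify-write is atomic
            self.deposit_unsafe(amount, pause)


def run_two_deposits(safe, timeout=0.5):
    """Run two concurrent deposits of 10 and return the final balance."""
    account = Account()
    barrier = threading.Barrier(2)

    def pause():
        # Wait for the other thread to reach the same point. With the lock
        # held the other thread can never arrive, so the wait times out,
        # the barrier breaks, and we simply carry on.
        try:
            barrier.wait(timeout)
        except threading.BrokenBarrierError:
            pass

    deposit = account.deposit_safe if safe else account.deposit_unsafe
    threads = [threading.Thread(target=deposit, args=(10, pause)) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance
```

Under these assumptions, `run_two_deposits(safe=False)` ends with a balance of 10 instead of 20 because one deposit is overwritten, while `run_two_deposits(safe=True)` ends with 20. Without the barrier, the unsafe version would pass almost every time, which is exactly why a green test suite is weak evidence here.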


Test suites are one of the tools we have to improve the quality of software while we're building it; they are particularly well suited to checking functional requirements and some other properties, such as performance and some forms of maintainability. However, other critical properties may be important in your project, and your test suite cannot help you verify them. For those, we can still put in place processes that give us feedback: security audits, code review and pair programming for maintainability, and higher-level models than code for concurrency issues. Tests are a tool, not an end.


Opinions expressed by DZone contributors are their own.