As an IT management consultant, probably the most frequent question I hear is some variant of “how can we get our defect count down?” Developers may want this as a matter of professional pride, but it’s the managers and project managers that truly burn to improve on this metric. Our software does thousands of undesirable things in production, and we’d like to get that down to hundreds.
Almost invariably, they’re looking for a percentage reduction, presumably because there is some sort of performance incentive based on the defect count metric. And so they want strategies for reducing defects by some percentage, in the same way that the president of the United States might challenge his cabinet to trim 2% of the unemployment percentage in the coming years. The trouble is, though, that this attitude toward defects is actually part of the problem.
The Right Attitude Towards Defects
The president sets a goal of reducing unemployment, but not of eliminating it. Why is that? Well, because having nobody in the country unemployed is simply impossible outside of a planned economy – people will quit and take time off between jobs or get laid off and have to spend time searching for new ones. Some unemployment is inevitable.
Management, particularly in traditional, ‘waterfall’ shops, tends to view defects in the same light. We clearly can’t avoid defects, but if we worked really hard, we could reduce them by half. This attitude is a core part of the problem.
It’s often met with initial skepticism, but what I tell these clients is that they should shoot for having no escaped defects (defects that make it to production, as opposed to ones that are caught by the team during testing). In other words, don’t shoot for a 20% or 50% reduction – shoot for not having defects.
It’s not that shooting for 100% will stretch teams further than shooting for 20% or 50%. There’s no psychological gimmickry to it. Instead, it’s about ceasing to view defects as “just part of writing software.” Defects are not inevitable, and coming to view them as preventable mistakes rather than facts of life is important because it leads to a reaction of “oh, wow, a defect – that’s bad, let’s figure out how that happened and fix it” instead of a reaction of “yeah, defects, what are you gonna do?”
When teams realize and accept this, they turn an important corner on the road to defect reduction.
What Won’t Help
Once the mission is properly set to one of defect elimination, it’s important to understand what either won’t help at all or what will help only superficially. And this set includes a lot of the familiar levers that dev managers like to pull.
First and probably most critical to understand is that the core cause of defects is NOT developers not trying hard enough or taking care. In other words, it’s not as though a developer is sitting at his desk and thinking, “I could make this code I’m writing defect free, but, meh, I don’t feel like it because I want to go home.” It is precisely for this reason that exhortations for developers to work harder or to be more careful won’t work. They already are, assuming they aren’t overworked or unhappy with their jobs, and if those things are true, asking for more won’t work anyway.
And, speaking of overworked, increasing workload in a push to get defect free will backfire. When people are forced to work long hours, the work becomes boring and “grueling and boring” is a breeding ground for mistakes – not a fix for them. Resist the urge to make large, effort-intensive quality pushes. That solution should seem too easy, and, in fact, it is.
Finally, resist any impulse to forgo the carrot in favor of the stick and threaten developers and teams with consequences for defects. This is a desperate gambit, and, simply put, it never works. If developers’ jobs depend on not introducing defects, they will find a way to succeed in not introducing defects, even if it means not shipping software, cutting scope, or transferring to other teams/projects. The road to quality isn’t lined by fear.
Understand Superficial Solutions
Once managers understand that eliminating defects is possible and that draconian measures will be counterproductive, the next danger is a tendency to seize on the superficial. Unlike the ideas in the last section, these won’t be actively detrimental, but the realized gains will be limited.
The first thing that everyone seems to seize on is mandating unit test coverage, since this, presumably, forces the developers to write automated tests, which, in turn, catch issues. The trouble here is that high coverage doesn’t actually mean that the tests are effective, nor does it cover all possible defect scenarios. Hiring or logging additional QA hours will be of limited efficacy for similar reasons.
Another thing folks seem to love is the “bug bash” concept, wherein the team takes a break from delivering features and does their best to break the software and then repair the breaks. While this certainly helps in the short term, it doesn’t actually change anything about the development or testing process, so gains will be limited.
And finally, coding standards to be enforced at code review certainly don’t hurt anything, but they are also not a game changer. To the chagrin of managers everywhere, “here are all the mistakes one could make, so don’t make them” doesn’t arise from the past experience of the tenured developers on the team.
Change the Game
So what does it take to put a serious dent into defect counts and to fundamentally alter the organization’s views about defects? The answers here are more philosophical.
The first consideration is to get integration to be continuous and to make deployments to test and production environments trivial. Defects hide and fester in the speculative world between written code and the environment in which it will eventually be run. If, on the other hand, developers see the effects their code will have on production immediately, the defect count will plummet.
Part and parcel with this tight feedback loop strategy is to have an automated regression and problem detection suite. Notice that I’m not talking about test coverage or even unit tests, but about a broader concept. Your suite will include these things, but it might also include smoke/performance tests or tests to see if resources are starved. The idea is to have automated detection for things that could go wrong: regressions, integration mistakes, performance issues, etc. These will allow you to discover defects instead of customers.
And, finally, on the code side, you need to reduce or eliminate error prone practices and parts of the code. Is there a file that’s constantly being merged and could lead to errors? Do your developers copy, paste, and tweak? Are there config files that require a lot of careful, confusing attention to detail? Recognize these mistake-inviters for what they are and eliminate them.
But here’s the thing – I can’t possibly enumerate all of the tools in your arsenal. These are some of my most tried and true strategies, but you’ll have to figure what works for you. The key is to recognize that defects are not inevitable and go from there.