Software Quality: Better in Practice than in Theory
Can software quality be accurately measured in practice, or is it best left to the realm of theory? Let's find out.
In 1996, C. A. R. Hoare wrote an article, "How Did Software Get So Reliable Without Proof?", that still sounds contemporary for the most part.
In the 1980s, many believed that programs could not get much bigger unless we started using formal proof methods. The argument was that bugs are fairly common and that each bug has the potential to bring a system down. Therefore the only way to build much larger systems was to rely on formal methods to catch bugs. And yet programs continued to get larger and formal methods never caught on. Hoare asks, "Why have twenty years of pessimistic predictions been falsified?"
Another twenty years later, we can ask the same question. Systems have gotten far larger, and formal methods still have not become common. They are used (more on that shortly), but they remain the exception.
Better in Practice Than in Theory
It's interesting that Hoare was the one to write this paper. He is best known for quicksort, a sorting algorithm that works better in practice than in theory! Quicksort is commonly used even though it has terrible worst-case efficiency, because its average efficiency has optimal asymptotic order (see the note at the end of this article), and in practice it works better than other algorithms with the same asymptotic order.
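To make the gap between worst case and average case concrete, here is a minimal sketch that counts comparisons in a deliberately naive quicksort. The function name `quicksort_comparisons` and the choice of the first element as pivot are assumptions made for illustration; production sorts pick pivots more carefully precisely to avoid the bad case shown here.

```python
import random


def quicksort_comparisons(items):
    """Sort a copy of `items` with a naive quicksort (first element as pivot).

    Returns (sorted_list, number_of_element_comparisons).
    An explicit stack replaces recursion so that the worst case does not
    hit Python's recursion limit.
    """
    result = list(items)
    comparisons = 0
    stack = [(0, len(result) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = result[lo]
        boundary = lo  # last index known to hold a value smaller than the pivot
        for j in range(lo + 1, hi + 1):
            comparisons += 1
            if result[j] < pivot:
                boundary += 1
                result[boundary], result[j] = result[j], result[boundary]
        # Move the pivot between the "smaller" and "not smaller" parts.
        result[lo], result[boundary] = result[boundary], result[lo]
        stack.append((lo, boundary - 1))
        stack.append((boundary + 1, hi))
    return result, comparisons


if __name__ == "__main__":
    n = 1000
    shuffled = list(range(n))
    random.Random(0).shuffle(shuffled)
    _, worst = quicksort_comparisons(range(n))  # already sorted input
    _, typical = quicksort_comparisons(shuffled)
    print("sorted input:  ", worst)    # n * (n - 1) / 2 = 499500
    print("shuffled input:", typical)  # far smaller, on the order of n log n
```

With this pivot rule, already-sorted input is the pathological case: every partition peels off a single element, so the comparison count grows quadratically. Shuffled input splits far more evenly, which is the behavior people actually see most of the time.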
It is logically possible that the smallest bug could bring down a system. And there have been examples, such as the Mars Climate Orbiter, where a single bug did, in fact, lead to complete failure. But this is rare. Most bugs are inconsequential.
Some will object, "How can you be so blasé about bugs? A bug crashed a $300 million probe!" But what is the realistic alternative? Would spending an additional billion dollars on formal software verification have prevented the crash? Possibly, though not certainly, and the same money could send three more missions to Mars.
It's all a matter of economics. Formal verification is extremely tedious and expensive. The expense is worth it in some settings and not in others. The software that runs pacemakers is more critical than the software that runs a video game. For most software development, less formal methods have proved more cost-effective at achieving acceptable quality: code reviews, unit tests, integration testing, etc.
I have some experience with formal software verification, including formal methods software used by NASA. When someone says that software has been formally verified, there's an implicit disclaimer. Usually it's the algorithms that have been formally verified, not the implementation of those algorithms in software. Also, maybe not all the algorithms have been verified, but say 90%, the remaining 10% being too difficult to verify. In any case, formally verified software can fail and has failed. Formal verification greatly reduces the probability of encountering a bug, but it does not reduce the probability to zero.
There has been a small resurgence of interest in formal methods since Hoare wrote his paper. And again, it's all about economics. Theorem proving technology has improved over the last 20 years. And software is being used in contexts where the consequences of failure are high. But for most software, the most economical way to achieve acceptable quality is not through theorem proving.
There are also degrees of formality. Full theorem proving is extraordinarily tedious. If I remember correctly, one research group said that they could formally verify about one page of a mathematics textbook per man-week. But there's a continuum between full formality and no formality. For example, you could have formal assurance that your software satisfies certain conditions, even if you can't formally prove that the software is completely correct. Where you want to be along this continuum of formality is again a matter of economics. It depends on the probability and consequences of errors, and the cost of reducing these probabilities.
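As one illustration of a point near the informal end of this continuum, the sketch below states a property precisely and checks it on every call instead of proving it. This is a run-time check, not formal verification: it only says something about the inputs actually seen. The decorator name `ensures_sorted_permutation` and both example functions are hypothetical, introduced only for this example.

```python
from collections import Counter
from functools import wraps


def ensures_sorted_permutation(sort_fn):
    """Wrap a sorting function with a run-time check of its postcondition.

    The postcondition has two parts: the output is in non-decreasing order,
    and it contains exactly the same elements as the input.
    Assumes hashable elements (needed by Counter). Note that `assert`
    statements are skipped when Python runs with the -O flag.
    """
    @wraps(sort_fn)
    def checked(items):
        original = list(items)
        result = sort_fn(list(original))
        assert all(a <= b for a, b in zip(result, result[1:])), \
            "output is not in order"
        assert Counter(result) == Counter(original), \
            "output is not a permutation of the input"
        return result
    return checked


@ensures_sorted_permutation
def insertion_sort(items):
    """A straightforward insertion sort that satisfies the postcondition."""
    out = []
    for x in items:
        pos = len(out)
        while pos > 0 and out[pos - 1] > x:
            pos -= 1
        out.insert(pos, x)
    return out


@ensures_sorted_permutation
def buggy_sort(items):
    """Looks plausible, but silently drops duplicate elements."""
    return sorted(set(items))


if __name__ == "__main__":
    print(insertion_sort([3, 1, 2, 1]))  # [1, 1, 2, 3]
    try:
        buggy_sort([3, 1, 2, 1])
    except AssertionError as err:
        print("postcondition violated:", err)
```

A check like this costs a few lines and catches the duplicate-dropping bug the first time it is exercised, but it gives no guarantee about inputs that never occur in testing. Moving further toward full formality means proving the same postcondition for all inputs, at a correspondingly higher cost.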
Note: The worst-case performance of quicksort is O(n²), but the average performance is O(n log n).
Published at DZone with permission of John Cook, DZone MVB. See the original article here.