Cost of an Error: Who Pays for Programming Blunders?
In this article, we take a look at some of the extreme examples of undetected bugs in code and reflect on the increasing importance of code quality.
Modern programmers live in a very special period, when software is penetrating literally every sphere of human life. Nobody is surprised by software in fridges, watches, and coffee machines. However, people's dependence on smart technology is growing too. The inevitable consequence is that software reliability becomes priority number one. It’s hard to scare anyone with a coffee maker gone haywire, although it can do a lot of harm (liters of boiling coffee flowing over your white marble countertop…). But the growing requirements for software quality are a serious matter, which is why I'd like to talk about errors in code that led to a significant waste of time and money.
The aim of these stories is to fight the idea that defects in programs can be treated as lightly as they were before. Errors in programs are no longer just incorrectly drawn units in a game: code is now responsible for people’s health and safety. In this article, I would like to cover several new examples that show why code needs to be treated really thoughtfully.
It’s undeniable that complex programs are taking a more active role in our lives: household appliances controlled by a smartphone, gadgets that were hard to imagine 10 years ago, and, of course, more complex software in factories, cars and so on.
Let’s talk about money lost because of errors in software and about our growing dependence on code. This topic has been discussed repeatedly (including by my colleague Andrey Karpov in “The Big Calculator Gone Crazy”), and every new example proves the same thing: code quality is not something you can ignore.
An Expensive Overline
The Mariner 1 spacecraft was supposed to reach Venus. Launched from Cape Canaveral, the rocket almost immediately veered off its trajectory and threatened to fall back to Earth. To prevent a possible catastrophe, NASA decided to trigger the self-destruct system. Mariner 1 was destroyed 293 seconds after launch.
The review board conducted an investigation and found that the cause of the accident was a programming error that made the rocket receive incorrect control signals.
The most detailed and consistent account is that the error occurred when a mathematical symbol was transcribed by hand into the program specification for the guidance system. The writer missed the superscript bar (or overline) in the formula, which was meant to denote “the n-th smoothed value of the time derivative of a radius R”.
Since the smoothing function indicated by the bar was left out of the specification for the program, the implementation treated normal minor variations of velocity as if they were serious, causing spurious corrections that sent the rocket off course (source).
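To see why a missing smoothing step matters, here is a minimal sketch in Python. It is not the Mariner 1 guidance code; the sample count, noise level, window size, and all function names are assumptions chosen purely for illustration. It estimates the rate of change of a noisy radius reading twice, once raw and once with a moving average, and compares how much each estimate jumps around.

```python
import random
import statistics


def raw_rates(radius, dt):
    """Finite-difference estimate of dR/dt from consecutive samples."""
    return [(b - a) / dt for a, b in zip(radius, radius[1:])]


def smoothed_rates(rates, window):
    """Trailing moving average of the rate estimates (the role of the 'bar')."""
    out = []
    for i in range(len(rates)):
        chunk = rates[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def simulate(n=200, dt=0.1, true_rate=50.0, noise=0.5, seed=1):
    """Hypothetical radius readings: steady growth plus measurement noise."""
    rng = random.Random(seed)
    return [true_rate * i * dt + rng.gauss(0.0, noise) for i in range(n)]


radius = simulate()
raw = raw_rates(radius, dt=0.1)
smooth = smoothed_rates(raw, window=10)

# Spread of each estimate around its mean; skip the warm-up of the average.
raw_spread = statistics.pstdev(raw)
smooth_spread = statistics.pstdev(smooth[10:])

print(f"raw rate spread:      {raw_spread:.2f}")
print(f"smoothed rate spread: {smooth_spread:.2f}")
```

In this toy setup the true rate never changes, yet the raw estimate fluctuates far more than the smoothed one. A controller fed the raw values would react to measurement noise as if it were real motion, which is the kind of spurious correction described above.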
The cost of the “missing overbar”: $18 million.
Russian GPS That Drowned
Another vivid example of how millions of dollars can be lost because of a programming error is a relatively recent case. It would seem that the 21st century offers everything necessary for writing reliable programs, especially in the space industry: experienced professionals with excellent education and funding, and access to the best tools for testing software. None of this helped. On December 5, 2010, a Proton-M carrier rocket with three Glonass-M satellites, part of the Russian equivalent of GPS, crashed into the Pacific Ocean.
After an investigation, the reason for the crash was announced by an official representative of the Prosecutor-General’s Office of the Russian Federation, Alexander Kurennoy: “The investigation has established that the crash was due to the application of a wrong formula, which resulted in putting an additional 1,582 kilograms of liquid oxygen into the acceleration unit’s oxidizer tank. This error led to the carrier rocket’s injection into an open orbit and its subsequent fall into the Pacific Ocean.” (source)
An interesting point is that a document on the need to adjust the formula had been submitted to the relevant department of the organization, but the engineer marked it as completed. Management didn’t verify how its directives were carried out. All those involved in the accident were convicted of a criminal offense and given large fines. Still, that doesn’t compensate for the loss of $138 million.
Back in 2009, Manfred Broy, a professor of informatics at the Technical University of Munich and a leading expert on software in cars, said: “it [every premium-class automobile] probably contains close to 100 million lines of software code.” (source) It’s been eight years, and even if you aren’t a fan of Top Gear, you may have noticed that modern cars have become really intelligent machines.
According to experts, software and electronics account for about 40% of a car's market price. And that figure is for cars with gasoline engines. Just think about hybrids and electric cars, where the share is approximately 70%!
When a car's electronic filling becomes more complex than its mechanical one, more responsibility falls on software developers. A bug in one of the key systems, such as braking, is much more dangerous than a torn brake hose.
So here is a question – drive modern, comfortable, and “smart” cars or old school, but simple cars?
Toyota, in general, has a positive reputation, but from time to time the media reports the recall of a number of its vehicles. There is already an article on our blog about a software bug in Toyota, “Toyota: 81 514 issues in the code”, but unfortunately, this is not the only case.
In 2005, 160 thousand Toyota Prius hybrids that had been manufactured between the end of 2004 and the beginning of 2005 were recalled. The problem was that the car could suddenly stall and stop. It took about 90 minutes to fix the bug on one vehicle, which adds up to about 240,000 man-hours (160,000 vehicles × 1.5 hours).
Chrysler and Volkswagen
In May 2008, Chrysler recalled 24,535 Jeep Commanders manufactured in 2006. The reason was a programming error in the automatic transmission control module. The failure caused the engine to shut off unexpectedly.
In June of the same year, Volkswagen recalled approximately 4,000 Passats and 2,500 Tiguans. In this case, the software error caused the engine speed to increase: the tachometer reading went up when the air conditioner was turned on.
Needless to say, recalls are associated with enormous financial losses. But what is much more dangerous for such huge manufacturers than the financial expense is the loss of consumer trust. Given how tough the competition is in the automotive market, such a mistake may have very negative consequences, and restoring a reputation as a reliable manufacturer may be very difficult.
Let’s talk about the Tesla Model S. On May 7, 2016, Joshua Brown, who had become famous for his YouTube videos praising this car, got into an accident while driving his Tesla Model S. He trusted the car's software completely, and so he relied on the autopilot. The result of this trust was tragic: Joshua died at the scene from injuries received in the crash.
The accident gained wide publicity, and an investigation started. It showed that, apparently, Brown wasn’t really keeping his eyes on the road, and the autopilot got into a situation that wasn’t provided for in the code. A truck with a trailer was moving in front of Joshua’s Tesla. The truck was about to turn left, which required it to slow down. But the Tesla behind it didn’t start slowing down, as the autopilot systems didn’t recognize the object ahead.
Most likely, it happened because of the bright sun. Immediately after the crash, the explanation for the failure (put forth by Tesla) was that the car likely failed to distinguish the white tractor trailer from the sky. The official report states the following: “braking for crossing path collisions, such as that present in the Florida fatal crash, are outside the expected performance capabilities of the system.” (source). The complete accident report is freely available to the public.
In other words, the autopilot is meant to help the driver (more advanced cruise-control, so to speak), but not to replace the driver's primary functions. Of course, such an excuse from Tesla wasn’t much help. The work on the software continued, but Tesla's Model S wasn't recalled.
Perhaps the examples given in the article seem too epic. Of course, only tragic cases get the public's attention. But I am sure that in every company engaged in software development, there is a story about how just one mistake has caused a lot of problems, albeit local ones.
Is there always someone to blame? Sometimes yes, sometimes no. There is no point in finding the guilty person and chastising him/her. As programs get more complex, they become bigger parts of our lives, which means that the requirements for code reliability are also growing. The price of the typical errors is increasing, and the responsibility for the code quality falls on the shoulders of the developers.
What is the solution? Modernize the development process. Give programmers assistance in the form of specialized tools for detecting and fixing bugs. Using modern techniques in combination significantly decreases the probability that a bug in the code will go undetected at the development stage.
Published at DZone with permission of Anastasia Zubkova . See the original article here.