Sorting Through the Wreckage of Last Week's Outages
Cyber-terrorism isn't the biggest culprit in most large-scale outages. It's technical debt. Now is the time to worry less about the bad guys and more about our own software.
Every now and then, when you step back and consider how reliant our society has become on online systems, it can really blow your mind. When you do it because those systems seem to be crashing all around us, it can be downright terrifying.
Such was the case last Wednesday, when United Airlines grounded its flights around the world because of a software glitch, the New York Stock Exchange suspended trading for four hours because of problems with its internal systems, and the Wall Street Journal homepage suffered significant problems, with localized outages around the country.
Now, whether or not you retreated to your bunker on Wednesday and started making plans to repopulate the Earth, Colbert did make an important point: our reliance on technology has made our lives and businesses far more efficient, but the more we rely on those systems, the more their fragility is passed on to us.
To make matters worse, companies are usually hesitant to invest the time and money needed to completely overhaul their software systems. This typically leads to technical debt: software updates get built on top of old code (particularly at companies that have been running the same systems for a long time, like airlines) rather than new, modern systems being built from scratch. The end result is a product that gets the job done most of the time but has significant holes.
That’s the problem pointed out by Zeynep Tufekci, who argues that while people were panicking last week about cyber-terrorism possibly playing a role in the outages, they were ignoring the much greater risk of relying on outdated and flawed software systems.
Of course, this vulnerability only increases the need for proactive monitoring to catch problems in these systems before they cause widespread outages and slowdowns. By running continuous synthetic tests against your software and infrastructure, you can catch not only major failures like those suffered by United and the NYSE, but also the smaller “micro-outages” that the WSJ experienced.
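To make the idea of a synthetic test a little more concrete, here is a minimal sketch of a single check: it requests a page the way a user would, times the response, and labels the result "ok", "slow", or "down". This is an illustration under stated assumptions, not any monitoring vendor's actual implementation. The names `run_check` and `CheckResult`, the 2,000 ms "slow" threshold, and the rule that any 4xx/5xx status or network failure counts as "down" are all hypothetical choices made for this example.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    url: str
    status: Optional[int]  # HTTP status code, or None if no response arrived at all
    latency_ms: float      # wall-clock time the request took
    verdict: str           # "ok", "slow", or "down" (hypothetical labels)


def default_fetch(url: str, timeout: float) -> int:
    """Issue a GET request like a browser would and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def run_check(
    url: str,
    fetch: Callable[[str, float], int] = default_fetch,
    timeout: float = 10.0,
    slow_ms: float = 2000.0,  # assumed threshold for "slow"; tune per page
    clock: Callable[[], float] = time.monotonic,
) -> CheckResult:
    """Run one synthetic check against `url` and classify the outcome."""
    start = clock()
    try:
        status: Optional[int] = fetch(url, timeout)
    except urllib.error.HTTPError as exc:
        # urlopen raises on 4xx/5xx; the server did answer, so keep its code.
        status = exc.code
    except OSError:
        # URLError, timeouts, DNS and connection failures: no usable response.
        status = None
    latency_ms = (clock() - start) * 1000.0

    if status is None or status >= 400:
        verdict = "down"
    elif latency_ms > slow_ms:
        verdict = "slow"
    else:
        verdict = "ok"
    return CheckResult(url, status, latency_ms, verdict)
```

In practice a check like this would run on a schedule (say, every minute) from several geographic locations. A single "down" or "slow" result from one location while the others report "ok" is exactly the kind of localized micro-outage described above, which a single data center watching its own servers might never see.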
Meanwhile, 2015 continues to be the year of the outage. In addition to the three major ones last week, we’ve seen United and rivals like American hit by third-party issues, tech giants like Facebook and Apple (specifically iTunes) go down for hours at a time, and even Starbucks forced to close thousands of locations around the country for a night in April because of a problem with its POS systems.
And short of a complete change of mindset across the entire tech industry, it’s probably not going to get better anytime soon. The best we can do is try to stay on top of it as much as possible.
[This article was written by Craig Lowell]
Published at DZone with permission of Mehdi Daoudi, DZone MVB.