Exploring Airline Outages: A Developer’s Perspective
The transportation industry has seen its fair share of downtime this year. If you’ve been anywhere near the news lately, you have probably seen how severe the effects of a performance outage can be for airlines on a global scale. As airlines integrate digitally with passengers’ lifestyles and connect with customers through multiple channels, they also have to manage and optimize application performance to deliver a good customer experience across millions of transactions every day.
Earlier this year, a major domestic airline suffered a massive computer crash. Hundreds of flights were delayed or canceled across the nation. Airline officials said the failure of a router set off a domino effect that crashed critical systems. Unfortunately, the airline’s backup systems failed too. As soon as the outage began, ticketing agents had to process passengers manually and could not take additional reservations. The website crashed as well, prompting company officials to estimate a loss of $5 million to $10 million in ticket sales. The company communicated quickly via social media to clear up confusion and reassure customers that service would be restored. Officials explained that the technology infrastructure was aging, but that they had been putting significant money into upgrading it. The plan was to scrap the existing reservation system next year and install brand-new systems in key areas over the course of a few years. However, scrapping systems and rebuilding them is not always a sure fix.
Monitoring software makes it far more likely that, even if a bug slips through, you’ll catch it before your customers do. The airline’s size means it has an extensive technical infrastructure, much of it built in-house. Delays create an immediate negative impact for passengers and for revenue, which is all the more reason to implement software performance monitoring.
Later that year, the same airline had another technology glitch when its ticketing system did not allow passengers to check in. Backup systems had to be used to check in travelers who did not have a mobile or printed boarding pass. Initially, the airline said it did not know the cause of the breakdown, which affected its website, reservation centers, and mobile application, and described the problem as “system-wide.” The company said that of the 3,600 flights scheduled that day, 450 experienced delays, which resulted in long lines and backups at major airports from the nation’s capital to the West Coast. This shows how interconnected airline software systems are. When one piece goes down, the rest can go down with it, which is yet another reason performance monitoring is vital in a large distributed environment.
Several travel experts said they believed the crash was due to the company trying to do too much on legacy computer programs. Some airlines are still running software they began using in the 1990s. Fleet sizes have grown, and older legacy systems cannot scale to match. The system breaks down, passengers get frustrated, and the company loses millions of dollars. One industry watcher said that these kinds of glitches are becoming more and more common as airlines merge and attempt to squeeze as much life as they can out of aging ticketing software.
To keep costs to a minimum, airlines have automated everything they can. But once a system crashes, everything is so interconnected that a host of other airline functions are affected. It then takes time to work through the long list of things that must be restarted and restored after a crash. Software from the 1990s cannot be kept healthy using only the processes and tools of that era. Today’s mission-critical businesses need an enterprise-grade performance management tool that helps them maximize uptime.
These glitches were the latest of many in 2015. Earlier that year, two other major domestic airlines suffered multiple performance outages as well. Some experts believe that with about four airlines controlling the vast majority of air traffic in the country, glitches are becoming more common. The computer systems designed for smaller airlines cannot handle the passenger load when the airlines merge.
One obvious way this might have been avoided is to run stress tests against the newly merged hybrid systems and monitor them with an APM solution like AppDynamics to catch errors before going live. The first glitches should have been a red flag to take a step back and assign budget and resources to development teams for preventive action. An industry like this must also adopt a DevOps culture, so that new systems are built to be highly scalable, portable, and containerized. As airlines continue to overhaul their creaky legacy systems, their DevOps teams should make application performance management a critical part of the new tech infrastructure.
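To make the idea of a pre-launch stress test concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any airline or APM product actually does it: `check_in` is a hypothetical stand-in for a real check-in request, and the simulated failure on every 50th passenger is invented purely so the report has something to show.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_in(passenger_id: int) -> str:
    """Hypothetical stand-in for a real check-in call (e.g. an HTTP request)."""
    if passenger_id % 50 == 0:  # assumption: simulate an occasional backend failure
        raise RuntimeError("reservation backend unavailable")
    time.sleep(0.001)  # simulate a small amount of work
    return f"boarding-pass-{passenger_id}"


def timed_call(passenger_id: int):
    """Run one check-in and return (latency_seconds, succeeded)."""
    start = time.perf_counter()
    try:
        check_in(passenger_id)
        ok = True
    except Exception:
        ok = False
    return time.perf_counter() - start, ok


def stress_test(total_requests: int, concurrency: int) -> dict:
    """Send total_requests check-ins through `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(1, total_requests + 1)))
    latencies = sorted(latency for latency, _ in results)
    errors = sum(1 for _, ok in results if not ok)
    p95_index = max(0, int(len(latencies) * 0.95) - 1)
    return {
        "requests": total_requests,
        "errors": errors,
        "error_rate": errors / total_requests,
        "p95_latency_s": latencies[p95_index],
    }


if __name__ == "__main__":
    # Raise total_requests and concurrency until the error rate or the
    # 95th-percentile latency crosses whatever limit the team has agreed on.
    print(stress_test(total_requests=500, concurrency=20))
```

In practice the same loop would point at a staging copy of the merged system, and the error rate and latency figures would be compared against agreed limits before go-live.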
In today’s market, highly stable software is critical for airlines to function efficiently, build customer loyalty, and increase profits. Breakdowns such as those described here are a direct hit to sales and company reputation. Application performance management platforms like AppDynamics help teams catch problems before they affect customers. Your ideal solution should monitor not only application performance but the code itself. It’s critical to monitor everything: end-user transactions, infrastructure, network performance, and capacity in the cloud, and to rapidly identify problems like application delays. In the end, you get a clearer, simpler view of your IT landscape, faster resolution of problems, and better business performance.
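As a rough illustration of what transaction-level monitoring means at the code level, the following Python sketch times each call and records whether it succeeded, failed, or ran slowly. It is not the AppDynamics API or any vendor's agent. The `monitored` decorator, the in-memory `METRICS` store, and the two sample functions are all hypothetical names chosen for this example.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store; a real APM agent would ship samples to a backend.
METRICS = defaultdict(list)


def monitored(transaction_name: str, slow_threshold_s: float = 0.5):
    """Record duration and outcome of every call under transaction_name."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # re-raise so callers still see the failure
            finally:
                elapsed = time.perf_counter() - start
                if status == "ok" and elapsed > slow_threshold_s:
                    status = "slow"
                METRICS[transaction_name].append((elapsed, status))
        return wrapper
    return decorator


@monitored("search_flights")
def search_flights(origin: str, destination: str) -> list:
    return [f"{origin}-{destination}-101"]


@monitored("book_seat")
def book_seat(flight: str, seat: str) -> str:
    if not seat:
        raise ValueError("seat is required")
    return f"{flight}:{seat}"


def summarize(metrics) -> dict:
    """Collapse raw samples into per-transaction counts."""
    return {
        name: {
            "count": len(samples),
            "errors": sum(1 for _, status in samples if status == "error"),
            "slow": sum(1 for _, status in samples if status == "slow"),
        }
        for name, samples in metrics.items()
    }
```

Calling `search_flights("ATL", "SFO")` and then `summarize(METRICS)` yields a per-transaction count of calls, errors, and slow responses, which is the kind of signal a team would alert on.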
Published at DZone with permission of Saba Anees, DZone MVB. See the original article here.