
"Something is Technically Wrong" #TwitterDown

"Something is technically wrong." That’s what Twitter said on the morning of Tuesday, January 19, 2016. Millions of Twitter users all over the world were locked out of the social network. How could this outage happen?


According to DownDetector, a site that tracks the availability of websites and mobile apps in real time, users had the most trouble with Twitter’s website, smartphone app and tablet apps. Third-party services, such as TweetDeck, were also intermittently unavailable. It turns out that Twitter experienced an issue ‘related to an internal code change’, which kept the service down for hours. On Tuesday afternoon, Twitter said it had reverted the change, which fixed the issue.

[Image: TwitterOutage.png]

This application downtime had a huge impact on Twitter’s business. The average hourly cost of a critical application failure is $500,000 to $1 million. Tuesday’s outage lasted for more than six hours, and the stock price reached a new low, losing 7% and almost $700 million in market value.
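To put the hourly figure next to the outage duration, here is a rough back-of-the-envelope sketch. It assumes the quoted industry average applies to Twitter, which nothing here establishes, and it uses six hours as a lower bound since the outage lasted longer. The function name `estimate_downtime_cost` is made up for this illustration.

```python
def estimate_downtime_cost(hours, low_per_hour=500_000, high_per_hour=1_000_000):
    """Return a (low, high) cost range in dollars for an outage lasting `hours`.

    The default hourly rates are the industry averages quoted in the text,
    not figures reported by Twitter.
    """
    return hours * low_per_hour, hours * high_per_hour


# Six hours is a lower bound: the outage lasted "more than six hours".
low, high = estimate_downtime_cost(6)
print(f"Estimated direct cost: ${low:,} to ${high:,}")
# prints "Estimated direct cost: $3,000,000 to $6,000,000"
```

Even at the top of that range, the estimate is small compared with the drop in market value, so the reputational damage arguably outweighs the direct cost.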

More importantly: how could this outage happen, and why did it take so long to fix? Probably someone wrote or edited the code and deployed it, and as a result everything went down. From the outside, it looks like finding the cause of a problem at Twitter is a hell of a job. Apparently they didn’t know who changed the code, what was changed, or how the change affected critical business services. They had to start a time-consuming investigation across DevOps teams to find and resolve the problem. A better way to deal with outages is to fully automate the problem-finding process across teams. Every DevOps team should be aware of what’s happening in the full IT stack, because providing business services is and always will be a multi-team effort. To prevent future outages, Twitter has to step up its game and take a proactive, visual approach to IT operations. It can’t wait for the next big incident to happen.
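As a minimal sketch of what automating that first question ("who changed what, and when?") could look like, the snippet below filters a shared change log for deployments made shortly before an incident. Everything in it is hypothetical: the `Change` record, the `suspect_changes` function, the service names and the timestamps are invented for illustration and say nothing about how Twitter actually tracks changes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    """One deployment recorded in a (hypothetical) cross-team change log."""
    author: str
    service: str
    summary: str
    deployed_at: datetime


def suspect_changes(changes, incident_start, window=timedelta(hours=2)):
    """Return changes deployed within `window` before the incident, newest first."""
    earliest = incident_start - window
    recent = [c for c in changes if earliest <= c.deployed_at <= incident_start]
    return sorted(recent, key=lambda c: c.deployed_at, reverse=True)


# Made-up example data, not real deployments.
changes = [
    Change("alice", "timeline-api", "tune cache TTL", datetime(2024, 3, 5, 3, 10)),
    Change("bob", "auth", "rotate signing config", datetime(2024, 3, 5, 7, 45)),
    Change("carol", "web-frontend", "update footer copy", datetime(2024, 3, 4, 16, 0)),
]
incident_start = datetime(2024, 3, 5, 8, 20)

for change in suspect_changes(changes, incident_start):
    print(f"{change.deployed_at:%H:%M} {change.service}: {change.summary} ({change.author})")
```

A time window alone will not prove which change caused an outage, but it gives every team the same short list of candidates to revert or inspect first, instead of each team searching its own history.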

Let’s hope Twitter learns from these outages. In the end, you and I, the customers, suffer the most. We can’t tweet and have to log in to Facebook to complain about our problems. ;-)

Topics: devops, devops best practices, outage, cloud

Published at DZone with permission of Mark Bakker. See the original article here.

