Why WhatsApp Went Down
WhatsApp's recent outage is a reminder that even when you follow the DevOps adage of failing fast and iterating, the media and your users will still take notice.
WhatsApp's recent outage freaked out users of the globally popular app. But the context below from Dave Anderson, a digital experience expert at Dynatrace, shows just how fragile the process of continuously releasing new features is when millions rely on your service.
According to Dave, “You’ve got to feel for WhatsApp today. They’ve got one of the toughest jobs in the world. One in seven people on the planet use the application and expect it to be constantly updated with new features and always performing perfectly. So, when the chat service crashed for UK users yesterday, the company endured widespread public criticism.
“It’s hard for consumers to understand just how difficult the job of software development is these days. Amazon is known to release new software updates every 11 seconds, and we could assume WhatsApp is keeping a similar pace, with releases and updates in minutes or hours. We can also see that they push a new version of the app to the store every 3-4 days. These are very rapid release cycles to fix bugs, optimize the app, and keep it secure. The process is ongoing (even for a free service!), but then one day a new update breaks the delivery chain and everything stops. The media picks it up and users get vocal.
“In this case, it’s been reported that WhatsApp was testing a new feature that lets you pin a conversation to the top of the menu, a back-end change aimed at satisfying users. But pushing new updates through the development and production cycle is always risky, which is why testing and monitoring how the changes will affect an app’s performance is so important. At the first sign of a problem, the development team needs to be able to act swiftly: roll back and fix the change, or abandon it if it looks likely to hurt the user experience or bring the service down for millions of users.”
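To make the "monitor, then roll back at the first sign of a problem" idea concrete, here is a minimal sketch in Python of an automated release check. It is not WhatsApp's or Dynatrace's actual process. The `ReleaseHealth` type, the `decide` function, and the thresholds (`max_ratio`, `min_requests`, and the 0.1% floor) are all hypothetical names and values chosen for illustration. The sketch compares the error rate of a newly released version against the stable one and recommends a rollback when the new version is clearly worse.

```python
from dataclasses import dataclass


@dataclass
class ReleaseHealth:
    """Request counts observed for one release during a monitoring window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Treat a window with no traffic as healthy rather than dividing by zero.
        return self.errors / self.requests if self.requests else 0.0


def decide(baseline: ReleaseHealth, canary: ReleaseHealth,
           max_ratio: float = 2.0, min_requests: int = 100) -> str:
    """Return "wait", "rollback", or "promote" for the new release.

    All thresholds are illustrative assumptions, not recommended values.
    """
    if canary.requests < min_requests:
        return "wait"  # not enough traffic yet to judge the new release
    # Floor the allowed rate at 0.1% so a near-perfect baseline does not
    # make every single error trigger a rollback.
    allowed = max(baseline.error_rate * max_ratio, 0.001)
    return "rollback" if canary.error_rate > allowed else "promote"


if __name__ == "__main__":
    stable = ReleaseHealth(requests=10_000, errors=20)    # 0.2% errors
    new_feature = ReleaseHealth(requests=500, errors=15)  # 3.0% errors
    print(decide(stable, new_feature))
```

In practice, a check like this would be fed by a monitoring system and wired to the deployment tooling, so that a "rollback" decision reverts the change before most users are affected.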
Opinions expressed by DZone contributors are their own.