Late last week, Salesforce.com, the wildly successful customer success platform, experienced roughly 12 hours of downtime and three and a half hours of data loss. If you haven’t already seen it on social media, there is an aptly named hashtag, #NA14, tracking the all-day outage. This is big news for any organization, but even more pronounced given that, as of early 2016, Salesforce was one of the most highly valued American cloud computing companies, with a market capitalization of roughly $45 billion.
It is unfortunate to highlight these types of events, but there is much to learn from them. Thousands of companies, each with hundreds or thousands of users, were abruptly cut off from the cloud applications they rely on for marketing, sales, customer service, messaging, and analytics, along with every interaction with their prospects and end-customers. Everything was down, and for businesses that run on Salesforce’s platform, this was a major disruption.
Salesforce’s support team said the problems occurred after it performed a “successful site switch” away from its primary data center, after power supply problems there had caused nearly two hours of downtime. Sources claimed the initial failure occurred at Salesforce’s data center in Herndon, VA, and the secondary failure in its Washington, DC facility. According to those sources, “Other instances also took a hit around the same time, including NA 11 and 12, as well as the sandboxes CS 9, 10, and 11.”
While Salesforce is still working out exactly what went wrong, for now it says the culprit was “a database failure on the NA14 instance, which introduced a file integrity issue in the NA14 database.” A backup existed, but it appears to have been incomplete.
We rarely appreciate the impact until it happens to us, but data loss is a real event, and one that happens all the time. The culprits are varied: operational issues, disasters, human error, and the list goes on. Enterprises understand this, and today IT spending on backup and recovery (backup software, backup hardware, and related categories) is north of $12 billion.
The reason we need to keep a pulse on backup and recovery for databases, now more than ever, is that cloud scale is driving people toward distributed systems and Platform 3 applications. These can’t survive, or don’t work well, with the backup approaches of yesteryear. That is why organizations are increasingly turning to scale-out, eventually consistent databases, which in turn require a new way of protecting data against loss. In this new world, “bad” data replicates as fast as “good” data, making enterprises more vulnerable precisely because of replication. And because organizations must provide governance over their systems, with data secured and mapped into the correct database layer, this creates a new opportunity for data protection for distributed databases. This is something we at Datos IO are fanatic about, and we have been working to build the industry’s first such product for cloud-native applications deployed on distributed databases.
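To make the "bad data replicates as fast as good data" point concrete, here is a minimal, hypothetical sketch (all class and variable names are my own, not from any Datos IO product) of a toy cluster that eagerly replicates every write. A corrupted write reaches every replica just as quickly as a correct one, so only a point-in-time backup taken before the corruption can restore the good state:

```python
import copy

class Replica:
    """A single node holding a key-value copy of the data."""
    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

class Cluster:
    """Toy cluster that replicates every write to all replicas."""
    def __init__(self, n):
        self.replicas = [Replica(f"r{i}") for i in range(n)]
        self.backups = []  # point-in-time snapshots, oldest first

    def write(self, key, value):
        # Replication is indiscriminate: it copies bad data too.
        for r in self.replicas:
            r.apply(key, value)

    def snapshot(self):
        # A versioned backup captures state before corruption arrives.
        self.backups.append(copy.deepcopy(self.replicas[0].data))

    def restore(self, version):
        for r in self.replicas:
            r.data = copy.deepcopy(self.backups[version])

cluster = Cluster(3)
cluster.write("account:42", {"balance": 100})
cluster.snapshot()                                  # good state preserved
cluster.write("account:42", {"balance": None})      # corrupted write
# Every replica now holds the bad value; replication offered no protection.
assert all(r.data["account:42"]["balance"] is None
           for r in cluster.replicas)
cluster.restore(0)                                  # only the backup rolls back
assert all(r.data["account:42"]["balance"] == 100
           for r in cluster.replicas)
```

The design point is that replication and backup solve different problems: replication protects against node loss, while a versioned, restorable backup protects against logical corruption that replication faithfully spreads.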
While this news wasn’t something Salesforce wanted, the silver lining is that the company is now revisiting its backup strategy. It also serves as a reminder that disasters happen all the time, and data must be recoverable over its lifetime.
I hope all enterprises will take a critical look at their backup and recovery processes, and the technology behind them, to ensure they can deliver the experience their end-customers expect.