Database Script Cripples Salesforce Services
The CRM market leader suffered the worst outage in its 20-year history. Here's how a small script caused a huge problem.
On Friday, May 17th, Salesforce suffered its largest outage in its nearly twenty years as a provider of CRM services. Initially, the outage was reported as limited to current and former users of the Pardot product. Eventually, more Salesforce organizations were taken offline:
"To protect our customers, we have blocked access to all instances that contain affected customers until we can complete the removal of the inadvertent permissions in the affected customer orgs." - Salesforce Trust Status Website
So what exactly happened?
The outage began when a database script was deployed that "inadvertently gave users broader data access than intended."
Pardot users were the first to report the problem, notifying Salesforce that the existing permissions and security on corporate data were no longer being enforced. Not only did all users gain access to more data than expected, they also gained the ability to change that data.
For a disgruntled employee, this created the perfect opportunity to find, manipulate, and update data with malicious intent.
To protect the data of the corporations it serves, Salesforce began shutting down its services.
At first, the downtime was limited to current and former Pardot customers. Later, more customers experienced outages. Some corporations ended up sending staff home, since their primary roles centered on the services and functionality within the Salesforce solution.
The outage began on Friday, May 17th and continued through Monday, May 20th, by which point most of the issues had been resolved.
Recapping The Salesforce Approach
Having worked on a Salesforce implementation project, and with several clients expanding their Salesforce instances, I have admired the safeguards in place to minimize the impact one tenant can have on an entire service instance. Governor limits cap how much load a single process can put on the overall system, which protects the performance of other customers running on the same shared instance.
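Salesforce enforces its real governor limits inside the Apex runtime, and nothing below reproduces that implementation. As a minimal Python illustration of the general idea only, the sketch assumes a hypothetical `TransactionGovernor` class that gives each transaction a fixed query budget (the limit of 100 is an arbitrary example value) and stops the transaction as soon as the budget is exhausted.

```python
class GovernorLimitExceeded(Exception):
    """Raised when a transaction uses more than its resource budget."""


class TransactionGovernor:
    """Hypothetical per-transaction governor that caps the number of queries.

    This illustrates the idea only; it is not Salesforce's implementation.
    """

    def __init__(self, max_queries: int) -> None:
        self.max_queries = max_queries
        self.queries_used = 0

    def record_query(self) -> None:
        # Refuse the query once the budget is spent, so a runaway loop in one
        # tenant's code stops instead of consuming capacity shared with others.
        if self.queries_used >= self.max_queries:
            raise GovernorLimitExceeded(
                f"query limit of {self.max_queries} exceeded"
            )
        self.queries_used += 1


def run_transaction(governor: TransactionGovernor, query_count: int) -> int:
    """Simulate a transaction that issues query_count queries.

    Returns the number of queries that ran; raises GovernorLimitExceeded
    as soon as the governor's budget is exhausted.
    """
    for _ in range(query_count):
        governor.record_query()
    return governor.queries_used


if __name__ == "__main__":
    # A well-behaved transaction stays within its (arbitrary example) budget.
    print(run_transaction(TransactionGovernor(max_queries=100), 50))
    # A runaway transaction is stopped at the limit instead of running on.
    try:
        run_transaction(TransactionGovernor(max_queries=100), 150)
    except GovernorLimitExceeded as exc:
        print(f"stopped: {exc}")
```

Failing fast at the limit, rather than slowing the offending transaction down, keeps the cost of one tenant's bad code contained to that tenant.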
From an Apex (Salesforce's Java-like programming language) perspective, a minimum level of code coverage (at least 75%) must be reached before any code can be deployed to a production instance. The hope is that strong code coverage leads to better code. I am not sure this is a valid conclusion, but I do appreciate the requirement for a high level of code coverage.
From a deployment perspective, Salesforce tries to make sure the customer knows what is and is not happening when deploying updates from one instance to another. They continue to make improvements that help automate the process, as well as to facilitate CI/CD functionality.
How Does This Happen?
With all of those safeguards in place to keep customers in line, I have to wonder how a bad database script was able to find its way onto not only one Salesforce server instance, but several. I have these questions:
What is the code review or pull-request (PR) process in place for looking over code before it is executed in a production environment?
What kind of testing was completed in non-production environments to ensure that the differences between the before and after snapshots of the database matched the changes the script was expected to make?
Was any QA testing completed after the script ran? One would certainly guess that at least one of those organizations had permissions set up. A simple test checking that "a limited user still only has access to limited data" would have failed and exposed this issue well before any production database was updated.
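The source does not describe Salesforce's actual deployment checks or permission model, so the following is only a minimal Python sketch of the two checks these questions describe. It assumes a hypothetical snapshot format (each user mapped to a set of `(object, access)` grants) and hypothetical helper names: `unexpected_changes` compares before and after snapshots against the changes a script was supposed to make, and `assert_limited_user_unchanged` is the "limited user still only has limited access" test.

```python
from typing import Dict, Set, Tuple

# Hypothetical snapshot format: each user maps to the set of grants they hold,
# where a grant is an (object name, access level) pair such as ("Account", "edit").
Grant = Tuple[str, str]
Snapshot = Dict[str, Set[Grant]]
Change = Tuple[Set[Grant], Set[Grant]]  # (grants added, grants removed)


def diff_snapshots(before: Snapshot, after: Snapshot) -> Dict[str, Change]:
    """Return, per user, the grants added and removed between two snapshots."""
    changes: Dict[str, Change] = {}
    for user in set(before) | set(after):
        old = before.get(user, set())
        new = after.get(user, set())
        added, removed = new - old, old - new
        if added or removed:
            changes[user] = (added, removed)
    return changes


def unexpected_changes(
    before: Snapshot, after: Snapshot, expected: Dict[str, Change]
) -> Dict[str, Change]:
    """Return the part of the before/after diff the script was NOT meant to make."""
    surprises: Dict[str, Change] = {}
    for user, (added, removed) in diff_snapshots(before, after).items():
        expected_added, expected_removed = expected.get(user, (set(), set()))
        extra_added = added - expected_added
        extra_removed = removed - expected_removed
        if extra_added or extra_removed:
            surprises[user] = (extra_added, extra_removed)
    return surprises


def assert_limited_user_unchanged(
    after: Snapshot, user: str, allowed: Set[Grant]
) -> None:
    """QA check: a limited user must hold no grants beyond the allowed set."""
    extra = after.get(user, set()) - allowed
    if extra:
        raise AssertionError(f"{user} gained unexpected access: {sorted(extra)}")


if __name__ == "__main__":
    before = {
        "admin": {("Account", "read"), ("Account", "edit")},
        "limited_user": {("Account", "read")},
    }
    # Simulated result of a faulty script that broadened a limited user's access.
    after = {
        "admin": {("Account", "read"), ("Account", "edit")},
        "limited_user": {("Account", "read"), ("Account", "edit")},
    }
    # The script was not supposed to change any grants, so expected is empty.
    print(unexpected_changes(before, after, expected={}))
    try:
        assert_limited_user_unchanged(after, "limited_user", {("Account", "read")})
    except AssertionError as exc:
        print(f"QA check failed: {exc}")
```

Either check, run in a non-production environment, would flag the new `("Account", "edit")` grant on the limited user before the script ever reached production.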
A CRM system is not an ERP system or even a financial system. However, most Salesforce organizations I have encountered do maintain some financial, proprietary, and confidential data in their CRM space. It is evident that the controls needed to protect the data and integrity of Salesforce customers were not in place.
I wonder what the fallout will be from this outage.
We all know about the Equifax disaster. For those interested, I wrote the following articles about that security breach:
The Salesforce outage could be similar in impact to Equifax. At this point, we don't know how much confidential data was taken during the window when broader access was unexpectedly granted. We also do not know who now has access to this data.
Just like Equifax, simply following industry best practices would have avoided this situation.
Salesforce is not a cheap product. Part of the "sell" for the Salesforce offering is that you subscribe to a service without having to worry about things like security, databases, applications, and infrastructure. Prior to this outage, Salesforce had a strong track record of knowing what it was doing.
However, this kind of outage makes one wonder what might happen next, and when.
Have a really great day!