Major Incident Responses Lack Consistency
Here we take a look at the root of the consistency problem in major incident response to see how DevOps and automation can help.
While DevOps support practices demand both structure and guidance, teams that handle major incidents are rarely given the structure or the autonomy to make quick decisions on their own, and consistency suffers as a result.
According to a 2017 DevOps survey from xMatters and Atlassian, most modern companies do not have consistent responses to major incidents.
In fact, 48 percent say their tools, process, and steps vary from incident to incident, while 29 percent say duplicate tickets are created while the incident is being resolved. Finally, 23 percent say tickets are routed without proper assignments and must often be rerouted.
Taken together, these gaps mean that similar incidents can be handled in very different ways, depending on who responds and which tools they use.
The Root of the Sub-Par Consistency Problem
One of the major factors in this lack of consistency is minimal autonomy. According to the same DevOps survey, 50 percent of respondents wait for the operations center to declare a major incident before responding, while 34 percent say waiting for subject matter experts delays incident resolution.
Several factors contribute. Many experts, however, attribute it primarily to the fact that the “fail hard and fail fast” motto associated with DevOps is unacceptable when the failure means customer downtime.
The inherent chaos of a major incident also leaves little room for experimentation.
How to Create More Consistency During Incident Response
While the inconsistency in the modern incident response model is unacceptable, fixing it is not straightforward.
I believe the best approach is to experiment during drills and internal test responses rather than during live incidents. Over time, this should lead to better, more agile responses during actual major incidents.
Collaboration software can help support this experimentation and troubleshoot internal test responses.
Automation can also play a large role in increasing the consistency of incident response. Workflow-based systems designed to offer full visibility and ongoing feeds can work to provide information needed to make consistent decisions and develop a predictable process for containment and remediation.
In this way, automation stands out as one of the most time-efficient measures for incident management, and it can help thinly stretched teams develop more consistency throughout their incident management processes.
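As an illustration of how automation could address two of the survey findings (duplicate tickets and tickets routed without an assignment), here is a minimal Python sketch. It is not taken from any particular product: the `TicketQueue` class, the `DEFAULT_OWNERS` routing table, and the team names are all hypothetical and stand in for whatever ticketing system a team actually uses.

```python
from dataclasses import dataclass, field

# Hypothetical routing table (assumption): service name -> owning team.
DEFAULT_OWNERS = {"checkout": "payments-oncall", "search": "search-oncall"}
# Hypothetical fallback owner used when no team is mapped to a service.
FALLBACK_OWNER = "operations-center"


@dataclass
class Ticket:
    incident_key: str
    service: str
    assignee: str
    reports: list = field(default_factory=list)


class TicketQueue:
    """Keeps one open ticket per incident and always assigns an owner."""

    def __init__(self, owners=None):
        self.owners = owners if owners is not None else DEFAULT_OWNERS
        self.open_tickets = {}

    def report(self, incident_key, service, message):
        # Reuse the open ticket for this incident instead of creating a duplicate.
        ticket = self.open_tickets.get(incident_key)
        if ticket is None:
            # Every new ticket gets an assignee, falling back to a default owner.
            assignee = self.owners.get(service, FALLBACK_OWNER)
            ticket = Ticket(incident_key, service, assignee)
            self.open_tickets[incident_key] = ticket
        ticket.reports.append(message)
        return ticket
```

Calling `report` twice with the same `incident_key` returns the same ticket with both messages attached, and a report for an unmapped service is assigned to the fallback owner rather than left unassigned.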
Automation and workflow-based systems also solve one of the major issues outlined in the xMatters-Atlassian Report, which is that tools, processes, and steps vary from incident to incident. When automation is adopted, processes become more streamlined and predictable, as do tools and steps.
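One way to picture a workflow-based system is as a fixed sequence of steps that every incident must pass through in the same order. The Python sketch below assumes a five-step runbook; the step names and the `IncidentWorkflow` class are hypothetical and only illustrate the idea of enforcing a predictable process.

```python
from dataclasses import dataclass, field

# Hypothetical step names (assumption); a real team would substitute its own runbook.
STEPS = ("declare", "assemble", "contain", "remediate", "review")


@dataclass
class IncidentWorkflow:
    incident_id: str
    completed: list = field(default_factory=list)

    def next_step(self):
        """Return the next required step, or None when the workflow is finished."""
        if len(self.completed) < len(STEPS):
            return STEPS[len(self.completed)]
        return None

    def complete(self, step):
        """Mark a step as done, rejecting any step taken out of order."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected step {expected!r}, got {step!r}")
        self.completed.append(step)
```

Because `complete` rejects out-of-order steps, two responders working different incidents follow the same path, which is the kind of predictability the survey respondents said they lacked.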
It’s also essential for incident response teams to develop predictive methods that declare major incidents earlier, which allows teams to respond faster and access subject matter experts more quickly.
IT alerting platforms and system integrations are both essential for a more predictive system and advanced responses.
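To make the idea of declaring a major incident earlier more concrete, here is a small Python sketch. It uses a simple threshold rule over a sliding window of error-rate samples, not a true predictive model, and the `MajorIncidentDetector` class, the window size, and the thresholds are all assumptions chosen for illustration.

```python
from collections import deque


class MajorIncidentDetector:
    """Declares a major incident when enough recent samples breach a threshold."""

    def __init__(self, window=5, threshold=0.2, min_breaches=3):
        # Only the most recent `window` samples are kept.
        self.samples = deque(maxlen=window)
        self.threshold = threshold        # error rate considered unhealthy (assumed)
        self.min_breaches = min_breaches  # breaches in the window needed to declare

    def observe(self, error_rate):
        """Record one sample; return True when a major incident should be declared."""
        self.samples.append(error_rate)
        breaches = sum(1 for rate in self.samples if rate > self.threshold)
        return breaches >= self.min_breaches
```

Fed by an alerting platform, a rule like this could page subject matter experts as soon as it returns `True`, rather than waiting for the operations center to make a manual declaration.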
More Consistent Incident Response Starts Here
Cultivating a consistent incident response process can be complicated. Even so, simple tactics like implementing automation, using predictive technology, and promoting experimentation through internal test responses can go a long way toward restoring autonomy, creating consistent steps and processes, and delivering better incident response experiences for both teams and customers.
Published at DZone with permission of Dan Goldberg, DZone MVB.