ITOps: Benchmarks for 2018
If you dread opening your inbox inundated with alerts, this survey shows you are not alone. In fact, a majority of the industry is being flooded with alerts.
How Do You Measure Up When An Incident Strikes?
We recently finished analyzing the results of our survey of ITOps professionals from across the industry. The results highlight many of the challenges that are weighing down the industry and keeping IT teams from performing at their optimum potential.
Our goal in developing the survey was to create benchmarks and to understand how well IT teams across the industry are performing when it comes to critical alerting and alert management.
In many ways, the survey was successful. We received a large number of responses from a range of industries and acquired a strong sense of how ITOps is performing across the country. Unfortunately, we also saw that for all the Chaos Monkeys and strides towards improved response to alerting, there is still a significant lack of progress.
Have You Heard the Buzz?
Automated alerting is an essential component of monitoring. Alerts are what give voice to issues in the stack noted by monitoring tools like Datadog. With automated alerting, teams can immediately receive notification of issues and quickly identify potentially severe issues before they magnify in scope.
But alerts are frequently ineffective. Our survey points to one reason: teams are inundated with alerts and become desensitized to them. Over two-thirds of respondents reported that critical incidents are sent to a team rather than to a specific individual.
When alerts are routed in this fashion, everyone receives the alert, not just the individual who is best equipped to manage the incident. Over time, this pattern leads engineers to lose sensitivity to the notifications.
How Teams Get Alerted
Email is perfect for communicating information that is not time sensitive. Typically, one expects that whether they respond to an email immediately or an hour later should present no difference. As such, email is fine for daily communication inside a business.
According to our survey, over 80% of IT teams are alerted to critical incidents via email. For critical incidents, email is less than ideal as it allows critical incidents to get buried under a pile of other emails. Email provides no way for critical issues to rise to the top of the pile.
How Many Alerts Was That?
The survey results also indicated that just over 41% of ITOps teams receive 11 or more alerts per day. Additionally, just over 20% of this group receive 40 or more alerts per day. Forty alerts a day is clearly more than a team can or should manage, and the figure goes a long way towards explaining why some alerts are missed. If over 40 alerts are sent to you and your team every day, it becomes very hard to prioritize alerts and determine which should be handled first.
The conclusion one can draw from these numbers is that, despite the large number of papers written on improving alert management, many ITOps teams have not been able to achieve it. While our survey did show that just shy of 59% receive a manageable number of alerts, 41% are inundated.
Getting to the Front of the Alerting Line
Escalation procedures are typically instituted for alerts to ensure they are forwarded to the right team or team member. If a team member is unable to resolve an issue on their own, they will escalate the matter to get a colleague's assistance or to hand the issue off to the colleague altogether.
Our survey showed that 76.6% of respondents have some sort of escalation procedure in place. At the same time, the most frequent ways to escalate critical incidents were email and SMS. Given the lack of immediacy provided by email or SMS, these tools are unable to impress upon the recipient the need for immediate action.
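To make the idea of an escalation procedure concrete, here is a minimal sketch in Python. It is an illustration only, not something taken from the survey: the contact names, channels, and wait times are all hypothetical, and a real on-call tool would handle scheduling and acknowledgement for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    contact: str       # who gets notified at this step (hypothetical names)
    channel: str       # how they are notified, e.g. "push" or "phone"
    wait_minutes: int  # how long to wait for an acknowledgement before escalating


# Hypothetical policy: start with the on-call engineer, then widen the circle.
POLICY = [
    EscalationStep("primary on-call", "push", 5),
    EscalationStep("secondary on-call", "phone", 10),
    EscalationStep("engineering manager", "phone", 15),
]


def steps_notified(policy, minutes_unacknowledged):
    """Return every step that should have been notified by now."""
    notified = []
    starts_at = 0  # minute at which the current step becomes active
    for step in policy:
        if minutes_unacknowledged >= starts_at:
            notified.append(step)
        starts_at += step.wait_minutes
    return notified


if __name__ == "__main__":
    for minutes in (0, 5, 20):
        names = [s.contact for s in steps_notified(POLICY, minutes)]
        print(f"{minutes:>2} min unacknowledged -> {names}")
```

The point of the design is that an unacknowledged alert keeps moving to the next person over a more intrusive channel instead of sitting unread in an inbox.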
The Intelligence of Business Intelligence (BI)
Best practices indicate that teams should use analytics to track performance. Perhaps analysts of the industry could be more optimistic if they saw that teams were using analytics to track how well they are performing. If teams employed analytics, they would be better able to review their progress, see where they are failing to meet the grade and then embark on routines to improve. Unfortunately, this is not the case.
When asked whether their team has employed any type of business intelligence to review and analyze the team's performance, over 70% reported that they did not subscribe to any BI platform. This result is more than just a missed opportunity to track alerting. It is also a lost opportunity to fundamentally improve the business at many levels.
One of the most important reasons to invest in an effective BI system is that it can improve efficiency within your organization and, as a result, increase productivity. Effective business intelligence can also improve decision-making at all levels of management and improve your tactical and strategic management.
Yet by forgoing investments in these BI tools, teams are failing to examine their own processes and methods, an exercise that would improve the team and minimize alert fatigue.
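Tracking performance does not have to start with a full BI platform. As a rough sketch, the Python below computes mean time to acknowledge per notification channel from an incident log. The log entries are made-up sample data, not survey results, and the field names are assumptions for illustration.

```python
from datetime import datetime
from statistics import mean

# Made-up sample incident log; field names are hypothetical.
INCIDENTS = [
    {"channel": "email", "sent": "2018-03-01T09:00", "acked": "2018-03-01T09:45"},
    {"channel": "email", "sent": "2018-03-02T14:00", "acked": "2018-03-02T14:15"},
    {"channel": "push", "sent": "2018-03-03T02:00", "acked": "2018-03-03T02:04"},
]


def minutes_to_ack(incident):
    """Minutes between sending the alert and someone acknowledging it."""
    sent = datetime.fromisoformat(incident["sent"])
    acked = datetime.fromisoformat(incident["acked"])
    return (acked - sent).total_seconds() / 60


def mean_time_to_ack(incidents):
    """Group incidents by channel and average the acknowledgement time."""
    by_channel = {}
    for incident in incidents:
        by_channel.setdefault(incident["channel"], []).append(minutes_to_ack(incident))
    return {channel: mean(values) for channel, values in by_channel.items()}


if __name__ == "__main__":
    for channel, minutes in mean_time_to_ack(INCIDENTS).items():
        print(f"{channel}: {minutes:.1f} min to acknowledge on average")
```

Even a simple report like this gives a team a baseline to review and a number to improve against.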
A Call for Smart Alerting
The lesson that can be drawn from this is that companies don't necessarily need more alerting. What they do need is a shift towards smarter alerting.
Smart alerting means that not every bump on the monitoring screen gets tied to an alert. Instead, monitoring output is calibrated so that possibilities are aligned with probabilities and impacts. Alerts also get sent to the teams or individuals that are best able to manage the issue. Additionally, alerts are actionable and come with instructions regarding what the problem might be.
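The three properties described above (calibrating on probability and impact, routing to the right owner, and making the alert actionable) can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the services, owners, runbook text, scoring formula, and threshold are hypothetical and would need tuning for a real environment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signal:
    service: str
    probability: float  # estimated chance this is a real incident, 0.0 to 1.0
    impact: int         # 1 (minor) to 5 (critical)


@dataclass(frozen=True)
class Alert:
    recipient: str
    message: str


# Hypothetical ownership map and runbook snippets.
OWNERS = {"checkout": "alice", "search": "bob"}
RUNBOOKS = {"checkout": "Check the payment gateway status, then roll back the last deploy."}
PAGE_THRESHOLD = 2.0  # assumed cut-off; below this, no one is alerted


def to_alert(signal: Signal) -> Optional[Alert]:
    """Turn a monitoring signal into an alert only if it is worth someone's attention."""
    score = signal.probability * signal.impact
    if score < PAGE_THRESHOLD:
        return None  # a bump on the monitoring screen, not an alert
    recipient = OWNERS.get(signal.service, "on-call")
    steps = RUNBOOKS.get(signal.service, "No runbook yet; investigate and write one.")
    return Alert(recipient, f"{signal.service}: risk score {score:.1f}. {steps}")


if __name__ == "__main__":
    for sig in (Signal("checkout", 0.9, 5), Signal("search", 0.2, 2), Signal("billing", 0.8, 4)):
        print(sig.service, "->", to_alert(sig))
```

Multiplying probability by impact is only one simple way to score a signal. The broader idea is that low-risk signals never reach a person, and the alerts that do arrive name an owner and a next step.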
Smart alerting also means that teams use business intelligence tools such as reports, graphs, and charts to determine which of their practices have been effective and which have not. Without this insight, teams are often unaware of the subtle points that could really impact their team and give them a way to improve their output.
There are a number of insights that can be garnered from our survey. I encourage you to take a moment and download a copy of the study and see what you can learn that will help your team.