How to Improve Data Center Incident Reporting

Organizations’ reputations can take more of a hit when there’s an unexplained outage or attack because people lose trust in the organization.

Spending on cloud services is expected to grow to $266.4 billion this year, according to Gartner. More markets are depending on the cloud to deliver their technology services, which puts increasing pressure on data centers and their staff to keep things running smoothly.

Everyone notices an outage. What people don’t see is all the work going on behind the scenes to fix those outages. Incident management processes are designed to notify the appropriate stakeholders and get them working on a fix in the shortest time possible. 

And given how dependent we’ve become on technology to deliver vital services, affected users and systems must know about outages so that dependent organizations can take steps to manage their processes while the problem is being fixed.

But many companies and data center facilities don’t share incident reports with customers because they fear the financial and reputational loss that can follow a significant outage. Sharing incident reports, however, can have a positive effect: it strengthens industry security initiatives, fosters trust with consumers, and helps tech teams troubleshoot incidents more efficiently.

That’s where the Data Center Incident Reporting Network (DCIRN) comes in. As an independent, voluntary, confidential reporting program for data centers, it’s helping improve data center operations. The DCIRN aggregates the anonymous reports and shares them with industry professionals to improve the reliability of their infrastructure.

Why Share Your Data Center Incident Information?

Organizations may be hesitant to share data center incident information because of PR concerns. In reality, their reputations can take more of a hit when there’s an unexplained outage or attack because people lose trust in the organization. 

Incident Reporting Helps the Industry Become More Resilient

At its most basic level, sharing your incident information with DCIRN helps the industry understand what types of problems occur, when they occur, and how admins can avoid them in the future. Sharing information helps everyone avoid issues and be more effective if they do happen. 

For example, admins typically receive a patch for an individual problem only after reporting it to the manufacturer, and the manufacturer sends the patch only if someone asks for it. The manufacturer could save everyone time and effort by proactively solving the problem for all customers, not just those who complained. Instead, it is more concerned about its reputation and Net Promoter Score (NPS) than about helping its customers avoid problems.

In this scenario, other data center administrators could request the patch immediately after seeing the initial outage in a DCIRN report. They would avoid downtime at their own facilities, and might even force the manufacturer to push the patch to everyone instead of handing it out piecemeal.

The Public Knows About Outages Immediately

With the pervasiveness of technology in today’s society, outages and downtime quickly become public knowledge, usually via social media. Data centers house and handle many vital systems today, from healthcare systems to smart city infrastructure that manages traffic lights, power consumption, and sewage treatment plants. Not to mention the high-density data used for graphics processing, video games, and media streaming services.

There’s no sense hiding an outage from anyone, especially the data center admins responsible for remediating it. Outages cost thousands of dollars per minute, so it behooves companies to give their IT teams all the resources they need to fix them.

Outages Impact a Company’s Bottom Line

Everyone becomes a bit myopic during an incident, as they’re focused on their immediate surroundings or resources. But outages impact more than just the infrastructure and users who can’t connect to the system anymore. 

They can impact all aspects of an organization’s tech stack, from vendor relationships to cloud setups, architectures to integrations. Not to mention how all of these systems contribute to the organization’s business operations and objectives. Customer confidence in the products and services can be negatively affected, which, in turn, affects the organization’s bottom line, churn rates, and revenue targets. 

Keeping outages hidden behind intellectual property and vendor agreements will begin to affect an organization’s ability to keep its technology updated and running smoothly. The organization will constantly be reacting instead of planning for the future, which ultimately impacts its viability as a business.

Use Incident Reporting to Address Downtime Before It Happens

Data center administrators should use DCIRN reports and other publicly available incident reports to improve their system maintenance routines. Admins will stay updated on what’s happening globally and in similarly provisioned environments. They’ll be able to identify potential issues before they become a problem and take proactive measures to avoid them.
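
As a rough illustration of that workflow, the sketch below matches shared incident reports against a local equipment inventory. The report fields (`vendor`, `model`, `firmware`, `summary`), the inventory layout, and every name and value in the sample data are assumptions made for this example; DCIRN does not necessarily publish reports in this shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentReport:
    # Hypothetical fields; real shared reports may be structured differently.
    vendor: str
    model: str
    firmware: str
    summary: str


@dataclass(frozen=True)
class Asset:
    # Hypothetical inventory record for one piece of equipment.
    name: str
    vendor: str
    model: str
    firmware: str


def find_exposed_assets(reports, inventory):
    """Return (asset, report) pairs where an asset matches a reported incident."""
    exposed = []
    for report in reports:
        for asset in inventory:
            same_hardware = (asset.vendor, asset.model) == (report.vendor, report.model)
            if same_hardware and asset.firmware == report.firmware:
                exposed.append((asset, report))
    return exposed


# Illustrative data only, not real incidents or products.
reports = [
    IncidentReport("ExampleCorp", "UPS-9000", "1.2.0", "Battery controller fault caused power loss"),
]
inventory = [
    Asset("ups-row-a", "ExampleCorp", "UPS-9000", "1.2.0"),
    Asset("ups-row-b", "ExampleCorp", "UPS-9000", "1.3.1"),
    Asset("crac-01", "OtherVendor", "CRAC-50", "4.0.2"),
]

for asset, report in find_exposed_assets(reports, inventory):
    print(f"{asset.name}: review '{report.summary}'")
```

An exact firmware match is the simplest possible rule. A real inventory would likely need version ranges or vendor advisories to decide which assets are affected.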

Data center administrators can use DCIRN’s incident timelines to create baseline health checks for their infrastructure. These checks help ensure that data centers follow technology best practices that increase and maintain service levels, reliability, and availability, no matter what’s going on.
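
One way to turn shared incident data into a baseline is to count which causes recur most often and check those first. The sketch below assumes each report carries a `cause` label and that the team maintains its own mapping from causes to checks; both are hypothetical and not part of any DCIRN format.

```python
from collections import Counter

# Hypothetical mapping from a cause label to a locally defined health check.
CHECKS_BY_CAUSE = {
    "cooling": "Verify chiller and CRAC alarms are clear",
    "power": "Test UPS battery runtime and generator start",
    "network": "Confirm redundant uplinks are both passing traffic",
}


def baseline_checks(causes, top_n=2):
    """Return checks for the most frequently reported causes, most common first."""
    counts = Counter(causes)
    checks = []
    for cause, _count in counts.most_common(top_n):
        check = CHECKS_BY_CAUSE.get(cause)
        if check is not None:  # skip causes with no check defined yet
            checks.append(check)
    return checks


# Illustrative cause labels, not real incident data.
reported_causes = ["power", "cooling", "power", "network", "power", "cooling"]

for check in baseline_checks(reported_causes):
    print(check)
```

Ranking by frequency is only one possible heuristic. Teams may prefer to weight causes by outage duration or impact if the reports include that information.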

Transparent data center incident reporting is an essential step for today’s organizations to take. It demonstrates their adherence to technology best practices and that they’re conscious of maintaining their systems to the latest standards. It encourages more collaboration across market verticals and helps make the data center industry more robust and cohesive. Customers will appreciate this care and attention to detail and will have more trust in how the business handles their data.


Opinions expressed by DZone contributors are their own.
