Advanced Incident Resolution Lifecycle Capabilities


Learn how you can mitigate the risks and chaos of delivering digital services by focusing on the major incident resolution lifecycle.


New capabilities allow IT operations leaders and teams to advance the state of their digital operations management by integrating event management and incident response workflows at scale for business success.

Great speaking with Rachel Obstler, V.P. of Product Management at PagerDuty, as the company announces a new set of capabilities designed around the major incident resolution lifecycle to help organizations evolve the digital operations of their business. Encompassing the full lifecycle from event management to incident response and learning, the new product workflows enable developers, IT, and business teams to boost their operational maturity, resulting in improved productivity, faster time to resolution, more time for innovation, and higher-quality experiences for their customers.

PagerDuty's incident operations service focuses on three buckets:

  1. Triage assessment: The ability to prioritize incidents and treat them differently based on their severity, as judged by the company, and on who in the company is responsible for addressing them (e.g., DevOps versus Ops in a bimodal company).

  2. Resolve and remediate: The ability to add stakeholders to stay updated on the state of major incidents without disrupting the team charged with resolution and remediation.

  3. Learning: Automating and facilitating postmortems by pulling data from different tools (e.g., chat tools such as Slack and HipChat) and putting it into an aggregated timeline so team members can see what happened and when. This enables teams to determine what went wrong and how to prevent the problem in the future, as well as to evaluate how well they responded and how the process can be improved moving forward.
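The Learning bucket describes merging records from several tools into a single chronological timeline. The sketch below illustrates only that merge step in generic Python. The `TimelineEntry` type, its fields, the source labels, and the sample events are all hypothetical; they do not reflect how PagerDuty's postmortem tooling or its API actually represents data.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEntry:
    # Hypothetical record shape: when it happened, which tool it came from, what was said.
    timestamp: datetime
    source: str
    text: str


def build_timeline(*sources):
    """Merge entries from several tools into one chronological timeline.

    Each source is sorted by timestamp first, so the tools need not export
    records in order; heapq.merge then interleaves the sorted sources.
    """
    sorted_sources = [sorted(s, key=lambda e: e.timestamp) for s in sources]
    return list(heapq.merge(*sorted_sources, key=lambda e: e.timestamp))


# Hypothetical sample data from two tools (naive datetimes, same time zone assumed).
alerts = [
    TimelineEntry(datetime(2018, 1, 1, 10, 0), "alerts", "High-latency alert triggered"),
    TimelineEntry(datetime(2018, 1, 1, 10, 40), "alerts", "Alert resolved"),
]
chat = [
    TimelineEntry(datetime(2018, 1, 1, 10, 25), "chat", "Rolled back the latest deploy"),
    TimelineEntry(datetime(2018, 1, 1, 10, 5), "chat", "On-call engineer acknowledges"),
]

timeline = build_timeline(alerts, chat)
for entry in timeline:
    # Prints one line per event, e.g. "10:00 [alerts] High-latency alert triggered"
    print(f"{entry.timestamp:%H:%M} [{entry.source}] {entry.text}")
```

Sorting each source before merging keeps the result correct even when a tool exports records out of order. A real postmortem tool would also have to normalize time zones and deduplicate events, which this sketch deliberately ignores.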

As new technologies and digital delivery methods give consumers unprecedented choice, organizations must focus on both flawless customer experience (CX) and continuous innovation to maximize competitiveness and profitability.

According to the PagerDuty State of Digital Operations Report, although 84 percent of IT survey respondents felt confident that their organization was prepared to support digital services, almost 60 percent of those who identified as prepared were still experiencing customer-impacting incidents (slowness or downtime) at least once a week.

The increased complexity and associated cognitive load, the surge in the number of tools, and the growing difficulty of capacity planning stand out as top operations challenges, illustrating the need for DevOps best practices that accelerate the operational maturity of IT organizations. Companies that want to maximize CX and use digital services as a competitive advantage must constantly innovate. Keeping up with this rapid pace of change requires digital operations that integrate people, processes, and tools to quickly identify and resolve incidents and continuously improve to minimize future impact.

“As digital services increasingly drive business outcomes, the way organizations respond to, resolve, learn from and prevent operational issues is paramount to customer and business success. Organizations must embrace an integrated approach to automated detection, event management and incident resolution to increase not only customer value and trust but also employee engagement, visibility, learning, and productivity,” says Jennifer Tejada, Chief Executive Officer, PagerDuty. “With PagerDuty’s new capabilities that address the modern incident resolution lifecycle, IT leaders and DevOps teams are empowered to proactively address and prevent unexpected, customer-impacting issues faster across applications, services, and networks with new and traditional architectures and data models. These solutions are central to achieving the innovation velocity essential to being competitive in the digital world.”

PagerDuty’s new major incident resolution lifecycle spans event management features, incident prioritization, postmortem tools, and more, empowering organizations to:

  • Drive faster resolution with the right context and powerful automation. Teams can now simplify and automate the incident resolution lifecycle by seamlessly integrating event management at scale and incident response workflows, removing the burdens of administrative mechanics. With the new Incident Priority feature, it is easy to classify major incidents which require a more highly specialized and coordinated response process. And when any type of issue occurs, Custom Incident Actions provide rich in-app extensibility to streamline resolution by automating desired tasks or remediations directly within the incident. 

  • Accelerate learning to be prepared for the next problem. A best-in-class incident management process calls for a postmortem for every major incident. With PagerDuty's new Postmortem Builder, IT teams can gain a better understanding of how to prevent future incidents by streamlining and automating the postmortem process, institutionalizing a learning culture to improve both systems and every stage of incident resolution.

  • Maximize individual effectiveness while ensuring consistency with existing processes. With first-class extensibility to other tools used by the enterprise, PagerDuty drives automatable processes built on DevOps best practices that allow IT teams to focus on higher value parts of incident response. Among other capabilities, the new Jira extension and updated ServiceNow integration help customers centralize information without limiting how people work, breaking down silos between processes and data.

“Incident postmortems are highly valuable exercises for spotting areas where an operation can improve, as well as highlight team successes,” says Len Mitchell, Systems Analyst, Expedia. “However, the process of building the documentation for a postmortem can be time-consuming. The postmortem tool in PagerDuty allows me to complete that process in a fraction of the time. I can easily pull in notes, subscriber notifications, Slack chat threads, and alerts into the interface to build my timeline, and then lead my team through identifying the root cause, what we did well, what we need to work on, and then create action items. Along with the rest of the functionality in PagerDuty, I have a complete suite of tools for effective incident management.”

Opinions expressed by DZone contributors are their own.
