
Incident Management Optimized

Read on to learn how Atlassian partners with PagerDuty, xMatters, and OpsGenie to streamline incident management workflows.

I just wrapped up a conversation with Steve Goldsmith, General Manager of HipChat at Atlassian.

Last summer, Atlassian bet on the importance of incident communications when it acquired StatusPage, adding the first product to its suite built specifically for incident management and communications. Providing status and regularly communicating with customers, especially during incidents, has become a critical part of the software delivery process.

In the world of cloud and DevOps, incidents are becoming more frequent, more complex to resolve, and more impactful, as we saw with the recent AWS S3 outage that knocked thousands of web services, including DZone, offline for several hours. As the rate of incidents increases and more of the "infallible" pieces of the internet, like Dyn or AWS S3, go down, businesses need an incident management plan in place, covering people, processes, and technology, so that they can swiftly manage an incident from start to finish.

B2C and B2B customers will judge the companies they do business with by how they handle such incidents. Incidents that are handled smoothly can generate trust and earn customers for life. Incidents that are not handled smoothly can end a customer relationship and ultimately kill a business.

Atlassian already offers pieces of the incident management process: HipChat to organize a "war room" and communicate updates, StatusPage to alert internal and external stakeholders of what's happening, JIRA Service Desk to be the incident system of record, and JIRA Software to track follow-on remediation actions so incidents aren't repeated.

Atlassian has worked with customers to understand how they are adapting to the ever-evolving world of technology and using its products across different aspects of incident management. The consensus is that with resolution timelines shrinking and urgency increasing, legacy service desk and collaboration tools no longer meet customer needs. Rapid, efficient collaboration and communication are critical when you're measuring downtime in minutes rather than hours or days.

Atlassian is announcing a new set of strategic integrations with PagerDuty to provide teams with an incident management workflow to respond, organize, and remediate when an outage or incident occurs. This launch with PagerDuty joins existing efforts with xMatters and OpsGenie to provide customers best-of-breed integrations across their toolsets.

Why Incident Management?

Regardless of your role, you've noticed more things breaking than usual, and your IT or SRE teams seem to constantly be putting out fires.

There are two reasons incidents and outages are becoming more common: the rise of DevOps and the shift from buying infrastructure to renting it through cloud services. Companies prize speed and have turned to modern software practices like DevOps and third-party cloud services to achieve it, allowing teams to iterate and innovate faster than ever before. But as deployment speed and reliance on outside vendors increase, so does the number of incidents. In fact, StatusPage customers opened and resolved nearly 200,000 incidents in 2016 alone, for a total of over one million hours of downtime!

During downtime or an outage, efficiency is key to incident teams whose measure of success is time to resolution. They can't be slowed down by context switching across multiple tools or having to re-enter information. Having a well-integrated incident management toolchain is critical. Atlassian has invested in developing best-in-class integrations with PagerDuty, xMatters, and OpsGenie to offer incident management teams a consistent workflow throughout the incident management process.
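
Time to resolution is often tracked as mean time to resolution (MTTR): the average, across incidents, of the resolved time minus the opened time. The Python sketch here is a minimal illustration, assuming each incident record carries an opened and a resolved timestamp; the sample records and the function names are hypothetical and not part of any Atlassian, StatusPage, or PagerDuty API.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident records as (opened_at, resolved_at) pairs.
# The values are made up purely for illustration.
incidents = [
    (datetime(2017, 3, 1, 9, 0), datetime(2017, 3, 1, 9, 45)),    # 45 min
    (datetime(2017, 3, 2, 14, 10), datetime(2017, 3, 2, 14, 25)), # 15 min
    (datetime(2017, 3, 5, 22, 0), datetime(2017, 3, 6, 0, 0)),    # 2 h
]


def time_to_resolution(opened: datetime, resolved: datetime) -> timedelta:
    """Elapsed time from when an incident was opened until it was resolved."""
    if resolved < opened:
        raise ValueError("resolved_at must not precede opened_at")
    return resolved - opened


def mean_time_to_resolution(records) -> timedelta:
    """Average time to resolution across all (opened, resolved) records."""
    seconds = [time_to_resolution(o, r).total_seconds() for o, r in records]
    return timedelta(seconds=mean(seconds))


mttr = mean_time_to_resolution(incidents)
print(f"MTTR: {mttr}")  # MTTR: 1:00:00
```

Tracking this number over time is one way to check whether an integrated toolchain is actually shortening incidents.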

ChatOps for Incident Management with PagerDuty

There is a new set of strategic integrations with PagerDuty, starting with one of the largest issues for incident response teams: communication. Your IT Ops team already lives in HipChat, so it makes perfect sense to get your mission-critical alerts there. PagerDuty is a leader in managing escalations and alerts during incidents, which complements what Atlassian provides: a place for rapid response teams to communicate and collaborate on a solution.

PagerDuty's HipChat Integration sends rich incident notifications right where you're already working. No need to search through a bunch of different apps for context – everything is already right at your fingertips in an easy-to-scan feed in your right sidebar. In addition, this powerful integration allows you to:

  • Use slash commands to fix issues right from HipChat, turning your chat room into a command center. (Why take minutes to solve a problem when you can take just seconds?)

  • Set up your alerts with just a few clicks. In the PagerDuty Extensions Portal, you can map multiple services to individual HipChat rooms, so the right people can see and respond to incident notifications.

  • Sign into PagerDuty from HipChat, ensuring that only users with the necessary permissions can take actions within HipChat. This also logs who took what action, promoting better security and incident analytics.
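
To make those three capabilities concrete, here is a toy Python model of a chat room acting as a command center: notifications go only to the rooms mapped to a service, and a slash command changes an incident's state only for authorized users, with each action written to an audit log. The service and room names, the `/pd ack|resolve <id>` command syntax, and the `CommandCenter` class are all hypothetical; this is a sketch of the idea, not PagerDuty's actual HipChat integration or API.

```python
from dataclasses import dataclass, field

# Hypothetical mapping of services to chat rooms (one service may map to many rooms).
SERVICE_TO_ROOMS = {
    "checkout-api": ["payments-oncall"],
    "search": ["search-team", "sre-war-room"],
}


@dataclass
class Incident:
    incident_id: str
    service: str
    status: str = "triggered"


@dataclass
class CommandCenter:
    """Toy model of a chat room used as an incident command center."""
    incidents: dict
    authorized_users: set
    audit_log: list = field(default_factory=list)

    def rooms_for(self, incident: Incident) -> list:
        # Route the notification only to rooms mapped to this incident's service.
        return SERVICE_TO_ROOMS.get(incident.service, [])

    def handle_slash_command(self, user: str, text: str) -> str:
        # Hypothetical syntax: "/pd ack <id>" or "/pd resolve <id>".
        parts = text.split()
        if len(parts) != 3 or parts[0] != "/pd":
            return "usage: /pd ack|resolve <incident-id>"
        _, action, incident_id = parts
        # Permission check: only signed-in, authorized users may act.
        if user not in self.authorized_users:
            return f"{user} is not signed in with permission to {action}"
        incident = self.incidents.get(incident_id)
        if incident is None:
            return f"unknown incident {incident_id}"
        transitions = {"ack": "acknowledged", "resolve": "resolved"}
        if action not in transitions:
            return f"unsupported action {action}"
        incident.status = transitions[action]
        # Record who did what, for security review and incident analytics.
        self.audit_log.append((user, action, incident_id))
        return f"{incident_id} {incident.status} by {user}"


center = CommandCenter(
    incidents={"P123": Incident("P123", "search")},
    authorized_users={"alice"},
)
print(center.rooms_for(center.incidents["P123"]))  # ['search-team', 'sre-war-room']
print(center.handle_slash_command("alice", "/pd ack P123"))  # P123 acknowledged by alice
print(center.handle_slash_command("bob", "/pd resolve P123"))  # rejected: bob is not authorized
```

Checking permissions before touching the incident means a rejected command leaves no state change and no audit entry.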

"The increased velocity of changes made in increasingly complex environments does not need to result in less reliable services. Development and Ops teams implementing best practice incident response processes and tools can minimize customer impact, and even address issues before customers notice. PagerDuty's integration with Atlassian HipChat creates a seamless workflow that organizes and automates incident response teams and activities, so incidents are identified and resolved faster, and ultimately prevented," said Rachel Obstler, Vice President of Product Management at PagerDuty.

Enhanced communication and centralized operations, in other words the essence of ChatOps, are absolutely crucial to effective incident management. The PagerDuty integration helps developers take ChatOps to the next level. We'll continue to work with PagerDuty on a number of exciting incident management capabilities and look forward to building on our partnership. You can find out more at PagerDuty's site.

xMatters Automates Incident Management Workflow

Atlassian partnered with xMatters to integrate its leading integration-driven collaboration software across HipChat, JIRA Service Desk, and StatusPage, a solution that brings together the tools and people needed to manage an incident. The partnership connects the right people to toolchains spanning DevOps, Ops, and service management, and automates communications so you can proactively prevent outages, rapidly engage resolvers, and manage major incidents, all within the Atlassian toolset.

Demo an Incident in the OpsGenie Atlassian Playground

Supporting software ecosystems and integrations is at the heart of what OpsGenie does, and together with the Atlassian product suite, it can help your team collaborate like no other. OpsGenie integrates across the Atlassian product suite, enabling a complete incident management workflow. Today, OpsGenie is also announcing the release of the OpsGenie Playground for Atlassian, where you can create an incident and walk through the process from resolution all the way to remediation, using all your Atlassian tools.

OpsGenie has ready-to-use integrations for hundreds of monitoring systems. When any of these systems detects an incident, OpsGenie can create alerts to notify the right people. From these alerts, OpsGenie can automatically create and manage JIRA issues, send notifications and take actions via HipChat, update StatusPage automatically, and more.
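
The fan-out from a single alert to several Atlassian tools can be sketched as a small routing function in Python. It assumes a hypothetical `Alert` type and stand-in action functions that only return descriptions of what they would do; a real deployment would rely on OpsGenie's own JIRA, HipChat, and StatusPage integrations rather than code like this. The rule that only P1/P2 alerts update StatusPage is an assumed policy for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Alert:
    source: str    # monitoring system that detected the problem
    message: str
    priority: str  # e.g. "P1" (highest) through "P5"


# Hypothetical stand-in actions: each returns a description instead of
# calling the real JIRA, HipChat, or StatusPage integration.
def create_jira_issue(alert: Alert) -> str:
    return f"JIRA issue created: {alert.message}"


def notify_hipchat(alert: Alert) -> str:
    return f"HipChat notification: [{alert.priority}] {alert.message}"


def update_statuspage(alert: Alert) -> str:
    return f"StatusPage updated: investigating {alert.source} issue"


Action = Callable[[Alert], str]


def route_alert(alert: Alert) -> List[str]:
    """Fan one alert out to downstream tools using a simple priority policy."""
    actions: List[Action] = [create_jira_issue, notify_hipchat]
    # Assumed policy: only high-priority alerts update the public status page.
    if alert.priority in ("P1", "P2"):
        actions.append(update_statuspage)
    return [action(alert) for action in actions]


results = route_alert(Alert("Nagios", "checkout latency above SLO", "P1"))
for line in results:
    print(line)
```

Keeping each downstream action as a plain function in a list makes it easy to add or drop a tool without touching the routing logic.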

Opinions expressed by DZone contributors are their own.
