
7 Tips for Creating an Actionable IT On-Call Schedule


Orlee Berlove discusses why IT on-call is necessary, traditional problems that occur with IT alerting, and how to improve life on-call.


I spent a bit of time on Reddit the other day and was struck by how many posts focused on IT on-call and on-call scheduling. Some posts were rants about horrible customers (who hasn’t had a few of those?). A rare few described positive experiences of being on-call. However, many engineers in DevOps and IT wrote about their apprehension toward being on-call. They wondered:

  • What is the best way for my team to create an IT on-call schedule?
  • How do I ensure that I wake up if I am alerted?
  • Should my growing on-call team use an on-call cell phone and hand it off between rotations?
  • How do I manage being on-call and then having to show up at 8 a.m. the next morning?
  • Is it reasonable to expect on-call duty 24/7?

The answers to these questions, though, don’t need to cause trepidation. Being on-call can produce anxiety. Still, the right tools and management go a long way toward setting reasonable expectations and producing reasonable outcomes.

Why IT On-Call Is Necessary for All

If I were to ask you about why on-call is necessary, you might think me a bit of a dunce (go ahead; I’ve been called worse). Isn’t it obvious that on-call is needed to answer customer questions about the product? Duh!

The truth is that answering customer product questions is not the only reason IT on-call exists. In product development, on-call is a necessity. You cannot develop a product effectively without testing its resilience. You cannot know how resilient it is until you put it in front of your customers, let them test it, and let them call you when it breaks.

Additionally, on-call rotations let Dev, Ops, and the rest of your IT team see how well the product or setup they have built is working. Many people I have spoken to in the DevOps world call this “eating your own dog food.” Yuck. The phrase means that no one in the IT family can create their perceived technical masterpiece and simply walk away. Instead, they need to take responsibility for their creation, and being part of the on-call family helps ensure that level of responsibility.

Traditional Problems With IT Alerting

Beyond the burden of being on-call itself, alerting brings its own problems. Issues often arrive after hours, and they arrive without context. These problems come in many flavors. For example:

  • A call comes in but the engineer cannot escalate the issue if they need to.
  • There’s a hand-off of a customer problem from regular hours to after-hours on-call and the issue gets muddled because there’s no audit trail on the alert.
  • For overnight on-call, alerts are not sufficiently persistent to get engineers out of bed.
  • Poor management of IT on-call and alerting causes engineer burnout.

A much better idea is to create an actual IT on-call schedule with a dedicated tool designed to handle effective alerting, auditing, and messaging. A tool like OnPage can address these issues. It can also ease many of the worries engineers have about being on-call.

Improving Life On-Call

Effective management of after-hours on-call needs to be premeditated. In other words, the process must be thought through in advance rather than handled ad hoc. Most DevOps and IT teams have a schedule, but many haven’t thought through the whole process. Teams should instead create on-call schedules that do the following.

1. Enable Escalation

You cannot expect one person to be on-call 24/7 without an escalation procedure. Everyone needs a backup in case they cannot attend to a call. People have lives, and stuff happens. So, make sure there’s an escalation procedure.
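An escalation procedure is essentially an ordered list of responders, each with a timeout. It can be sketched in a few lines of Python. This is a minimal illustration under assumptions. `EscalationStep`, `escalate`, and the `acknowledges` callback are hypothetical names. The callback stands in for whatever paging tool actually delivers the alert and waits for an acknowledgment.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EscalationStep:
    responder: str        # hypothetical responder name or ID
    timeout_minutes: int  # how long to wait for an acknowledgment before moving on


def escalate(policy: List[EscalationStep],
             acknowledges: Callable[[str, int], bool]) -> Optional[str]:
    """Page each responder in order and return the first one who acknowledges.

    `acknowledges(responder, timeout_minutes)` is an assumed stand-in for the
    paging tool: it returns True if the responder acknowledged within the timeout.
    Returns None if nobody in the policy responded.
    """
    for step in policy:
        if acknowledges(step.responder, step.timeout_minutes):
            return step.responder
    return None


policy = [
    EscalationStep("primary", 15),
    EscalationStep("secondary", 15),
    EscalationStep("team-lead", 30),
]
# Simulate the primary sleeping through the page; the secondary picks it up.
print(escalate(policy, lambda who, minutes: who == "secondary"))  # secondary
```

Keeping the policy as plain ordered data makes the escalation path easy to review. It also makes clear who will be woken next if the primary doesn’t answer.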

2. Provide Time Off After Being On-Call Overnight

When a team member has been actively on-call overnight, it is only fair to give that person a reasonable amount of time off before showing up to work again.

3. Make Schedules

Make sure all of your team members have a chance to be on-call. Create scheduling that rotates through the team members equitably.
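One simple way to rotate equitably is round-robin. The following is a minimal Python sketch, assuming one-week shifts and a fixed team order. The function name `weekly_rotation` and the engineer names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def weekly_rotation(team: List[str], start: date, weeks: int) -> List[Tuple[date, str]]:
    """Assign one engineer per week, cycling through the team in order.

    Round-robin means that over any multiple of len(team) weeks,
    every member is on-call the same number of times.
    """
    if not team:
        raise ValueError("team must not be empty")
    return [(start + timedelta(weeks=i), team[i % len(team)]) for i in range(weeks)]


for week_start, engineer in weekly_rotation(["alice", "bob", "carol"], date(2024, 1, 1), 4):
    print(week_start.isoformat(), engineer)
# 2024-01-01 alice
# 2024-01-08 bob
# 2024-01-15 carol
# 2024-01-22 alice
```

Real schedules usually also need swaps and holiday overrides, which a plain round-robin does not handle.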

4. Run Books and Defined Procedures

When your on-call engineer is alerted in the middle of the night, help them out by having run books available that document solutions to problems that have cropped up in the past. This is especially helpful when an engineer is woken at 2 a.m. and their thinking is somewhat clouded.

5. Include Prominent and Persistent Alerts

OnPage provides persistent alerting that keeps repeating until it is answered, for up to 8 hours. Also, OnPage alerts are designed specifically to wake you up, so there’s little chance of sleeping through them.
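Persistent alerting boils down to a simple rule: repeat the alert until it is acknowledged or a maximum duration runs out. The following Python sketch shows that rule in general terms. It is not OnPage’s implementation. The 5-minute repeat interval and the name `alert_schedule` are assumptions; only the 8-hour cap comes from the description above.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def alert_schedule(first_alert: datetime,
                   acked_at: Optional[datetime],
                   interval: timedelta = timedelta(minutes=5),   # assumed repeat interval
                   max_duration: timedelta = timedelta(hours=8)) -> List[datetime]:
    """Return the times a persistent alert fires.

    The alert repeats every `interval` until it is acknowledged (`acked_at`)
    or until `max_duration` has elapsed since the first alert, whichever comes first.
    """
    fires = []
    deadline = first_alert + max_duration
    t = first_alert
    while t < deadline and (acked_at is None or t < acked_at):
        fires.append(t)
        t += interval
    return fires


start = datetime(2024, 1, 1, 2, 0)
# Acknowledged at 02:12, so the alert fires at 02:00, 02:05 and 02:10.
print(len(alert_schedule(start, start + timedelta(minutes=12))))  # 3
# Never acknowledged: 96 alerts at 5-minute intervals across 8 hours.
print(len(alert_schedule(start, None)))  # 96
```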

6. Ensure Audit Trails to Help With Hand-Offs

Provide an audit trail for alerts so it is clear who on the team is working on an existing issue. Audit trails also record the timestamps behind metrics such as MTTR (mean time to resolution), which helps your team keep track of how it is performing.
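To show how an audit trail answers both “who owns this now?” and “what is our MTTR?”, here is a minimal Python sketch. The record layout (`AuditEntry`) and the event names `"opened"`, `"acknowledged"`, and `"resolved"` are assumptions for illustration. The sketch also assumes a hand-off is logged as a new acknowledgment by the incoming engineer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class AuditEntry:
    incident_id: str
    event: str      # assumed event names: "opened", "acknowledged", "resolved"
    actor: str
    at: datetime


def current_owner(trail: List[AuditEntry], incident_id: str) -> Optional[str]:
    """The most recent person to acknowledge the incident owns it.

    A hand-off is recorded as a new "acknowledged" entry by the incoming engineer.
    """
    acks = [e for e in trail if e.incident_id == incident_id and e.event == "acknowledged"]
    return max(acks, key=lambda e: e.at).actor if acks else None


def mean_time_to_resolve(trail: List[AuditEntry]) -> timedelta:
    """Average time from "opened" to "resolved" across resolved incidents."""
    opened: Dict[str, datetime] = {}
    durations: List[timedelta] = []
    for e in sorted(trail, key=lambda e: e.at):
        if e.event == "opened":
            opened[e.incident_id] = e.at
        elif e.event == "resolved" and e.incident_id in opened:
            durations.append(e.at - opened.pop(e.incident_id))
    if not durations:
        raise ValueError("no resolved incidents in the audit trail")
    return sum(durations, timedelta()) / len(durations)


t0 = datetime(2024, 1, 1, 22, 0)
trail = [
    AuditEntry("INC-1", "opened", "monitor", t0),
    AuditEntry("INC-1", "acknowledged", "alice", t0 + timedelta(minutes=5)),
    AuditEntry("INC-1", "acknowledged", "bob", t0 + timedelta(hours=2)),  # hand-off to bob
    AuditEntry("INC-1", "resolved", "bob", t0 + timedelta(hours=3)),
    AuditEntry("INC-2", "opened", "monitor", t0 + timedelta(hours=1)),
    AuditEntry("INC-2", "resolved", "alice", t0 + timedelta(hours=2)),
]
print(current_owner(trail, "INC-1"))   # bob
print(mean_time_to_resolve(trail))     # 2:00:00
```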

7. Make It Based on a Communal App

Ensure your team has an alerting app on their smartphones so there is no need to physically hand off pagers. With a smartphone application like OnPage, scheduling becomes much easier. So does making sure the right person responds every time.

Conclusion

IT on-call might cause trepidation at first, but the time spent planning will pay dividends. Again, use a scheduling tool that lets your team work together effectively, more like a, well, team.


Published at DZone with permission of Orlee Berlove, DZone MVB. See the original article here.
