
How to Make On-Call Work for Everyone

Looking for your next job? Building out your tech organization? How on-call is handled can make a huge impact on your success.

By Shaked Askayo · Jan. 01, 23 · Opinion

I never liked being on-call (slight understatement) or asking others to shoulder some of the load. Sometimes it feels like it's a penalty for being more involved and knowledgeable about our code and infrastructure. And it definitely is a big distraction from core development and innovation.

But there really is no way to avoid it once you have a live product or website with paying customers. Somebody needs to be available just in case something goes wrong.

How on-call is done in your organization or by your prospective employer can make all the difference in your success (and sanity). Here are some approaches I’ve seen that can improve the on-call experience and overall productivity.

Wake Up R&D! 

Being woken up in the middle of the night because the NOC or support team opened a high-severity ticket, only to find out that it was a relatively non-critical issue, absolutely sucks.

To solve this all-too-common scenario, one company I worked at came up with a simple solution. They replaced the “High Severity” designation with “Wake Up R&D.” By clearly outlining the result of opening a high-severity ticket, they forced the opener to think twice (maybe even thrice) about whether the issue was really worth waking someone up in the middle of the night.
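As a rough illustration of that idea, here is a minimal Python sketch of a ticket severity scale whose top level is labeled by its consequence rather than by an abstract rank. The names (`Severity`, `pages_on_call`, `ticket_form_options`) and the two lower levels are hypothetical; this is not the actual system that company used.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical ticket severities; each label states what opening it triggers."""
    LOW = "Handle next business day"
    MEDIUM = "Handle during working hours today"
    WAKE_UP_RND = "Wake Up R&D"


def pages_on_call(severity: Severity) -> bool:
    """Only the top level pages the on-call engineer, whatever the hour."""
    return severity is Severity.WAKE_UP_RND


def ticket_form_options() -> list[str]:
    """Labels shown to the person opening the ticket (an assumed UI)."""
    return [s.value for s in Severity]
```

The point of the design is that the person opening the ticket reads the consequence in the label itself, before choosing it.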

Main Takeaway

Make sure that you or your prospective employer has a good method for separating the signal from the noise. 

Junior and New Employees

For junior or new employees who might be unfamiliar with all the intricacies of what constitutes a critical issue, how it should be handled, etc., it’s essential to have a runbook or some other documentation that outlines what issues warrant waking R&D up in the middle of the night. 

While this type of documentation goes a long way in describing various scenarios, their severity and how they should be handled, it takes a few months for someone to get a sense of the systems they’re working with, and be able to classify incidents accurately. 

Main Takeaway

Make sure that you or your prospective employer invests the time and training for newbies to ease them through this learning process.

The Fifth DORA Metric

Well, it might be the sixth DORA metric, as Google already added a fifth in 2021.

Either way, hat tip to Charity Majors, CTO at Honeycomb, who suggests in this excellent blog post that software engineering management should be evaluated not only by the four original DORA metrics but also by how often their “team is alerted outside of working hours.”

This makes perfect sense to me. Management must do their utmost to ensure productivity. 

Why do I make this bold claim? Well, I can only speak for myself, but if I am feeling stressed about my upcoming on-call duties, I won't be focused on my work. If I’m tired the day after on-call, I won’t be sharp and creative. If I’m feeling overworked and underappreciated for my core contributions, I will be less motivated to give my utmost effort.
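To make "how often the team is alerted outside of working hours" concrete, here is a minimal Python sketch that counts such alerts from a list of timestamps. The working-hours window (Monday to Friday, 09:00 to 18:00 local time), the function names, and the sample data are all assumptions for illustration; a real team would pull timestamps from its own alerting tool and define its own hours.

```python
from datetime import datetime, time

# Assumed working hours: Monday-Friday, 09:00-18:00 local time.
WORK_START = time(9, 0)
WORK_END = time(18, 0)


def is_outside_working_hours(ts: datetime) -> bool:
    """True if the alert fired on a weekend or outside WORK_START..WORK_END."""
    if ts.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return not (WORK_START <= ts.time() < WORK_END)


def out_of_hours_alerts(alert_times: list[datetime]) -> int:
    """Count alerts that reached the team outside working hours."""
    return sum(is_outside_working_hours(ts) for ts in alert_times)


# Made-up sample data for illustration only.
alerts = [
    datetime(2023, 1, 2, 2, 15),   # Monday 02:15, out of hours
    datetime(2023, 1, 3, 14, 30),  # Tuesday 14:30, within hours
    datetime(2023, 1, 7, 11, 0),   # Saturday 11:00, weekend
]
print(out_of_hours_alerts(alerts))  # prints 2
```

Tracking this count per week or per rotation would show whether the on-call load is trending in the right direction.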

Main Takeaway

Make sure that you or your prospective employer respects employees' overall well-being and understands that on-call duties can be very draining.

The Fear Factor 

Earlier in my career, I’d go through many emotions during on-call incidents. How will I be judged if I don’t know how to handle the situation on my own? It’s 2 a.m. — what if I’m totally off, and this is not an issue at all? Do I want to risk the wrath of the senior expert I barely said two words to since I joined the company? 

These and many other thoughts would race through my head. Fortunately, I had a close working relationship with my direct manager and would ping him, no matter the time of day or night, whenever I was really unsure of what to do.

As CTO at Kubiya.ai, I try to create a healthy balance. Waking a teammate in the middle of the night should obviously be avoided, but at the same time, it is totally fine provided we did everything we could to solve the issue on our own. And even if it turns out to be a false alarm or some easy fix, I say better safe than sorry. But this takes coaching and publicly stating to the team that this is our approach, so everyone is on the same page and no one is terrified of making the call.

Main Takeaway

If you are building your on-call structure, clearly communicate that we must try to avoid waking colleagues, but at the same time, it’s perfectly acceptable if we need to (and even if we are wrong, it’s ok too). If you are evaluating a prospective employer, try to gauge what their culture is like and ask how they address this issue.

Emotional Intelligence 

Technology teams are not known for their outstanding communication skills, and a tense sev 1 situation makes things worse. I’ve seen people on call think they have tracked feature ownership correctly, and unfortunately, when they reach out to the “owner,” it comes across as super accusatory.

They approach a developer with a buggy piece of code that seems to belong to them. They will be like “hey your code is causing the app to crash, blah blah,” and all of a sudden, the developer has an important meeting and cannot help — or worse, gets super defensive and mouths off. 

So it’s super important to avoid any accusatory or critical tones and/or wording when you think you’ve found the issue and the person who can help.

It’s really hard to know if it’s the specific code that is at fault. Maybe it was a change in firewall configuration, or perhaps it was a related but different component that is causing the issue. Refactoring code can also make it look like someone was the author even though they are not.

Plus, if someone wrote something over a year ago and there were many iterations, it will take them some time to dig back in and understand. 

Main Takeaway 

Always be humble when raising an issue. Don’t jump to conclusions or blame anyone. Just ask for help, suggestions, and ideas from the people you think might be able to help. Let them know that while you are not sure they are the right address, you thought that perhaps, because they were involved at some point with the code, they could help. 

Who Should Be On-Call?

This is really a tough question, but in my experience, operations teams should always have someone on-call. That said, if operationally things are very stable while applications are not so stable, developers might need to be regular members of the on-call rotation.

In smaller companies, devs should probably have ops capabilities anyway, so they can cover all issues.

Of course, during critical releases, particularly new features, devs should be on call.

Main Takeaway

Try to make sure that your teams are well-versed in all relevant areas to the extent possible, though obviously this may not be realistic. So identify where the system is weakest and allocate on-call accordingly. If you are evaluating a prospective employer, try to see whether your role (dev or ops) would carry the brunt of on-call, and make sure the load is reasonable and well compensated.

At the end of the day, on-call is the toll we techies must pay. But it’s worth it. Hang in there!

Engineering management Productivity dev teams

Opinions expressed by DZone contributors are their own.
