
DevOps: Improving Root Cause Analysis

Root Cause Analysis is the default problem-solving system. Let's see how DevOps culture and methodologies can improve this process.

by Derek Weeks · Aug. 11, 18 · Opinion

We have all been there in a postmortem when someone says, "Let's get to the root of the problem." And, we all know what that means: who or what is to blame?

We also all know that no one wants to play the blame game, yet we all do. But it isn't our fault (no blame, see what I did there?). It has been the default system for solving problems in business for decades. It is called root cause analysis (RCA).

We can change — for the better.

There Is No Root Cause: Emergent Behavior in Complex Systems

I recently watched a presentation from Matthew Boeckman (@matthewboeckman) entitled There Is No Root Cause: Emergent Behavior in Complex Systems. Matthew is a Developer Advocate with VictorOps and a Technology Strategist with Dryan.io. He grew up a systems guy and jokes that he has been in DevOps for 18 years, even though DevOps wasn't around back then, because he has always been nice to developers.

Digging in (pun intended), RCA focuses on what went wrong, and how we can prevent it from happening again.

The core problems with RCA for development are that it doesn't account for enough complexity and that its natural focus is blame, which can undermine a positive DevOps culture.

RCA was more applicable when Waterfall was the development methodology because states stayed consistent for months or even years at a time. In the age of Agile, DevOps, CI/CD, microservices, etc., states of work are in constant flux, and RCA can't provide solutions quickly enough. As Matthew notes, in RCA, things are either good or bad, working or broken, uptime or failure. The reality is that our world is more nuanced.

Matthew recommends looking at it through the principle of emergence because it "separates judgment from the good and the bad binary approach to our system health, and instead focuses on behaviors and interactions, patterns and complexities of our system. With practice and effort, we can manage them to more desirable states."

But what does this look like in practice?

Getting back to the analogy of the tree and its roots, the answer is more of a forest than a tree. A tree is a single living organism; a forest is an ecosystem.

Matthew takes this philosophy and mental picture and gives us a better system: Cynefin. The name is a Welsh word that means habitat, and the framework was created by Dave Snowden (@snowded), originally for managing IBM's intellectual capital. It draws on research in systems, complexity, network, and learning theories.

Starting in the bottom right quadrant, working counter-clockwise, it goes from simple to more complex.

Simple

These are patterns or behaviors that don't require a great deal of understanding. DevOps is increasingly setting up automated systems to respond to simple issues.

Complicated

These are known unknowns. You can imagine a set of realities where they can occur, and they are probable, but not certain. For instance, a busy harbor might get a storm that damages boats, docks, etc. The harbor manager cannot handle it by rote. Problems like this require people to do some thinking and are difficult, if not impossible, to automate.

Complex

This is where we start to see emergent behaviors occur. We don't have the metrics needed to understand or manage these problems, or we haven't looked at the relevant metric before. We start with probing, going into the system, and exploring. Think of any collection of humans at any scale. Things are still in the scope of probable, but they change quickly. There are many moving parts that aren't predictable and that we didn't fully encounter in our test methodology.

Chaotic

This is, well, chaos. Matthew's real-world example was an entire AWS region going down, which overloaded other regions as system admins moved services. In chaos, you act, then get a sense of where things are, and then respond.

Disorder

In DevOps, this is where you have a lack of communication and collaboration. Here teams need to:

  • Reduce: figure out what you agree on.
  • Analyze: build consensus.
  • Iterate: move to a quadrant and continue.

Matthew notes that knowledge and practice move patterns towards more favorable quadrants, but complacency erodes the process. Complex systems left poorly managed will create increasingly complex processes to manage.
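The domains above can be sketched as a small lookup from domain to response pattern. This is only an illustration, not something taken from Matthew's talk: the `Domain` enum, the `RESPONSES` table, and both helper functions are hypothetical names. The Complex, Chaotic, and Disorder sequences follow the descriptions above, while the Simple and Complicated sequences are assumed from the commonly published form of the Cynefin framework.

```python
from enum import Enum


class Domain(Enum):
    SIMPLE = "simple"
    COMPLICATED = "complicated"
    COMPLEX = "complex"
    CHAOTIC = "chaotic"
    DISORDER = "disorder"


# Ordered response steps per domain.
# SIMPLE and COMPLICATED are assumptions taken from the commonly published
# Cynefin framework; the other three follow the article's descriptions.
RESPONSES = {
    Domain.SIMPLE: ("sense", "categorize", "respond"),
    Domain.COMPLICATED: ("sense", "analyze", "respond"),
    Domain.COMPLEX: ("probe", "sense", "respond"),
    Domain.CHAOTIC: ("act", "sense", "respond"),
    Domain.DISORDER: ("reduce", "analyze", "iterate"),
}


def first_move(domain: Domain) -> str:
    """Return the first step to take for an incident in the given domain."""
    return RESPONSES[domain][0]


def can_automate(domain: Domain) -> bool:
    """Treat only simple patterns as candidates for an automated response."""
    return domain is Domain.SIMPLE
```

For example, `first_move(Domain.CHAOTIC)` returns `"act"`, which mirrors the act-then-sense ordering described for chaotic situations.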

How to Adopt Cynefin

  • In the moment, ask, what quadrant does this map to?
  • In the post-incident report: How did we manage the pattern? Was it complicated, complex, simple? What can we do to change it?
  • In your sprint planning: Devote time to manage your patterns clockwise. What can we move with a little bit of work?
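One way to make these three habits concrete is to tag each incident with a domain during the post-incident review and then tally the tags at sprint planning. The sketch below is a hypothetical illustration: the `Incident` record, the `ORDER` list, and both functions are invented for this example, and it assumes that "more favorable" means one step closer to simple.

```python
from collections import Counter
from dataclasses import dataclass

# Domains ordered from most to least favorable (an assumption of this sketch).
ORDER = ["simple", "complicated", "complex", "chaotic"]


@dataclass
class Incident:
    title: str
    domain: str  # one of ORDER, assigned during the post-incident review


def domain_counts(incidents):
    """Count how many incidents were tagged with each domain."""
    return Counter(incident.domain for incident in incidents)


def next_target(domain):
    """Return the adjacent, more favorable domain, or None if already simple."""
    idx = ORDER.index(domain)
    return ORDER[idx - 1] if idx > 0 else None
```

At sprint planning, a team could look at `domain_counts(...)` for the last few weeks and pick one recurring complex pattern to move toward `next_target("complex")`, which is `"complicated"`.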

The reality is that RCA is really only present after the fact. Cynefin calls us to action.


Convinced that Cynefin might be just what your organization needs or want to dig a little deeper? Share and watch Matthew's full talk above or check it out here. You can watch any of the 2017 AllDayDevOps sessions free-of-charge here.

All Day DevOps 2018

All Day DevOps 2018 is just around the corner! Registration is available here.

The free, online conference goes live on October 17th, offering 100 different practitioner-led sessions, each one 30 minutes long. With 100 speakers and 5 separate tracks (CI/CD, Cloud-Native Infrastructure, DevSecOps, Cultural Transformations, and Site Reliability Engineering), there's sure to be something for everyone.

And speaking of everyone, if you're part of an organization with 20+ people who want to attend the conference (again, it's free!), then you should consider joining the Club 20 program so that you might get your company logo added to the ADDO site. Check out some of the Club 20 participants here and consider joining them.

Hope to see you online at the show!


Published at DZone with permission of Derek Weeks, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
