Curing System Blindness

I’ve been writing about seeing systems, and it got me thinking about a company I did some work for a few years ago. They were a great example of how focusing on events leads to blame and prevents people from seeing patterns.

Here’s the story.  The customer service organization in this company had serious problems with availability of a whole slew of systems that their reps relied on.

It was such a problem that they created a new role and a new department to deal with the issue.  The “availability analysts” were charged with collecting data, analyzing the data and supporting a solution to the problem(s).

And collect data they did.  They had data on system performance, run times, crashes, errors, and abnormal conditions; server uptime, server downtime, and software outages (by application and system component); problem escalations, helpdesk calls, and trouble tickets.

When they weren’t collecting data, they were busy creating “the deck” for the (dreaded) management meeting.  The deck was pages and pages thick. Page one listed the lost productivity figures for the month, in “productive FTE minutes” lost. Page two listed the number of incidents.

[Figure: pie chart. Source Contributions to Total Outage Minutes for the Month]

But page five was the big event: a pie chart that showed all the sources of lost productive FTE minutes for the month.  The availability analyst walked the group through it, pointing out that 25% of the outage minutes were due to network problems, 20% due to mainframe outages, and so forth.

At the end of the pie chart report, the highest ranking manager would demand, “What are you going to do about this?”  Everyone else at the table tried to look small.  After some squirming, one of the lower ranking people would put forth an idea.  “I’ll expect a progress report next month,” the top manager would say, sounding stern.  And that was the end of the meeting.

Then, the availability analysts scrambled out to start chasing the problem of the month.

And it all started again the next month when the new pie chart was published.

Events, Patterns, Structure

Both analysts and managers were firmly focused on a snapshot of events, and missed the patterns.  The way they were presenting information helped hide the patterns and keep the focus on the latest hot issue.
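To make the difference concrete, here is a minimal sketch in Python. It is not what the analysts used, and every number and name in it (`outage_minutes`, `monthly_shares`, `series_by_source`, `top_source_each_month`) is invented for illustration. The first function reproduces the pie chart view of a single month; the other two line the same data up across months so a trend has a chance to show.

```python
from collections import defaultdict

# Hypothetical outage minutes by month and source. These figures are made up
# for illustration; they are not data from the company in the story.
outage_minutes = {
    "Jan": {"network": 250, "mainframe": 200, "desktop": 150},
    "Feb": {"network": 120, "mainframe": 310, "desktop": 160},
    "Mar": {"network": 300, "mainframe": 180, "desktop": 170},
}


def monthly_shares(month):
    """Event view: each source's share of one month's outage minutes."""
    total = sum(month.values())
    return {source: minutes / total for source, minutes in month.items()}


def series_by_source(data):
    """Pattern view: outage minutes per source, listed in month order."""
    series = defaultdict(list)
    for month in data.values():
        for source, minutes in month.items():
            series[source].append(minutes)
    return dict(series)


def top_source_each_month(data):
    """The 'problem of the month': the largest pie slice in each month."""
    return {name: max(month, key=month.get) for name, month in data.items()}


if __name__ == "__main__":
    print(monthly_shares(outage_minutes["Mar"]))
    print(top_source_each_month(outage_minutes))
    print(series_by_source(outage_minutes))
```

In this invented data the biggest slice flips between network and mainframe from month to month, which is what sends people chasing a new problem each time. The desktop series never tops the chart, yet it rises every month, and that only becomes visible when the months are placed side by side.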

I worked to help them see the patterns, which lead to understanding structure and dynamics and, from there, to taking meaningful action.

Now, they did need to respond to events: bring the network back up, or swap out a server, for example.  They also needed to adapt to some of the patterns, such as figuring out how to deal with certain types of outages more effectively until deeper changes took hold.

But as long as they focused only on short-term events (the monthly outage minutes), there was little chance of improving the overall situation. They were system blind.

Published at DZone with permission of Esther Derby , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
