
Getting to the Root of it All…

The pursuit of truth is key. It is about getting to the root cause of an issue and understanding why it happened. Read on to see why you need to ask "why" when you are performing root cause analysis.

When I was at one of the world’s largest hedge funds, a core principle the firm operated by was the pursuit of truth. Everything was fair game, from why lunch was late to the underlying mechanics of the global currency markets. The firm’s founder laid this out in his principles, and given the firm’s strong performance, they are clearly onto something.

Data lineage

So what does this have to do with data? The pursuit of truth is, in effect, what data lineage is about: we need to understand the why behind an issue in order to fix it completely. Data issues occur in every industry, with wildly different impacts. What if, for example, your new mobile app shows incorrect store hours to a new user, who then slams your company on Twitter because the app…well…stinks? Or, more severely, what if your bank is fined millions by the Fed because it repeatedly failed to report something as simple as accurate zip codes? Both scenarios trace back to issues in the source data.

If we had only one source system, data lineage (and life) would be easy. But with numerous acquisitions, mergers, specialized systems, and manual processes, the journey a single piece of data takes during its life is often long and circuitous.

Understanding what went wrong is the immediate, proximate problem. But understanding why it happened is not only more important; it is the core principle behind data lineage. Data lineage follows data across all of its transformations, all the way back to its source(s), so that we can get to the root cause of an issue and understand why it happened. There are typically plenty of proximate causes surrounding an issue, such as delayed system refreshes, fat-fingered data, or ETL jobs that miss key fields. To get to the truth, one must trace the data lineage to identify the underlying root causes of the issue.
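
To make "tracing back to the source(s)" concrete, here is a minimal sketch, assuming lineage is recorded as a simple mapping from each dataset to the datasets it is directly derived from. The `Lineage` alias, the helper names, and every dataset name below are hypothetical illustrations, not part of any particular lineage tool. The walk upstream from an affected dataset returns both the root sources and each path that leads to them, which is where the investigation of proximate versus root causes starts.

```python
from typing import Dict, List, Set

# Hypothetical lineage model: each dataset maps to the datasets it is
# directly derived from. Nodes with no recorded inputs are root sources.
Lineage = Dict[str, List[str]]


def trace_to_sources(lineage: Lineage, dataset: str) -> Set[str]:
    """Walk upstream from `dataset` and return the root sources it depends on."""
    sources: Set[str] = set()
    visited: Set[str] = set()
    stack = [dataset]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # shared ancestors (or accidental cycles) are visited once
        visited.add(node)
        parents = lineage.get(node, [])
        if not parents:
            sources.add(node)  # nothing further upstream: a root source
        stack.extend(parents)
    return sources


def upstream_paths(lineage: Lineage, dataset: str) -> List[List[str]]:
    """List every path from `dataset` back to a root source.

    Assumes the lineage graph is acyclic, as lineage graphs normally are.
    """
    parents = lineage.get(dataset, [])
    if not parents:
        return [[dataset]]
    return [
        [dataset] + path
        for parent in parents
        for path in upstream_paths(lineage, parent)
    ]


# Illustrative lineage for the store-hours example (all names are made up).
example_lineage: Lineage = {
    "app_store_hours": ["store_dim"],
    "store_dim": ["etl_merge"],
    "etl_merge": ["pos_system", "legacy_crm", "manual_spreadsheet"],
}

for path in upstream_paths(example_lineage, "app_store_hours"):
    print(" <- ".join(path))
```

Each printed path is one route the bad store hours could have taken, so each is a candidate place for the root cause to live.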

For starters, we need to objectively triage the facts of the issue:

  • What events led up to the incident?
  • Where did the error occur in the data lifecycle?
  • Which processes and systems broke down?

Rarely will one incident provide enough information to identify root causes; therefore, it is critical to analyze a larger set of incidents:

  • What patterns can we identify from the facts of those events?
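
One way to look for patterns is sketched below, assuming each incident's triage facts (lifecycle stage, system, proximate cause) have been captured in a structured record. The `Incident` fields, the `recurring_patterns` helper, and the sample tickets are hypothetical. Counting incidents per (system, stage) pair surfaces places that keep failing even when the proximate causes differ, which hints at a shared root cause worth investigating.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Incident:
    """Triage facts for one incident; the field names are hypothetical."""
    ticket_id: str
    lifecycle_stage: str  # where in the data lifecycle the error surfaced
    system: str           # which system or process broke down
    proximate_cause: str  # the immediate, visible cause


def recurring_patterns(
    incidents: List[Incident], min_count: int = 2
) -> List[Tuple[Tuple[str, str], int]]:
    """Count (system, lifecycle_stage) pairs across incidents and keep those
    seen at least `min_count` times, most frequent first."""
    counts = Counter((i.system, i.lifecycle_stage) for i in incidents)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Illustrative tickets (made up): different proximate causes,
# several of them surfacing in the same system and stage.
tickets = [
    Incident("T-101", "transform", "etl_merge", "missing field"),
    Incident("T-102", "ingest", "legacy_crm", "fat-fingered entry"),
    Incident("T-103", "transform", "etl_merge", "late refresh"),
    Incident("T-104", "transform", "etl_merge", "missing field"),
]

print(recurring_patterns(tickets))
```

Grouping by system and stage is only one choice; the same counting approach works for any triage field you record consistently.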

Patterns help us form a clearer picture of the truth, particularly over time. They help us build a point of view that enables wider corrective actions, so we can solve issues once and for all instead of applying Band-Aid after Band-Aid.

Last, we must understand the people involved:

  • Who was responsible for the data as it changed shape during its life?
  • Were the errors made by humans directly or by humans that failed to create effective processes?
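
Answering the "who" questions can start from the same lineage path. The sketch below assumes a hypothetical owner registry that maps each dataset to the team responsible for it; the helper names and all dataset and team names are illustrative. Pairing every hop on the path with its owner shows who must act, and any hop without an owner marks a gap where nobody can yet be held accountable.

```python
from typing import Dict, List, Optional, Tuple


def owners_along_path(
    path: List[str], owners: Dict[str, str]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each dataset on a lineage path with its recorded owner (None if unowned)."""
    return [(node, owners.get(node)) for node in path]


def ownership_gaps(path: List[str], owners: Dict[str, str]) -> List[str]:
    """Return the hops on the path that nobody owns: the governance gaps."""
    return [node for node, owner in owners_along_path(path, owners) if owner is None]


# Illustrative path from the affected dataset back to its source, plus a
# hypothetical owner registry (all names are made up).
path = ["app_store_hours", "store_dim", "etl_merge", "manual_spreadsheet"]
owners = {
    "app_store_hours": "mobile team",
    "store_dim": "data warehouse team",
    "etl_merge": "data engineering",
}

for node, owner in owners_along_path(path, owners):
    print(f"{node}: {owner or 'UNOWNED'}")
print("gaps:", ownership_gaps(path, owners))
```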

Getting to the “who” is another key ingredient in root cause analysis since, at the end of the day, people will have to own the issue and either live with it or fix it. Understanding the “who” is where data lineage intersects with data governance. When data owners are held accountable, data governance becomes linked with data lineage, and over time we can make material progress in improving data quality.

All firms struggle with data quality issues, to varying degrees of frequency and severity. In our experience, improving data quality and reducing system issues requires tirelessly pursuing the root causes of incidents over time, with an unbiased perspective.

So talk to your people, collect trouble tickets, and investigate operational incidents with root cause analysis in mind. Look at this data about your issues objectively and deeply, however uncomfortable it may be. These issues are the symptoms, like that persistent cough, the wheeze, or the weird rash that won’t go away. The cure may be Benadryl, or it may be something stronger. You won’t know until you see a doctor, and it’s the same with your data.

The good news is that data issues are rarely terminal. We recommend an “annual physical” for your data instead of waiting for a problem to spread and evolve into something that damages your brand or your reputation. At Collaborative, we have the tools, technologies, and approaches to help you get to the truth. We welcome the conversation.

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.