
There's a Gap Between Data Quality Confidence and Data Quality Reality


Here's a quick look at why feeling good about your data doesn't mean that your data is reliable, accurate, or valid.


The Banking Technology Vision 2018 report points to a trend in how the banking industry views data quality. As in many industries, data stakeholders are routinely confident in the quality of their data, accepting its validity and accuracy at face value. A deeper dive, however, reveals that this confidence is misplaced. Here's a quick look at why feeling good about your data doesn't mean that your data is reliable, accurate, or valid.

Misplaced Data Quality Confidence

The findings of the survey are a bit startling for those working in the data quality industry, although they're far from unexpected. For instance, 94% of the bankers who replied to the survey said they're confident in the integrity of the sources of their data. This confidence persists even though, as the survey accurately notes, banks receive consumer data from notoriously unstructured outside sources. Further findings show that this confidence in data integrity is optimistic at best. 11% of bankers surveyed viewed their data as reliable, but do not do anything to validate that reliability. As a data professional, would you blindly accept the accuracy and reliability of datasets you've never validated? We certainly wouldn't recommend doing so. Additionally, 16% try to validate their data, but aren't convinced of the overall quality; 24% say that they fully validate their data, but still lack an accurate picture of whether the data is valid, timely, accurate, and so on. In short, even those who take the time to validate their data fail to incorporate crucial components of data quality into their process.

Poor Data Quality: A Compounding Problem

As we've demonstrated in the past with the 1-10-100 Rule of Data Quality, poor data quality isn't an isolated issue. When leaders make business decisions based on flawed, incomplete, or invalid data, the consequences are often detrimental to the business as well as its consumers. So, while a full 84% of respondents stated they're increasingly using data to drive the decision-making process at their institutions, the fact that few organizations actually validate their data to begin with should be extremely disconcerting.

What can we take away from this survey? I'd argue that it's imperative to validate your data rather than simply hoping it's correct. Additionally, taking steps to improve the quality of data as it enters your databases and applications can go a long way toward improving data quality before that data makes its way downstream into your analytics.
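As a rough illustration of validating data at the point of entry, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than something taken from the survey: the field names (`customer_id`, `email`, `updated_on`), the one-year freshness threshold, and the deliberately loose email pattern are all hypothetical. It checks each incoming record for completeness, validity, and timeliness, then separates clean records from those that need attention.

```python
import re
from datetime import date, timedelta

# Hypothetical rules for illustration only; real rules depend on your schema.
REQUIRED_FIELDS = ("customer_id", "email", "updated_on")
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose shape check
MAX_AGE = timedelta(days=365)  # assumed freshness threshold


def validate_record(record, today):
    """Return a list of data quality problems found in one record."""
    problems = []

    # Completeness: required fields must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")

    # Validity: the email must at least look like an email address.
    email = record.get("email")
    if email and not EMAIL_PATTERN.match(email):
        problems.append("invalid email")

    # Timeliness: the record must have been updated recently enough.
    updated_on = record.get("updated_on")
    if updated_on and today - updated_on > MAX_AGE:
        problems.append("stale record")

    return problems


def partition(records, today):
    """Split records into accepted ones and (record, problems) rejects."""
    accepted, rejected = [], []
    for record in records:
        problems = validate_record(record, today)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected
```

Running `partition` on each incoming batch before loading it means rejected records can be reviewed or corrected at the source, instead of surfacing later as questionable numbers in a report.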



Published at DZone with permission of

