
Data Quality: Good vs Bad Data [Video]

In this video, simple examples stand in for what can be a much more complex process in the Bedrock platform: deciding which data is good and which is bad, and then acting on the data in each case.



The term “data quality” refers not only to the properties that distinguish good data from bad data but also to what we do with that data once the decision has been made.

The first step in separating good data from bad data might be as simple as filtering out records with missing values. A more complex check might make sure an SSN field both has a value and follows the correct numerical pattern. We could even implement sets of rules that check multiple columns, each with its own required properties.
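As a rough illustration of such a rule set, here is a minimal Python sketch. It is not how Bedrock implements its checks; the column names, the `RULES` mapping, and the helper functions are hypothetical, and the SSN layout (three digits, two digits, four digits, separated by hyphens) is an assumption about what "the correct numerical pattern" means.

```python
import re

# Assumed SSN layout: 3 digits, 2 digits, 4 digits, separated by hyphens.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def is_present(value):
    """Return True when the value is neither None nor blank."""
    return value is not None and str(value).strip() != ""


def is_valid_ssn(value):
    """Return True when the value is present and matches the assumed SSN layout."""
    return is_present(value) and SSN_PATTERN.fullmatch(str(value).strip()) is not None


# Hypothetical rule set: each column name maps to the check it must pass.
RULES = {
    "name": is_present,
    "ssn": is_valid_ssn,
}


def failed_columns(record):
    """Return the names of the columns in a record that fail their rule."""
    return [column for column, check in RULES.items() if not check(record.get(column))]


def is_good(record):
    """A record counts as good data only when every rule passes."""
    return not failed_columns(record)
```

A record such as `{"name": "Ada", "ssn": "123-45-6789"}` passes every rule, while one with a blank name or an unhyphenated SSN is reported with the list of columns that failed, which makes it easier to explain why a record was rejected.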

The second step is deciding what to do with the data. Once data passes our quality standards, we can load it into an external Hive table at a specific location in HDFS. But what do we do with bad data? Do we simply delete it? Do we copy it and archive it? The point is that bad data needs a defined process too.
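The routing decision can be sketched in Python as follows. This is only an illustration under stated assumptions: it writes to local directories as stand-ins for HDFS locations, the function name `route_records` and the file paths are hypothetical, and archiving bad records to a quarantine file is just one of the options mentioned above. In a real deployment the good-data directory could serve as the location of an external Hive table.

```python
import csv
from pathlib import Path


def route_records(records, is_good, fieldnames, good_path, bad_path):
    """Write passing records to good_path and failing ones to bad_path.

    Returns a dict with the number of records written to each file.
    No header row is written, so the files contain data rows only.
    """
    good_path, bad_path = Path(good_path), Path(bad_path)
    for path in (good_path, bad_path):
        path.parent.mkdir(parents=True, exist_ok=True)

    counts = {"good": 0, "bad": 0}
    with good_path.open("w", newline="") as good_file, \
            bad_path.open("w", newline="") as bad_file:
        writers = {
            "good": csv.DictWriter(good_file, fieldnames=fieldnames),
            "bad": csv.DictWriter(bad_file, fieldnames=fieldnames),
        }
        for record in records:
            # Bad records are archived for later review rather than deleted.
            bucket = "good" if is_good(record) else "bad"
            writers[bucket].writerow(record)
            counts[bucket] += 1
    return counts
```

Calling `route_records(rows, check, ["name", "ssn"], "good/part-0.csv", "quarantine/part-0.csv")` with any per-record check function leaves two files behind, so the bad data stays available for auditing or reprocessing instead of silently disappearing.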


To explore additional topics related to your data, please see more about the big data ecosystem or learn more about Bedrock.


Topics:
big data, data quality, bedrock

Published at DZone with permission of Adam Diaz, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
