Tips for Eliminating Poor Data

The accumulation of bad data increases the number of errors in a system, so it is very important to build a continuous process for eliminating it.

Yuri Danilov · Jul. 05, 23 · Analysis

The Best Approach To Handling Poor Data

There are many ways to evaluate poor data, but the following approach has proved to be the most effective and universal in practice.

To weed out poor data, you need to:

  • Clearly define criteria for poor data
  • Perform data analysis against these criteria
  • Find out the sources of this poor data
  • Fix poor data
  • Fix poor data sources

Criteria for poor data can include conformance to a certain type or format, values falling within an expected range, completeness, the absence of duplicates, and so on.

Next, you need to check all of the data, or a subset of it, for compliance with these criteria.

If the amount of data being checked is large, it makes sense to check only part of it in the initial stages, since most sources of errors can be identified and corrected even on a small sample.

After correcting those errors, you can check the entire dataset.
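
As a concrete illustration, here is a minimal sketch of how such criteria might be checked with Python and pandas, first on a sample and then on the full dataset. The file name, column names, and thresholds (orders.csv, order_id, amount, order_date) are hypothetical assumptions for this example, not something prescribed by the approach itself.

```python
import pandas as pd

# Hypothetical criteria for an "orders" dataset; adjust to your own schema.
def check_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per violated criterion with a count of offending records."""
    issues = []

    # Completeness: required fields must not be null.
    for col in ("order_id", "customer_id", "amount"):
        missing = df[col].isna().sum()
        if missing:
            issues.append({"criterion": f"{col} is missing", "rows": int(missing)})

    # Range: amounts are expected to be positive.
    bad_amount = (df["amount"] <= 0).sum()
    if bad_amount:
        issues.append({"criterion": "amount <= 0", "rows": int(bad_amount)})

    # Format: order dates must parse as dates.
    bad_date = pd.to_datetime(df["order_date"], errors="coerce").isna().sum()
    if bad_date:
        issues.append({"criterion": "order_date is not a valid date", "rows": int(bad_date)})

    # Duplicates: order_id must be unique.
    dupes = df.duplicated(subset="order_id").sum()
    if dupes:
        issues.append({"criterion": "duplicate order_id", "rows": int(dupes)})

    return pd.DataFrame(issues)


df = pd.read_csv("orders.csv")

# Check a small sample first: most error sources show up even here.
sample_report = check_quality(df.sample(frac=0.05, random_state=42))
print(sample_report)

# After the sources of those errors are fixed, check the entire dataset.
full_report = check_quality(df)
print(full_report)
```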

The source of poor data can be a person who made a mistake during data entry, such as a point-of-sale (POS) employee.

It can also be an external information system or some process performing internal calculations in your own information system.

After identifying poor data and its sources, there are two directions to work on:

  • Fix already existing poor data
  • Prevent the appearance of such data in the future

In the first direction, depending on the source, you either correct the data manually, reload it from the external system, or rerun the corresponding calculations correctly in your information system.
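
Once the offending records are known, the corrections themselves can often be scripted. The sketch below reuses the same hypothetical orders data, plus an assumed corrections.csv file prepared by the data owners; it removes duplicates, normalizes dates, and applies manual fixes:

```python
import pandas as pd

df = pd.read_csv("orders.csv")

# 1. Remove duplicates introduced by repeated loads.
df = df.drop_duplicates(subset="order_id", keep="first")

# 2. Normalize dates that were entered in mixed formats.
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")

# 3. Apply manual corrections prepared by the data owners
#    (a small curated file mapping order_id to the correct amount).
corrections = pd.read_csv("corrections.csv").set_index("order_id")["amount"]
df["amount"] = df["order_id"].map(corrections).fillna(df["amount"])

df.to_csv("orders_fixed.csv", index=False)
```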

In the second direction, it is usually necessary to correct the processes that produce poor data.

In the case of staff errors, you can train the staff or add input validation.
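
Here is a minimal sketch of such input validation at the point of entry; the field names and allowed values are assumptions for illustration:

```python
def validate_pos_entry(entry: dict) -> list[str]:
    """Return a list of validation errors for a point-of-sale entry; empty means OK."""
    errors = []
    if not entry.get("customer_id"):
        errors.append("customer_id is required")
    amount = entry.get("amount")
    if not isinstance(amount, (int, float)) or amount <= 0:
        errors.append("amount must be a positive number")
    if entry.get("currency") not in {"USD", "EUR"}:
        errors.append("unknown currency")
    return errors


entry = {"customer_id": "", "amount": -5, "currency": "XYZ"}
problems = validate_pos_entry(entry)
if problems:
    # Reject the input and show the operator what to fix instead of saving bad data.
    print("\n".join(problems))
```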

If poor data comes from an external system, then you need to discuss the exchange format with counterparties.

If poor data is the result of internal calculations, then you need to correct the corresponding algorithms in your information system.

The Efficiency of This Technique

This technique is highly effective because the criteria are defined clearly, based on the needs of the business.

Clear criteria also allow you to automate the identification of poor data, so you can be informed of its appearance quickly and respond in a timely manner.

For example, you can set up an email notification with the results of data quality checks.
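
A minimal sketch of such a notification using Python's standard smtplib; the addresses and SMTP host are placeholders you would replace with your own:

```python
import smtplib
from email.message import EmailMessage

def send_quality_report(issues: list[str]) -> None:
    """Email the data-quality findings so problems can be handled promptly."""
    if not issues:
        return  # nothing to report

    msg = EmailMessage()
    msg["Subject"] = f"Data quality check: {len(issues)} issue(s) found"
    msg["From"] = "dq-bot@example.com"        # placeholder sender
    msg["To"] = "data-owners@example.com"     # placeholder recipients
    msg.set_content("\n".join(issues))

    with smtplib.SMTP("smtp.example.com") as server:  # placeholder SMTP host
        server.send_message(msg)


send_quality_report([
    "orders.amount: 12 rows with non-positive values",
    "orders.order_date: 3 rows with unparseable dates",
])
```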

Periodicity of the Bad Data Evaluation Process

It is very important that this data evaluation process be run regularly.

This will allow you to correct errors in data sources in a timely manner and, as a result, avoid time-consuming manual corrections, as well as minimize business risks associated with the use of poor data.

How often you run the data evaluation process depends on the type of information system and the nature of the data itself.

In an analytical system, part of the data may remain unchanged; it is enough to check such data once, eliminate the errors, and then repeat the process only for new data.

In an online system, large amounts of data can change, so the entire dataset needs to be checked; in this case, you can continuously check part of the set and, only if errors are found, check the entire dataset for those specific errors.
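
A minimal sketch of that escalation logic, reusing the same hypothetical orders data; the loaded_at watermark column and the simplified check are assumptions for illustration:

```python
import pandas as pd

def check_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Simplified stand-in for the criteria checks described earlier."""
    return df[(df["amount"] <= 0) | df["order_id"].duplicated()]

df = pd.read_csv("orders.csv", parse_dates=["loaded_at"])

# Incremental run: validate only records loaded since the previous check.
last_checked = pd.Timestamp("2023-07-01")           # watermark stored from the last run
new_rows = df[df["loaded_at"] > last_checked]
problems = check_quality(new_rows)

# Escalate: if the new data contains errors, re-check the whole dataset
# for the criteria that failed.
if not problems.empty:
    problems = check_quality(df)

print(f"{len(problems)} problematic rows found")
```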

In concrete terms, the process can run daily for systems that are sensitive to any errors, or as rarely as once a month if a certain number of errors can be tolerated without a significant impact on business processes.

Regardless of the chosen frequency, the process of improving data quality must exist throughout the entire lifecycle of an information system.

What Can Lead to the Accumulation of Bad Data?

The absence of a process for handling poor data leads to an accumulation of errors.

Moreover, errors in the original data can lead to secondary errors that arise from working with that poor data.

Eventually, all these factors create a persistent cost component that negatively affects the company's profit.

Summary

Building data quality management processes at the early stages of system deployment is very important.

Doing so gives you, with minimal effort, a set of evaluation criteria and algorithms for analyzing poor data, informing users, and working with the sources of poor data while data volumes are still small.

After all, it is better to prevent errors than to eliminate them.

Data analysis, Data quality, Data (computing)

Opinions expressed by DZone contributors are their own.
