Meet Cactar, the Ancient Mongolian Warlord of Data Quality

Remembering Cactar will help you recall the six key elements of data quality: consistency, accuracy, completeness, timeliness, accessibility, and reliability.

by Jean-Georges Perrin · Jul. 11, 17 · Opinion

Let's start with a little history.

There is little trace of Cactar, but we could imagine him as one of these members of the Mongolian Armed Forces Honorary Guard.

On August 18, 1227, the well-known Mongolian emperor Genghis Khan died. Despite numerous criticisms based on rumors of genocide and brutality, he united Mongolia. One of his sons, Ögedei, took over the Mongol Empire and became the second Great Khan.

A lesser-known fact is the role another of his sons played. Cactar Khan was not of prime ascendance, as there is no trace of his mother. Cactar was non-violent, which was really rare for a Mongolian of the 13th century. Cactar’s passion for rigor and quality drove him to data quality. From his name, he forged the eponymous acronym.

There is little trace of Cactar, but we could imagine him as the member of the Mongolian Armed Forces Honorary Guard above.

6 Key Elements

This largely ignored historical fact should remind us of the six key elements of data quality, named after Cactar.

C: Consistency

Data must be consistent across multiple data sources, servers, and platforms.

Example: You do not want two sensors to report a different level of humidity if they are placed next to one another.
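As a rough illustration of the sensor example, here is a minimal sketch of a consistency check between two co-located readings. The function name and the tolerance of 2 percentage points are assumptions made for this example, not values from any standard.

```python
def readings_consistent(a: float, b: float, tolerance: float = 2.0) -> bool:
    """Return True when two readings differ by at most `tolerance`.

    The default tolerance (2 points of relative humidity) is a
    hypothetical threshold chosen for illustration only.
    """
    return abs(a - b) <= tolerance


# Two sensors placed next to one another should roughly agree.
close_pair = readings_consistent(45.0, 46.5)
far_pair = readings_consistent(45.0, 52.0)
```

In practice the acceptable tolerance would come from the sensors' specifications.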

A: Accuracy

The value must mean something, and this something must be accurate.

Example: I was born, in France, on 05/10/1971 but I am a Libra (October). Date formats, when expressed as strings, are transformed through a localization filter. So, being born on October 5th makes my date representation 05/10/1971 in Europe, but 10/05/1971 in the U.S. I feel lucky, because if you are born on March 22nd, you may violate some rules (as there are only 12 months).
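The date ambiguity above can be sketched with Python's standard library. The helper name `parses_as` is made up for this example; the format strings are the usual day-first and month-first conventions.

```python
from datetime import datetime

raw = "05/10/1971"

# The same string yields two different dates depending on the convention.
european = datetime.strptime(raw, "%d/%m/%Y")  # day first: October 5
american = datetime.strptime(raw, "%m/%d/%Y")  # month first: May 10


def parses_as(value: str, fmt: str) -> bool:
    """Return True if `value` is a valid date under the format `fmt`."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


# March 22nd written day-first is rejected when read month-first,
# because there is no 22nd month.
march_22_eu = parses_as("22/03/1971", "%d/%m/%Y")
march_22_us = parses_as("22/03/1971", "%m/%d/%Y")
```

The dangerous case is the first one: both parses succeed, so nothing fails and the wrong date goes unnoticed.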

C: Completeness

To make sense, your data must be complete — at least, for key data. Missing fields, values, or rows will harm your analysis.

Example: You are trying to geographically locate your customers based on their ZIP Code™, but you are missing 26.3% of them.
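One simple way to put a number on completeness is to measure the share of records missing a key field. This is only a sketch: the function name, the field name `zip`, and the sample records are invented for the example, and values are assumed to be strings or absent.

```python
def missing_ratio(records: list, field: str) -> float:
    """Return the fraction of records where `field` is absent, None, or blank.

    Assumes each record is a dict whose values are strings or None.
    """
    if not records:
        return 0.0
    missing = sum(1 for r in records if not (r.get(field) or "").strip())
    return missing / len(records)


# Hypothetical customer records; three of the four lack a usable ZIP Code.
customers = [
    {"name": "A", "zip": "27709"},
    {"name": "B", "zip": ""},
    {"name": "C", "zip": None},
    {"name": "D"},
]
ratio = missing_ratio(customers, "zip")
```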

T: Timeliness

How recent and relevant is the data? If you are building an industrial system, you must have your data on time. With the growing world of IoT, you expect data to be available right away.

Fun fact: 45 million Americans change address every year.
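Timeliness can be checked by comparing a record's timestamp with a maximum acceptable age. The sketch below assumes a 15-minute limit and a fixed reference time, both picked only for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_timely(recorded_at: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True when the record is no older than `max_age` at time `now`."""
    return now - recorded_at <= max_age


# Fixed reference time so the example is reproducible (hypothetical values).
now = datetime(2017, 7, 11, 12, 0, tzinfo=timezone.utc)
limit = timedelta(minutes=15)

fresh = is_timely(now - timedelta(minutes=5), now, limit)
stale = is_timely(now - timedelta(hours=2), now, limit)
```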

A: Accessibility

How easily can data be accessed and manipulated? One (manual) way to assess data quality is to sample the data: do you have the right tool? How representative is your sample, or are you always getting the same first 100 rows of your dataset?

Fun fact: Apache Spark has a seed parameter to the sample method of its dataset. Mica is a great tool to sample data from both Hadoop and RDBMS.
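To show why a seed matters, here is a small sketch using Python's standard `random` module rather than Spark itself. The function name `seeded_sample` and the seed value are assumptions for this example; the idea is the same: a fixed seed gives a sample that is random yet reproducible.

```python
import random


def seeded_sample(rows: list, k: int, seed: int) -> list:
    """Return k rows drawn without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator, leaves global state untouched
    return rng.sample(rows, k)


rows = list(range(1000))  # stand-in for a dataset

first_run = seeded_sample(rows, 100, seed=42)
second_run = seeded_sample(rows, 100, seed=42)
```

Changing the seed from time to time is what keeps you from always looking at the same rows.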

R: Reliability

This last item is about how reliable your sources are. A source can be a database system, but also a human operator entering data into a system.

Example: If your IoT hub is supposed to send you the data between 2 a.m. and 3 a.m. but for security reasons, access to the Internet is restricted to business hours, your source is definitely not reliable.
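The IoT hub example boils down to checking whether the sending window fits inside the hours when access is open. In this sketch the function name is invented and business hours are assumed to be 9 a.m. to 5 p.m.; windows that cross midnight are not handled.

```python
from datetime import time


def window_is_reachable(send_start: time, send_end: time,
                        open_start: time, open_end: time) -> bool:
    """Return True when the sending window lies entirely within open hours.

    Assumes both windows start and end on the same day.
    """
    return open_start <= send_start and send_end <= open_end


# Hub sends between 2 a.m. and 3 a.m.; access is open 9 a.m. to 5 p.m. (assumed).
night_upload = window_is_reachable(time(2), time(3), time(9), time(17))
noon_upload = window_is_reachable(time(12), time(13), time(9), time(17))
```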

Conclusion

The first time I referenced Cactar publicly was during my talk at Spark Summit in June 2017.

Of course, Cactar is a completely fictitious character, but he is a handy mnemonic for remembering data consistency, data accuracy, data completeness, data timeliness, data accessibility, and data reliability, the six key elements of data quality.

Feel free to share your examples in the comments section to make this article richer.

Photo Credit: Honor Guard members from Mongolia’s armed forces stand in formation before the opening ceremony of Khaan Quest at Five Hills Training Area, Mongolia, Aug. 3, 2013. U.S. Marine Corps photo by Sgt John M. Ewald.

Published at DZone with permission of Jean-Georges Perrin, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
