Holistic Test Data Management: Beyond ETL

The traditional ETL approach to test data management simply isn’t good enough. A holistic test data management framework contextualizes the broader aspects of TDM.

So, you’ve got a team responsible for test data management.

Your project puts in a request and they grab copies from production. Then, they mask, subset, and deploy them to your test environments.

Easy peasy...? Well, probably not — probably a lot of paperwork, engineering, and provisioning effort.

And then the issues really start.

  • The data lacks end-to-end integrity (health), i.e. the data is broken.

  • The developers and testers can’t easily find the data they are looking for.

  • When the teams do find the correct data points they can use, they all use it, causing contention and data-related test defects.

And it all starts to grind to a halt.

Suddenly, development and test cycles are being blown out.

And then, to add insult to injury, an honest test analyst notices that not all the data has been masked. That's a serious concern when you realize that non-production environments are where your project teams spend 95% of their time, and where the opportunity for information to be misplaced or stolen is high.

This is a suboptimal situation that exposes the customer to identity theft and fraud and exposes your own organization to:

  • Compliance penalties

  • Industry sanctions

  • Brand damage

  • Consequent lawsuits

Not exactly ideal, particularly with data compliance legislation like GDPR, which can fine you up to 4% of annual global turnover.

Yet, I can virtually guarantee, sadly, that the above scenarios describe most organizations today.

The reason why data is such a problem is sixfold:

  1. Enterprise architectures are typically diverse and distributed.

  2. Environment and data footprints are under constant change.

  3. Individual databases are often large and poorly defined or understood.

  4. System and data documentation often suffers from technical debt.

  5. It's easy to make mistakes during data subsetting (causing integrity health issues).

  6. It's easy to make mistakes during data obfuscation exercises (causing PII leakage).
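The last two failure modes lend themselves to automated checks. The sketch below is a minimal illustration, not a complete validation tool. It assumes a hypothetical two-table subset (`customers` and `orders`) held as lists of dictionaries, and it assumes masked emails always end in the reserved `.invalid` domain; real PII detection would need far richer rules.

```python
import re

# Hypothetical subset extracted from production. Table and column names are
# illustrative assumptions, not taken from any specific TDM tool.
customers = [
    {"id": 1, "email": "masked_0001@example.invalid"},
    {"id": 2, "email": "jane.doe@realbank.com"},  # masking was missed here
]
orders = [
    {"id": 10, "customer_id": 1},
    {"id": 11, "customer_id": 3},  # parent customer was dropped by the subset
]

# Assumption: masked emails always use the reserved ".invalid" top-level domain.
MASKED_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.invalid$")


def find_orphans(children, parents, fk, pk="id"):
    """Return child rows whose foreign key has no matching parent row."""
    parent_keys = {row[pk] for row in parents}
    return [row for row in children if row[fk] not in parent_keys]


def find_unmasked(rows, column, masked_pattern):
    """Return rows whose value in `column` does not match the masked format."""
    return [row for row in rows if not masked_pattern.match(str(row[column]))]


orphaned_orders = find_orphans(orders, customers, fk="customer_id")
leaked_customers = find_unmasked(customers, "email", MASKED_EMAIL)

print("orphaned orders:", [row["id"] for row in orphaned_orders])
print("unmasked customers:", [row["id"] for row in leaked_customers])
```

Running checks like these after every subset and mask step turns both kinds of mistake into a failed build rather than a surprise found by a tester.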

The traditional ETL approach to test data management simply isn’t good enough.

  • It is too slow.
  • It is too manual.
  • It is too error-prone.
  • It is neither customer- nor user-centric.

There is a fundamental need to recognize that successful test data management can’t rely on ETL alone. Instead, organizations must start looking at data a little more broadly and leverage more automation to ensure accuracy, quality, compliance, and ease of end-user consumption.

A holistic test data management (HTDM) framework is used to contextualize the broader aspects of test data management: a set of LEGO blocks that call out the broader considerations and needs of an automated test data solution.

Holistic Test Data Management

Built around traditional ETL, an HTDM framework promotes the adoption of supporting TDM capabilities like:

  • Data Requirements Capture so you have a clear understanding of your consumers' (testers' and projects') needs.

  • Automated Data Profiling to rapidly understand data structures and PII risks (pre-ETL).

  • Automated Data Validation to rapidly determine if created data (post-ETL) is free of production patterns and healthy (i.e. has integrity).

  • Test Data Mining so testers can visualize, understand, and find end-to-end (cross-system) data without the need to continually build (and rebuild) complex queries and scripts.

  • Test Data Bookings so that test data can be assigned to test cases or teams, avoiding the risk of one team overwriting another's data.
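To make the bookings capability concrete, here is a minimal sketch of a test data booking registry. The `BookingRegistry` class, its method names, and the record and team identifiers are all hypothetical; a real solution would persist bookings in a shared store rather than in memory.

```python
class BookingRegistry:
    """In-memory record of which owner has reserved which test data record."""

    def __init__(self):
        self._bookings = {}  # record id -> owner (team or test case)

    def book(self, record_id, owner):
        """Reserve a record; return False if another owner already holds it."""
        current = self._bookings.setdefault(record_id, owner)
        return current == owner

    def release(self, record_id, owner):
        """Release a record, but only if `owner` is the one holding it."""
        if self._bookings.get(record_id) == owner:
            del self._bookings[record_id]
            return True
        return False


registry = BookingRegistry()
first = registry.book("customer-42", "payments-team")
second = registry.book("customer-42", "lending-team")  # refused: already booked
released = registry.release("customer-42", "payments-team")
third = registry.book("customer-42", "lending-team")   # succeeds after release

print(first, second, released, third)
```

The point of the design is simply that a record has at most one owner at a time, which is what removes the contention described earlier.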

Key benefits of creating an HTDM framework include:

  • Understanding your data

  • Improving compliance

  • Ensuring data health

  • Making consumption easier

...all of which lead to happy testers and streamlined project delivery.

Opinions expressed by DZone contributors are their own.
