Holistic Test Data Management: Beyond ETL
The traditional ETL approach to test data management simply isn’t good enough. A holistic test data management framework contextualizes the broader aspects of TDM.
So, you’ve got a team responsible for test data management.
Your project puts in a request and they grab copies from production. Then, they mask, subset, and deploy them to your test environments.
Easy peasy...? Well, probably not: more likely a lot of paperwork, engineering, and provisioning effort.
And then the issues really start.
The data lacks end-to-end integrity (health); in other words, the data is broken.
The developers and testers can’t easily find the data they are looking for.
When the teams do find usable data points, they all use the same ones, causing contention and data-related test defects.
And it all starts to grind to a halt.
Suddenly, development and test cycles are being blown out.
And then, to add insult to injury, an honest test analyst notices that not all the data has been masked. That's a serious concern when you realize that’s where your project teams spend 95% of their time, and the opportunity for information to be misplaced or stolen is high.
This is a suboptimal situation that exposes the customer to identity theft and fraud, and exposes your own organization to serious legal, financial, and reputational risk.
Not exactly ideal, particularly under data compliance legislation like GDPR, which can fine you up to 4% of annual global turnover.
Yet, I can virtually guarantee, sadly, that the above scenarios describe most organizations today.
The reasons data is such a problem are sixfold:
- Enterprise architectures are typically diverse and distributed.
- Environment and data footprints are under constant change.
- Individual databases are often large and poorly defined or understood.
- System and data documentation often suffers from technical debt.
- It's easy to make mistakes during data subsetting (causing integrity health issues).
- It's easy to make mistakes during data obfuscation exercises (causing PII leakage).
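To make the subsetting pitfall concrete, here is a minimal sketch of an orphaned-row check: after subsetting, child rows can easily reference parent rows that the subset never copied. The table and column names (`customers`, `orders`, `customer_id`) are hypothetical.

```python
# Minimal sketch: verify referential integrity after subsetting.
# Table names and columns are hypothetical; real schemas vary widely.

def check_orphans(child_rows, parent_rows, fk_field, pk_field):
    """Return child rows whose foreign key has no matching parent row."""
    parent_keys = {row[pk_field] for row in parent_rows}
    return [row for row in child_rows if row[fk_field] not in parent_keys]

customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 7},  # parent row missed by the subset
]

orphans = check_orphans(orders, customers, "customer_id", "customer_id")
print(orphans)  # the order referencing customer 7 is broken data
```

A real implementation would walk every foreign-key relationship in the schema, which is exactly where poorly documented databases cause trouble.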
The traditional ETL approach to test data management simply isn’t good enough.
- It is too slow.
- It is too manual.
- It is too error-prone.
- It is neither customer- nor user-centric.
There is a fundamental need to recognize that successful test data management can’t rely on ETL alone. Instead, organizations must start looking at data a little more broadly and leverage more automation to ensure the accuracy, quality, compliance, and ease of end-user consumption.
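One automation technique worth highlighting is deterministic masking: the same input always maps to the same masked value, so foreign-key joins survive the masking step. The sketch below is illustrative only; the salt, function name, and output format are all assumptions, not a specific tool's API.

```python
# Minimal sketch of deterministic masking: identical inputs always mask
# to identical outputs, so cross-table relationships are preserved.
# The salt value and naming scheme here are hypothetical.
import hashlib

SALT = b"rotate-this-secret-per-environment"  # hypothetical; keep out of source control

def mask_email(email: str) -> str:
    """Replace a real email with a stable, non-reversible pseudonym."""
    digest = hashlib.sha256(SALT + email.encode()).hexdigest()[:12]
    return f"user_{digest}@example.test"

# The same customer masks identically wherever their email appears,
# so joins across tables and systems still line up.
print(mask_email("alice@corp.com"))
```

Rotating the salt per environment means masked values from one test environment cannot be correlated with another, while joins within an environment stay intact.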
A holistic test data management (HTDM) framework is used to contextualize the broader aspects of test data management: a set of LEGO blocks that call out the broader considerations and needs of an automated test data solution.
Built around the traditional ETL, an HTDM promotes the adoption of supporting TDM capabilities like:
- Data Requirements Capture so you have a clear understanding of consumer (tester and project) needs.
- Automated Data Profiling to rapidly understand data structures and PII risks (pre-ETL).
- Automated Data Validation to rapidly determine if created data (post-ETL) is free of production patterns and healthy (i.e. has integrity).
- Test Data Mining so testers can visualize, understand, and find end-to-end (cross-system) data without the need to continually build (and rebuild) complex queries and scripts.
- Test Data Bookings so that test data can be assigned to test cases or teams and avoid the risk of overwriting.
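To illustrate the booking idea, here is a minimal sketch of a reservation mechanism: a data record is held by one test case or team at a time, so contention surfaces as an explicit error instead of a silent data clash. The class, key format, and in-memory store are all hypothetical; a real solution would persist bookings in a shared database.

```python
# Minimal sketch of a test data booking service. Names and the
# in-memory store are hypothetical illustrations, not a real product API.

class BookingError(Exception):
    """Raised when a data record is already reserved."""

class TestDataBookings:
    def __init__(self):
        self._bookings = {}  # data_key -> owning test case or team

    def book(self, data_key: str, owner: str) -> None:
        """Reserve a record; fail loudly if someone else holds it."""
        if data_key in self._bookings:
            raise BookingError(
                f"{data_key} already booked by {self._bookings[data_key]}"
            )
        self._bookings[data_key] = owner

    def release(self, data_key: str) -> None:
        """Free a record once the owning test has finished with it."""
        self._bookings.pop(data_key, None)

bookings = TestDataBookings()
bookings.book("customer:1042", "regression-suite")
try:
    bookings.book("customer:1042", "team-payments")
except BookingError as e:
    print(e)  # contention is detected up front, not during a failed test
```

Making the clash explicit at booking time is the point: the second team gets an immediate, actionable error rather than a flaky, data-related test defect hours later.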
Key benefits of creating an HTDM framework include:
- Understanding your data
- Ensuring data health
- Making consumption easier
...all of which lead to happy testers and streamlined project delivery.