The Democratization of (Test) Data
Test data is still a major bottleneck in DevOps and CI/CD. What is the missing ingredient of test data success?
A glance at industry research from recent years shows that test data remains one of the major bottlenecks to fix in DevOps and CI/CD:
- The most recent Continuous Testing Report found that the average test team spends a massive 44% of their time finding, waiting for, or making test data.[i]
- The 2021-22 World Quality Report found that incomplete test data continues to undermine software quality, as organizations still lack sufficient data for all of their testing.[ii]
- Test data practices further pose costly compliance risks, as testers at 45% of organizations admit that they do not always follow security and privacy regulations for test data.[iii]
We have written extensively elsewhere on techniques for making complete and compliant data available on the fly to testers, developers, automation frameworks, and CI/CD pipelines. This article will highlight a principle underpinning many of these techniques, which is often missing from test data strategies today. Let’s call this missing ingredient of test data success “the democratization of data.”
The Need to Democratize Test Data
An effective test data strategy today must make rich and compliant data available on-demand to a range of different data requesters. These consumers are not only human, nor are they confined to the testing domain. They encompass technologies and stakeholders from across cross-functional teams, including BAs, product owners, CI/CD pipelines, developers, and test automation frameworks.
The types of data these requesters need are usually aligned with the business language and logic of the system under test. A tester, for instance, might require a customer of a particular type, with a particular history. They’re not searching primarily for data with a particular schema or column names. In other words, they are concerned with a “business” view of the data, as opposed to a technical view of back-end databases and files.
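To make the contrast concrete, here is a hypothetical sketch of the same data need expressed both ways. The entity names, schema, and SQL are invented for illustration:

```python
# Business view: what a tester actually asks for, in the language of the
# system under test. All names here are invented for illustration.
business_request = {
    "entity": "customer",
    "type": "premium",
    "history": "at least one disputed payment",
}

# Technical view: what satisfying that request might look like against a
# hypothetical back-end schema -- knowledge a tester should not need.
technical_query = (
    "SELECT c.id FROM customers c "
    "JOIN payments p ON p.customer_id = c.id "
    "WHERE c.tier = 'PREM' AND p.status = 'DISPUTED'"
)

print(business_request["entity"], "->", technical_query.split()[3])
```

The business request names no tables, joins, or status codes; the technical query depends on all three.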
This distinction between the technical and the business is frequently overlooked in approaches to test data provisioning, such as those that rely on test teams crafting complex SQL queries. This forces the skill sets of DBAs and back-end engineers onto testers, developers, and BAs.
A failure to separate the business and the technical is frustrating, time-consuming, and haphazard. Delivery teams spend time wrangling data and battling complex queries, rather than testing and developing new functionality. Too often, cross-functional teams find themselves fighting uphill against specialized and skill-intensive processes for finding data.
Separating the Business and the Technical
We are working with organizations today to fix this mismatch between the skills and goals of the teams looking for data and the methods available to them for finding it. This often involves building a Test Data Mart.
A Test Data Mart separates the technical and business understandings of data. It provides an abstraction layer and mapping between a user’s understanding of data, and that data’s technical implementation in databases and files. This then allows parallel data consumers to request the data they need using language, concepts, and techniques that are familiar to them:
Using a Test Data Mart, parallel data requesters find and make the data they need on the fly.
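A minimal sketch of the abstraction-layer idea follows. The entity names, column mappings, and query builder are all hypothetical illustrations of the concept, not Curiosity's implementation:

```python
# Hypothetical mapping from business-level entities and attributes to the
# tables and columns that implement them in the back end.
ENTITY_TABLES = {"customer": "customers"}
ATTRIBUTE_COLUMNS = {
    ("customer", "type"): "tier",
    ("customer", "country"): "country_code",
}

def build_lookup(entity: str, criteria: dict) -> str:
    """Translate a business-language request into a technical SQL lookup."""
    table = ENTITY_TABLES[entity]
    clauses = [
        f"{ATTRIBUTE_COLUMNS[(entity, attr)]} = '{value}'"
        for attr, value in criteria.items()
    ]
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)

# A requester thinks in business terms; the mart resolves the technical view.
print(build_lookup("customer", {"type": "premium", "country": "DE"}))
# SELECT * FROM customers WHERE tier = 'premium' AND country_code = 'DE'
```

The requester never sees `tier` or `country_code`; the mapping layer owns that knowledge, so it is maintained once rather than re-learned by every team.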
A Test Data Mart can be used to create aggregated views of what data exists for testing and development, building a “shopping cart” to search for data. For example, testers and developers might fill out forms constructed in business language, requesting the data they need using drop-downs and fields. These inputs abstract away from the complex technical implementation of data, allowing parallel teams and frameworks to request data using the dimensions that matter to them:
Self-service forms create eCommerce orders into a back-end database, using intuitive inputs and drop-down menus.
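The “make” behind such a form can be sketched as follows; the field names, allowed values, and table are invented for illustration:

```python
# Drop-down options presented to the requester, in business language.
ALLOWED_STATUSES = ("PLACED", "SHIPPED", "RETURNED")

def make_order_sql(customer_id: int, status: str, item_count: int) -> str:
    """Turn validated form inputs into an insert against the back end."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown order status: {status!r}")
    return (
        "INSERT INTO orders (customer_id, status, item_count) "
        f"VALUES ({customer_id}, '{status}', {item_count})"
    )

print(make_order_sql(42, "PLACED", 3))
```

The drop-down constrains inputs to valid business values up front, so the requester cannot submit data the back end would reject.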
This approach can then store submitted test data lookups, creating data catalogues and lists. The automated lookups become reusable, for example when refreshing an environment. This fills an environment with a rich set of data, tailored to satisfy the on-demand requests made by testers, developers, and business analysts:
A self-service catalogue of data lookups refreshes parallel environments on the fly and lets automation frameworks self-provision the data.
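The catalogue-and-replay idea can be sketched like this. The catalogue entries and the lookup executor are hypothetical stand-ins; a real implementation would run each lookup against an actual environment:

```python
# Hypothetical catalogue of saved lookups. Each entry records a named,
# reusable request in business terms.
catalogue = {
    "premium_de_customers": ("customer", {"type": "premium", "country": "DE"}),
    "returned_orders": ("order", {"status": "RETURNED"}),
}

def refresh_environment(run_lookup):
    """Replay every saved lookup against a target environment.

    `run_lookup` is whatever executes a single find-or-make; it is
    injected here so the sketch stays independent of any real database.
    """
    return {
        name: run_lookup(entity, criteria)
        for name, (entity, criteria) in catalogue.items()
    }

# Illustrative stand-in for a real find-or-make executor.
results = refresh_environment(lambda entity, criteria: f"data for {entity}")
print(sorted(results))
```

Because the lookups are stored, the same set can refresh several parallel environments without anyone re-specifying the data.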
The same lookups can also be exposed to automation frameworks and CI/CD pipelines. For instance, a function in an automation script can call a test data list via an API, allowing tests to self-provision data on the fly. This provides complete and “just-in-time” data to parallelized tests, enabling rapid and rigorous test automation without test data bottlenecks.
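Inside a test suite, the pattern might look like the following sketch. The endpoint path, response shape, and client are assumptions standing in for whichever test data API is in use:

```python
def provision(client, lookup_name: str):
    """Ask the test data service to find or make data for a named lookup."""
    response = client.get(f"/api/lookups/{lookup_name}/allocate")
    if response.get("status") != "ok":
        raise RuntimeError(f"no data available for lookup {lookup_name!r}")
    return response["data"]

# A fake client keeps the sketch self-contained; a real one would wrap
# HTTP calls to the test data service.
class FakeClient:
    def get(self, path):
        return {"status": "ok", "data": {"customer_id": 1001, "tier": "premium"}}

def test_checkout_with_premium_customer():
    # The test names the data it needs; the service finds or makes it.
    customer = provision(FakeClient(), "premium_customer_with_history")
    assert customer["tier"] == "premium"

test_checkout_with_premium_customer()
```

Each parallel test allocates its own data at run time, rather than competing for a shared, pre-loaded data set.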
Completing Your Test Data Strategy
Creating a Test Data Mart that separates the business and technical view of data “democratizes” test data, enabling cross-functional teams and frameworks to look up the data they need on-demand. The same approach can further be integrated seamlessly with new or existing test data utilities, including data generation, masking, cloning, and allocation.
This further “democratizes” test data, as it makes the work done by test data engineers reusable by parallel data requesters. As these requesters consume test data on-demand, they can parameterize and trigger pre-configured utilities on the fly. Data generation might then fill gaps in existing data, for example, while cloning and allocation ensure that every test receives the data it needs in parallel.
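One way to picture that reuse: utilities pre-configured by test data engineers, exposed as parameterizable jobs that any requester can trigger. The utility names and parameters below are invented for illustration:

```python
# Pre-configured utilities, set up once by test data engineers.
def generate_customers(count=10, tier="standard"):
    """Synthesize customer rows to fill gaps in existing data."""
    return [{"id": i, "tier": tier} for i in range(count)]

def mask_emails(rows):
    """Replace real email addresses with safe, deterministic ones."""
    return [{**row, "email": f"user{row['id']}@example.test"} for row in rows]

# Registry that makes the engineers' work reusable by any data requester.
UTILITIES = {"generate_customers": generate_customers, "mask_emails": mask_emails}

def trigger(utility: str, **params):
    """Parameterize and run a pre-configured utility on the fly."""
    return UTILITIES[utility](**params)

# A requester triggers the utilities without needing to own or understand them.
fresh = trigger("generate_customers", count=3, tier="premium")
safe = trigger("mask_emails", rows=fresh)
print(len(safe), safe[0]["email"])
```

The requester supplies only business-level parameters; how the generation or masking is implemented stays with the engineers who built it.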
Any technique needed to create “gold copy” data can in turn become available to parallel data requesters on the fly. This removes the bottlenecks, quality issues, and compliance risks associated with test data.
For an overview of the techniques unlocked when separating the technical and business attributes of data, check out this video from Curiosity’s Managing Director, Huw Price, on structured approaches to data “Find and Makes.”
[i] Capgemini, Sogeti (2020), The Continuous Testing Report 2020. Retrieved from https://www.sogeti.com/explore/reports/continuous-testing-report-2020/ on 22/03/2021.
[ii] Capgemini, Sogeti (2021), The World Quality Report 2021-22. Retrieved from https://www.capgemini.com/gb-en/research/world-quality-report-wqr-2021-22 on 18/02/2022.
[iii] Capgemini, Sogeti (2021), The World Quality Report 2021-22. Retrieved from https://www.capgemini.com/gb-en/research/world-quality-report-wqr-2021-22 on 18/02/2022.
Published at DZone with permission of Thomas Pryce. See the original article here.