The 5 Most Common Test Data Pitfalls for DevOps
Take a look at five of the most common anti-patterns that contribute to a lack of test data in DevOps.
I’ve previously referenced the finding of the 2018-2019 World Quality Report (WQR) published by Capgemini that cites the lack of test data as one of the top two challenges facing teams trying to improve the velocity of their application development efforts. Organizations that recognize this are beginning to actively address these challenges. The best way to do this, as I have previously stated, is by implementing a robust test data management (TDM) solution.
The reality is that many organizations that have moved to DevOps are still trying to solve this issue without making a TDM investment. There are some scenarios where that can be successful. However, as environments grow in both size and complexity, those approaches don’t scale well. As a result, those organizations will not derive the full value of their DevOps investment.
Let’s examine the five most common pitfalls you are likely to encounter if you proceed down that path.
1. Lack of a Test Data Strategy
Moving to DevOps requires change across the development and QA departments guided by a thoughtful and comprehensive strategy. A component of that strategy needs to address test data. Prior approaches to test data provisioning are not likely to work once you move to DevOps. It is imperative to have a plan and approach for ensuring that you can provision the right test data at the right time for your team. Don’t find this out the hard way; tackle the issue up front. It is strongly recommended that the strategy include the implementation of a TDM solution, especially if you are dealing with large amounts of sensitive data and/or your environment is complex with multiple integrated systems. Using TDM will allow you to avoid the next four common pitfalls.
2. Trying to Manufacture Test Data
This pitfall (as well as the next two) falls into the category of something that does not scale very well. If you have a relatively small core system that drives the business and stands alone without integrating with other applications, it might be possible to manufacture the data needed for testing, but it won’t necessarily be great test data.
As the complexity of your environment increases, manufacturing data will become harder and harder. If there is even a moderate level of integration between systems, you are going to need to spend a lot of time ensuring that you create data sets with the correct linkages so that a record is properly propagated across your environment. Even then, you will probably not be able to replicate the naturally occurring complexities and anomalies that come with real data. It is very difficult to manufacture the corner cases that will reveal bugs during testing. Without them, failures aren’t found until the application goes live in production, where they become very expensive and time-consuming to fix.
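To make the linkage problem concrete, here is a minimal sketch in plain Python of manufacturing two related record sets. The table and field names (`customers`, `orders`, `customer_id`) are illustrative, not from any particular system; the point is that every generated child record must be wired back to a valid parent, and even when you get that right, the data is still too clean.

```python
import random
import string

random.seed(42)  # fixed seed so the illustration is reproducible

def make_customers(n):
    """Manufacture customer records with synthetic IDs and names."""
    return [
        {"customer_id": f"C{i:05d}",
         "name": "".join(random.choices(string.ascii_uppercase, k=8))}
        for i in range(n)
    ]

def make_orders(customers, n):
    """Orders must reference real customer IDs, or any downstream
    system that joins on customer_id will reject or drop them."""
    return [
        {"order_id": f"O{i:05d}",
         "customer_id": random.choice(customers)["customer_id"]}
        for i in range(n)
    ]

customers = make_customers(100)
orders = make_orders(customers, 500)

# Every order links back to a manufactured customer...
valid_ids = {c["customer_id"] for c in customers}
assert all(o["customer_id"] in valid_ids for o in orders)
# ...but none of the naturally occurring anomalies (duplicate
# accounts, legacy ID formats, half-migrated records) exist here,
# so the corner cases that break production are never exercised.
```

Now imagine maintaining this kind of generator for dozens of integrated systems, each with its own keys and formats, every time a schema changes.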
3. Not Working with Subsets
As the size and complexity of your data environment grow, working with the full database becomes more and more cumbersome, and the timelines for loading and reloading the environment increase. With a small system that is not integrated with others, this is not much of a problem. As you scale, the natural solution is to load just a subset of the data.
In moderately complex environments this starts to become more of a challenge. You will face the same referential-integrity challenges noted in pitfall #2 above. Even if you can crack the code with some clever subsetting scripts, keep in mind that you will need to update those scripts every time the schema or the applications change.
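A sketch of what those clever scripts have to do, using an in-memory SQLite database with a hypothetical two-table schema: once you pick a slice of parent rows, every dependent table must be subsetted to match, or the subset is full of dangling references.

```python
import sqlite3

# Hypothetical schema: orders reference customers.
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id));
""")
src.executemany("INSERT INTO customers VALUES (?, ?)",
                [(i, f"cust-{i}") for i in range(1, 101)])
src.executemany("INSERT INTO orders VALUES (?, ?)",
                [(i, (i % 100) + 1) for i in range(1, 1001)])

# Subset: take 10% of customers, then pull ONLY their orders so
# no order points at a customer that was left behind.
subset_customers = src.execute(
    "SELECT id, name FROM customers WHERE id % 10 = 0").fetchall()
ids = [row[0] for row in subset_customers]
placeholders = ",".join("?" * len(ids))
subset_orders = src.execute(
    f"SELECT id, customer_id FROM orders "
    f"WHERE customer_id IN ({placeholders})", ids).fetchall()

print(len(subset_customers), len(subset_orders))  # prints: 10 100
```

With two tables this is manageable; with hundreds of tables across several integrated systems, the dependency graph the script must walk grows faster than anyone wants to maintain by hand.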
4. Writing Your Own Masking Tool
All but the most clueless of development organizations recognize the risks of using unprotected production data to test with (see pitfall #5 below). Masking basic information like names, addresses and SSNs, in and of itself, is not too complex. Lots of teams have written scripts that can handle this. However, in keeping with the common theme regarding complexity of environments, this solution simply does not scale very well.
Some examples of challenges that are hard to overcome: consistently masking records across databases, maintaining geographic consistency, and dealing with the aging of data. Good TDM tools handle these scenarios out of the box.
5. Using an Unprotected Copy of Production
This might seem like a no-brainer; however, the WQR found that nearly 60 percent of firms are using a copy of their production data to test with. If this is your company, you are taking on an inordinate amount of risk. This is especially true if you have PHI, PCI and/or PII in your systems. If this data is propagated into your lower environments, you are increasing your risk, even if those lower environments are protected with the same level of security as production.
Chances are you are also not subsetting, either. So, in addition to increased risk, this method also results in increased cost due to the required processing, storage and administrative support necessary to keep these environments running. And, you are probably causing teams to wait while environments are loaded and unloaded.
Test Data Management Tools Help Avoid These Pitfalls!
You can avoid these pitfalls by implementing a robust TDM solution. TDM solutions were once within reach of only large enterprises, but there are now options that can be deployed by a much wider range of development organizations.
More and more offerings are being tailored specifically for DevOps environments. These solutions offer all the traditional TDM features but are designed specifically to make provisioning test data complementary to the DevOps process. Look for solutions that are designed for use by developers and don’t require a central gatekeeper. These tools allow for on-demand provisioning of data to ensure that the right data gets to the right person at the right time. Also, look for ones that can be integrated into an infrastructure-as-code environment, so that when you provision a new test environment, you provision the data to go with it.
Opinions expressed by DZone contributors are their own.