The goal of unit testing is to limit the code you test to a small unit of a program. That way, if a unit test fails, it’s easy to identify the area of a program having an issue.
By extension, you should limit the amount of data a unit test uses so that if a program stops processing data correctly it’s easy to identify the data causing the problem. You want just enough data to test a specific item you’re trying to verify—and each unit test should only try to verify one item.
In test-driven development, the problem of having too much data in unit tests is usually avoided because developers manually create data. It’s a time-consuming process, so they only create a limited amount to verify boundary conditions of the code and to verify the program is processing data correctly.
While it would still be more efficient for these developers to use an automated test data creation solution in lieu of manual techniques, you aren’t here to quibble about “manual versus automated,” though I’d recommend you move toward the latter. You’re here to figure out why it’s bad to use too much data in unit tests. Let me explain.
More Data, More Problems
There are at least three major issues with using too much data in unit tests: wasted time, an undesirable expansion of the amount of code tested by a single unit test, and repeated testing.
- Wasted Time: Too often, I see unit tests that use much more data than they really need. When such a test case fails, diagnosing it takes longer because you have to sift through not only more code but also more data.
- Too Much Code Tested per Unit Test: Using too much data also causes a single unit test to execute more program code than it should, because more data often means more code paths are needed to process it. Again, your goal should be to unit test a single item per test case. By limiting the amount of data in a single test case, you also limit the amount of code that will be exercised, bringing you closer to that goal.
- Repeated Testing: Software doesn’t require wear testing like sneakers. Running the same or very similar data through the program won’t find new problems. You want unique test cases with unique data to exercise your business rules and program execution. Using more data just makes more work to diagnose and takes more time to execute without any added value.
This isn’t to say you can’t use a lot of varied data for testing, but rather that you should separate the data into small sets for many different unit tests. You want to focus on the unit of code being tested and the unit of data needed to test it. Please, no extras; they just increase the amount of time needed to diagnose a unit test failure.
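To make that concrete, here is a minimal sketch in Python of two tests that each verify one item with one small piece of data. The `apply_late_fee` function and its rule (a 5% fee once an account is more than 30 days overdue) are hypothetical, invented purely for illustration; the same shape applies in whatever language and test framework you use.

```python
import unittest
from decimal import Decimal


def apply_late_fee(balance, days_overdue):
    """Hypothetical business rule: add a 5% fee after more than 30 days overdue."""
    if days_overdue > 30:
        return balance * Decimal("1.05")
    return balance


class LateFeeTests(unittest.TestCase):
    # Each test verifies exactly one item and uses a single record's worth of data.

    def test_no_fee_at_boundary(self):
        # Exactly 30 days overdue: the fee must not apply yet.
        self.assertEqual(apply_late_fee(Decimal("100.00"), 30), Decimal("100.00"))

    def test_fee_just_past_boundary(self):
        # 31 days overdue: the fee applies.
        self.assertEqual(apply_late_fee(Decimal("100.00"), 31), Decimal("105.00"))
```

You could run these with `python -m unittest`. If `test_fee_just_past_boundary` fails, there is one rule and one record to look at, rather than a file of hundreds of accounts.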
You probably have some “but, what if” questions. Let me address those.
What if You’re Retrofitting Unit Tests?
Limiting the use of data in a test case initially sounds easy, but if you’re retrofitting unit tests into existing code, there can be complications. If you’re testing code that utilizes data from several separate but related data sources, identifying the correctly related data can be time-consuming, and, of course, you still need to verify that the selected data is correct. Here are your options:
- Use a Debugging Tool: For existing programs, one solution is to use a debugger. You can set breakpoints in the code you’re unit testing as well as review data being used, but you still need to verify it’s correct. If you’re using Compuware Xpediter, you can simply right-click to Compuware Topaz for Total Test to automatically generate a unit test and collect the necessary test assets, including test data, which can be used repeatedly in later testing. Sometimes the program architecture requires so much data that collecting a consistent set this way isn’t practical; in that case, use the second option below.
- Use Related Data Extraction Tools: If your current debugger isn’t a good option for collecting data, you can use data tools that provide related data extraction. Tools like Compuware File-AID can extract and load related subsets of data from multiple databases and files once the relationships between the data sources have been defined.
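The idea behind related data extraction can be sketched in a few lines of Python. This is only an illustration of the concept, not how File-AID or any other product works; the `extract_related` function, the record layouts and the `customer_id` relationship are all assumptions made up for the example.

```python
def extract_related(customers, orders, customer_ids):
    """Return only the customers and orders tied to the requested IDs.

    Hypothetical helper: assumes both sources share a customer_id key.
    """
    wanted = set(customer_ids)
    related_customers = [c for c in customers if c["customer_id"] in wanted]
    related_orders = [o for o in orders if o["customer_id"] in wanted]
    return related_customers, related_orders


# Two small, made-up data sources related through customer_id.
customers = [
    {"customer_id": 1, "state": "MI"},
    {"customer_id": 2, "state": "OH"},
    {"customer_id": 3, "state": "TX"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 2},
    {"order_id": 13, "customer_id": 3},
]

# Pull a consistent subset for a single customer to feed one unit test.
subset_customers, subset_orders = extract_related(customers, orders, [2])
```

The point is that the subset stays consistent across sources: every extracted order belongs to an extracted customer, so the unit test gets just the related records it needs and nothing more.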
What if Input Data Changes?
Even if the program worked correctly when originally deployed, the data it’s processing may have changed in ways the original developer didn’t expect.
For an extreme example, imagine the chaos if the U.S. added a 51st state. Every validation routine checking for the state abbreviation would have to be updated, and new code would have to be written for the special rules surrounding the new state.
Adding a new type of account can have a similar impact that would require additional unit tests to handle new input data. This is easier to do if you have nicely separated data and unit tests to handle that verification.
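As a rough sketch of what “nicely separated data and unit tests” might look like for the state example, consider the hypothetical validation routine below. The `is_valid_state` function and the deliberately tiny `VALID_STATES` set are assumptions for illustration only, and `"ZZ"` is a placeholder standing in for a new abbreviation.

```python
import unittest

# Deliberately tiny, hypothetical lookup; a real routine would hold every valid code.
VALID_STATES = frozenset({"MI", "OH", "TX"})


def is_valid_state(code, valid_states=VALID_STATES):
    """Return True if the abbreviation is in the set of valid states."""
    return code.upper() in valid_states


class StateValidationTests(unittest.TestCase):
    def test_known_state_is_accepted(self):
        self.assertTrue(is_valid_state("MI"))

    def test_unknown_code_is_rejected(self):
        self.assertFalse(is_valid_state("ZZ"))

    def test_new_state_is_accepted_once_added(self):
        # "ZZ" is a placeholder for a newly added abbreviation.
        self.assertTrue(is_valid_state("ZZ", VALID_STATES | {"ZZ"}))
```

Because each test uses a single value, handling new input data means adding one small test next to the others rather than reworking a large shared data file.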
Make it a best practice to understand the business rules around the data and to ensure unit test cases validate the code implementing each business rule. Often, you will discover the business rules themselves have been poorly designed and don’t account for all the permutations that can occur in the data.
These “holes” are often problems with missing requirements rather than with program code. However, these requirement oversights can have troubling financial impacts, so it’s important to review them with the business.
It’s time for mainframe teams to become nimbler and produce higher quality deliverables to cope with new digital demands. Going faster may not be your personal goal, but it’s certainly the goal of the company you work for or help run.
Automated unit testing is one of the essential ways mainframe developers can accelerate application development and delivery as well as improve code quality, but those unit tests need to be manageable with as little code and data as possible to deliver fast and high-quality software.