For virtually all development teams, testing code is a given: It's one of the most important parts of software development. Whether your organization includes a separate team devoted to QA, or your developers are testing their own code, QA is the primary way your team ensures that your application's logic is working correctly, and it's the best way for you to identify issues as early as possible.
As a result, QA is critical for engineering velocity, and it helps shape your users' overall experience when engaging with your product. Nobody likes finding a broken app or website. But what about quality assurance for a database? Do most teams apply the same QA practices to improve their data tier? Do many teams even know how to perform database QA?
In this article, we'll talk about how you and your team can apply the same high standards of QA principles to your database, which many organizations often overlook.
Why Traditional QA Leaves Databases Behind
For most tiers of a system, QA is well understood and practiced almost religiously. It's a cornerstone of a product's delivery process, and all responsible teams automatically include it in their workflows. There are sets of steps to follow for good application development QA, and there's a wide variety of tools and processes that support those steps:
- Unit testing/TDD and continuous integration.
- Benchmarking and application profiling.
- End-to-end and system tests, and pre-production environments.
- Production monitoring and automated health checks.
One feature of many of these QA practices is the ability to test code at its most fundamental level — in isolation. In order to achieve this in unit testing, for example, testing tools and environments peel away dependency layers such as databases, so they can see exactly how code logic behaves. That means that many traditional QA unit testing practices are designed to overlook the databases, in favor of a focused examination of the app or code. This, clearly, is a core reason why practices for database QA testing are underdeveloped with many teams. For traditional services, QA is well understood and straightforward; for dependency layers, it offers very little.
Just because it's not obvious how teams should QA their database, however, doesn't mean they shouldn't do it at all. Without database QA, you can't understand the following (among many other aspects):
- How changes affect existing loads.
- Fluctuations in concurrency and contention.
- Real data sets.
Methods for QA Testing a Database
What are the actual methods for doing QA on a database? The simple, tl;dr answer is that queries are the solution to everything. With insightful and detailed query analysis you can find out what's new in your database, measure change, and spot and fix problems early. Similarly, if you can understand how your queries are behaving and compare that behavior to how they should be behaving, you'll have a basis of quality to run QA against.
But let's dig deeper. To do the most thorough QA testing of your database, you'll want to perform exercises like running against a test-dataset in a container for unit-testing/CI. This doesn't need to be excessively complicated — a small dataset that you can leverage to exercise new code paths can help provide initial validation that your databases are responding as expected. The simplest thing to look for here is desired output.
You can also be more verbose and check query behavior, like running SHOW WARNINGS for MySQL, to ensure there's no unexpected query behavior — the query might return successfully, but a warning might indicate that it'll fail in a subtle way in production. These tests can also validate that the data contained in your database is what you expect it to be.The smaller your dataset, the stricter you can make your tests, but, as a trade off, the tests become more brittle and more likely to fail. In unit testing, developers strive for their testing to strike a balance between flexibility and strong data validation.
Additionally, end-to-end testing and pre-production/production monitoring will give you a full picture of how the database behaves and changes in varying contexts. End-to-end testing, in particular, can confirm that your application and data tier work together as expected. And comparing the results from pre-production/production monitoring lets you identify new queries, detect any regressions in query performance, and quickly identify any errors that you missed in your original end-to-end testing. Finally, be sure to also incorporate regression testing, so you can track how modifications to the database have affected its behavior. Some potential regressions include:
- An increase in query latency or lock contention.
- An increase in the number of executions of a query — it's possible to accidentally put a query in a loop and not realize it until you see the query executing more frequently than expected.
- A change in the execution pattern of a query, which has, for example, caused more clients to issue the same query simultaneously, rather than spreading the query out over time. This might cause a change from a "smooth" execution to a pattern that's "spikier."
If you follow normal QA testing standards, you may begin to wonder, "How do I simulate production traffic at scale for large datasets in a controlled environment?" The answer: You don't. Much of a database's function is responsive, and it depends on what the rest of the system is asking the database to do. Trying to simulate a full load—and the full complexity of a full load—is not feasible. This is why pre-production/production comparisons are so important: they can help you develop the perspective you need to understand what your database is actually doing in production versus how it behaves when in more optimal conditions.
A Few More Things to Keep in Mind
Databases change all the time. In some ways, each execution of a query in production represents a unique test of the database's behavior. From that point of view, database QA has to be a continual live production process. System changes and database versions impact how you expect your database to react in ways. Data sets can grow or shrink; read/write ratios change over time; different bottlenecks may trigger when your system hits various milestones in performance; and, ultimately, your database may look and behave very differently from one month to the next.
Additional change comes in many forms. These are a few examples:
- Code commits can change SQL, though such changes can also come indirectly from ORMs.
- UI changes can trickle down to database utilization pattern changes.
- An increase or decrease in the demand for work placed on the database, as related to site/application traffic.
- Data no longer fitting into available memory.
And finally, anywhere a modern team practices continuous deployment, change becomes a constant factor, and the database's behavior and performance quality require regular attention.
Although QA testing for a database is inherently different and trickier than QA testing for other layers of a system, it's no less important. There may not be the same set-in-stone principles and tools available for database QA, but if you understand how your queries behave and how they should be behaving, you can make real progress in maintaining your database's quality.
Like many aspects of database operations, database QA can seem hard to observe and hard to perform, but once you obtain deeper visibility into how your database functions, many of the best practices are common sense.
If you want to learn more about how your data tier may be impacting your organization in unexpected ways, take a look at the free VividCortex e-book The Hidden Costs of Data Engineering, available here.