Best Practices for QA Testing in the DevOps Age
Testers and QA engineers should be adopting new best practices to adapt to a changing development cycle.
Going live with bugs in the code is a risky roll of the dice: it can lead to unplanned outages, and software downtime costs both revenue and reputation. Analysts at Gartner Research have estimated that downtime can cost companies as much as $140,000 to $540,000 per hour. Google, for example, saw global outages of its Gmail and Drive products in March, affecting customers throughout Australia, the U.S., Europe, and Asia. Facebook and Instagram also suffered worldwide outages in March, leaving users unable to access the popular apps for several hours. Customers expect on-demand access and service; outages weigh heavily on a brand’s reputation as well as its finances.
Unfortunately, with the migration from legacy systems to microenvironments in the cloud, outages and downtime pose a serious and growing problem. Gone is the time when teams could beta test with customers over an extended period to flag bugs as they surfaced. With current quality testing tools, developers often don’t know how a new software version will perform in production, or whether it will work in production at all. The Cloudbleed bug is an example of this problem. A simple coding error in a software upgrade from security vendor Cloudflare introduced a serious vulnerability that went unnoticed for several months until a Google researcher discovered it in February 2017. Although Cloudflare’s service still worked, the bug meant that it was leaking sensitive data.
Along with these short-term consequences, flaws can create long-term security issues. Heartbleed, a vulnerability disclosed in 2014 that stemmed from a programming mistake in the OpenSSL library, left large numbers of private keys and other sensitive information exposed to the internet. This enabled the theft of data that would otherwise have been protected by SSL/TLS encryption.
The QA Testing Paradigm Has Shifted
The standard model of QA testing is not effective in light of today’s increasingly frequent and fast development cycles. Traditionally, DevOps teams haven’t been able to do side-by-side testing of the production version and an upgrade candidate. The QA testing used by many organizations is a set of simulated test suites, which may not give comprehensive insight into the myriad ways in which customers may actually make use of the software. Just because upgraded code works under one set of testing parameters doesn’t mean it will work in the unpredictable world of production usage.
For instance, the Cloudbleed bug flew under the radar of end-users for an extended period of time, and no system errors were logged as a result of the flaw. Just as QA testing alone isn’t sufficient, relying on system logs and user reports also limits what can be detected.
An investigation by IBM found that the cost to fix an error after a software release is four to five times higher than if it were uncovered during the design phase, and it can lead to even costlier development delays. Providing software teams with a way to identify potential bugs and security concerns prior to release can alleviate those delays. Testing with production traffic earlier in the code development process can save time, money, and pain. Software and DevOps teams need a way to test quickly and accurately how new releases will perform with real (not just simulated) customer traffic while maintaining the highest standards.
Evaluating release versions side by side lets teams detect differences or defects quickly. In addition, teams can gain real insight into network performance while also verifying the stability of upgrades and patches in a working environment. Doing this efficiently significantly reduces the likelihood of releasing software that later needs to be rolled back. Rollbacks are expensive, as we saw in the case of the November 2018 Microsoft Azure outage, which lasted 14 hours and affected customers throughout Europe and beyond.
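As a rough illustration of side-by-side evaluation, the sketch below replays the same requests against the current version and an upgrade candidate and records every request where the two disagree. It is a minimal sketch, not a description of any particular tool: the function compare_versions and the handlers handler_v1 and handler_v2 are hypothetical names invented for this example, and a real setup would mirror live traffic rather than loop over an in-memory list.

```python
from typing import Any, Callable, Iterable


def compare_versions(
    requests: Iterable[dict],
    current: Callable[[dict], Any],
    candidate: Callable[[dict], Any],
) -> list:
    """Send each request to both versions and collect the disagreements."""
    mismatches = []
    for request in requests:
        expected = current(request)
        try:
            actual = candidate(request)
        except Exception as exc:  # a crash in the candidate is also a difference
            actual = f"error: {exc!r}"
        if actual != expected:
            mismatches.append((request, expected, actual))
    return mismatches


# Hypothetical handlers standing in for the production and candidate releases.
def handler_v1(request: dict) -> dict:
    return {"total": request["price"] * request["qty"]}


def handler_v2(request: dict) -> dict:
    # Deliberate bug for illustration: a quantity of zero is treated as one.
    qty = request["qty"] or 1
    return {"total": request["price"] * qty}


sample_requests = [{"price": 10, "qty": 2}, {"price": 5, "qty": 0}]
mismatches = compare_versions(sample_requests, handler_v1, handler_v2)

for request, expected, actual in mismatches:
    print(f"request={request} current={expected} candidate={actual}")
```

Only the responses of the current version would be returned to customers in a setup like this; the candidate's output is used purely for comparison, so a defect shows up as a recorded mismatch rather than as a customer-facing failure.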
Sometimes, for a software rollout to take place, the organization needs to run multiple software versions in production, an approach called staging. The software teams put a small percentage of users on the new version, while most users stay on the existing one. Unfortunately, this approach to testing with production traffic is cumbersome to manage, costly, and still vulnerable to rollbacks. The other problem with these kinds of staged deployments is that while failures can be caught early in the process, they are only caught after they’ve affected end-users.
Even at this point, you don’t know whether the new software is causing the “failures.” And how many “failures” does the business allow before recalling or rolling back the software, since the business does not observe side-by-side results from the same customer? This disrupts the end-user experience, which ultimately affects business operations and company reputation. And staging may not provide a sufficient sample to gauge the efficacy of the new release versus the entire population of customers.
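To make the staging approach concrete, here is one common way a small, stable share of users can be routed to the new version. This is a sketch under stated assumptions: the function assigned_version, the "user-N" identifiers, and the 10% share are all hypothetical, and real deployments usually do this in a load balancer or feature-flag service rather than in application code.

```python
import hashlib


def assigned_version(user_id: str, canary_percent: int) -> str:
    """Deterministically place a user in one of 100 buckets by hashing the ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    # Buckets below the threshold see the new release; everyone else stays put.
    return "candidate" if bucket < canary_percent else "current"


# Hypothetical user population for illustration.
users = [f"user-{i}" for i in range(10_000)]
on_candidate = sum(assigned_version(u, 10) == "candidate" for u in users)
print(f"{on_candidate} of {len(users)} users are on the candidate release")
```

Because the assignment depends only on the user ID, each customer sees exactly one version. That is the limitation noted above: the business never observes both versions handling the same customer's traffic.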
The expense of failure remains, too. Let’s say you stage with 10% of customers on the new version. If a failure costs $340,000+ an hour, then a failure affecting 10% of users could potentially still cost more than $34,000 per hour. The impact is reduced, of course, but it’s still significant, not counting the uncertainty of when to roll back.
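The arithmetic above can be written out as a small calculation. It assumes, as the example in the text does, that cost scales linearly with the share of users affected; the function name staged_failure_cost and the three-hour duration are hypothetical choices made for illustration.

```python
def staged_failure_cost(hourly_cost: float, staged_percent: float, hours: float = 1.0) -> float:
    """Estimate failure cost when only a percentage of users is affected.

    Assumes cost is proportional to the share of affected users.
    """
    if not 0 <= staged_percent <= 100:
        raise ValueError("staged_percent must be between 0 and 100")
    return hourly_cost * staged_percent / 100 * hours


# $340,000 per hour for a full outage, with 10% of customers on the new version.
cost_per_hour = staged_failure_cost(340_000, 10)
# The same failure left running for three hours before the rollback decision.
cost_three_hours = staged_failure_cost(340_000, 10, hours=3)

print(f"Per hour: ${cost_per_hour:,.0f}")
print(f"Over three hours: ${cost_three_hours:,.0f}")
```

The second figure shows why uncertainty about when to roll back matters: every additional hour of hesitation adds the same staged cost again.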
Fast and Secure
It’s just too risky these days to push software and services live using old-school QA testing, especially for CI/CD rollouts. A much better method is to test in production and evaluate release versions side by side. Software can still be rapidly iterated, but the risk to the software development lifecycle will be reduced. You will be able to release high-quality, secure products that don’t require expensive rollbacks or staging. As the development cycle has changed, so must the testing of its wares.