Want to Reduce Downtime and Increase Productivity? Test Software Updates With Production Traffic

Sometimes a trial by fire is the best way to learn the true capabilities of your unreleased software.

As more of the economy comes to depend on software and services, unplanned downtime takes a significant toll on revenue, profits, and productivity. Much of that risk stems from how software testing has traditionally been done. A new approach is needed to streamline testing, one that reduces downtime, increases productivity, and protects organizations from bugs and errors that would otherwise reach production.

The Danger in DevOps

The DevOps model has dramatically changed how testing and development cycles work. To remain competitive, software developers must continually release new application features, and they sometimes push out code updates as fast as they write them. This is a significant change from how development teams traditionally operated. Teams used to be able to test for months, but today's accelerated development cycles leave only days or even hours for testing. With so little time, bugs sometimes slip through without adequate testing, potentially leading to network downtime. Adding to these challenges, third-party components must be maintained in a way that balances two opposing risks: updating a component may introduce unexplained changes in the behavior of a network service, but failing to update components regularly can leave the software exposed to flaws that affect security or availability.

The Long Downside of Downtime

Downtime and rollbacks due to bugs are expensive. On average, it costs four to five times as much to fix a software bug after release as it does to fix it during the design process. The average cost of network downtime is around $5,600 per minute, according to Gartner analysts. And IDC has estimated that for the Fortune 1000, the average total cost of unplanned application downtime per year is as much as $1.25 to $2.25 billion. These are not small numbers.

It’s not just the monetary costs that pose a problem. There’s the loss of productivity that can result when your employees are unable to do their work because of an outage. There are the recovery costs of determining what caused the outage and then fixing it. And on top of all of that, there’s also the risk of brand damage wreaked by irate customers who expect your service to be up and working for them at all times. And why shouldn’t they be irate? You promised them a certain level of service, and this downtime has broken their trust.

What’s more, software flaws that cause big problems now can also lead to security issues further down the road. These flaws can be exploited later, particularly if they weren’t detected early on. The massive Equifax breach, in which the personal data of more than 140 million Americans was compromised, and the Heartbleed bug are just two examples. In the case of Heartbleed, a vulnerability in the OpenSSL library let attackers read sensitive data from the memory of affected servers.

In an environment of continuous integration and continuous delivery, developers make changes to the code that trigger a pipeline of automated tests. The code then gets approved and pushed into production. A staged rollout begins, which allows new changes to be pushed out quickly.

But a continuous delivery process relies heavily on the automated test infrastructure. This is dangerous, because automated tests look only for the specific issues they were written to catch and cannot anticipate everything that could go wrong. As a result, things go wrong in production. The recent Microsoft Azure outage and Cloudflare’s Cloudbleed vulnerability are examples of how this process can go astray, with consequences for availability and security.

Testing Software Updates With Production Traffic

Software teams need a way to identify potential bugs and security concerns before release, quickly and precisely, without having to stage a rollout or roll one back. One way to do this is to run live user traffic against the current software version and the proposed upgrade at the same time. Users see only the results generated by the current production software, so they are unaffected by any flaws in the proposed upgrade. Meanwhile, administrators can see how the old and new versions respond to actual usage. This allows teams to keep costs down while maintaining quality and security and still meeting delivery deadlines, which ultimately helps boost return on investment.
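The side-by-side approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular product: `ShadowRouter`, `Mismatch`, `current_version`, and `candidate_version` are hypothetical names, and requests and responses are modeled as plain strings rather than real network traffic.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A handler takes a request and returns a response (both strings here for simplicity).
Handler = Callable[[str], str]


@dataclass
class Mismatch:
    request: str
    current: str
    candidate: Optional[str]      # None when the candidate raised an error
    error: Optional[str] = None


@dataclass
class ShadowRouter:
    """Hypothetical router: sends each request to both versions, serves only the current one."""
    current: Handler
    candidate: Handler
    mismatches: List[Mismatch] = field(default_factory=list)

    def handle(self, request: str) -> str:
        live = self.current(request)          # the user always receives this response
        try:
            shadow = self.candidate(request)  # the upgrade sees the same request
        except Exception as exc:              # a crash in the upgrade never reaches the user
            self.mismatches.append(Mismatch(request, live, None, repr(exc)))
            return live
        if shadow != live:
            self.mismatches.append(Mismatch(request, live, shadow))
        return live


# Two made-up versions of a service, used only for illustration.
def current_version(request: str) -> str:
    return request.upper()


def candidate_version(request: str) -> str:
    if request == "":
        raise ValueError("empty request")
    return request.upper().strip()


router = ShadowRouter(current_version, candidate_version)
replies = [router.handle(r) for r in ["ping", " pong ", ""]]

for m in router.mismatches:
    print(m)
```

Here the three replies all come from the current version, while the router records two discrepancies for administrators to review: one changed response and one error raised by the candidate. A real deployment would likely call the candidate asynchronously so that it adds no latency to the user's request; the synchronous call here is only to keep the sketch short.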

For the development community, this would make building and migrating application stacks to container and virtual environments more transparent during development, and it would make those stacks more secure and available in production as new software is tested and phased in.

Testing software updates with production traffic enables teams to:

  • Verify upgrades and patches using real production traffic
  • Quickly report on differences in software versions, including content, metadata and application behavior and performance
  • Investigate and debug issues faster using packet capture and logging
  • Encourage upgrades of commercial software by reducing risk and measuring performance benefits
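As a rough illustration of the reporting step in the list above, the following sketch compares one response from each software version across content, metadata, and performance. The `Response` record, the `diff_responses` function, and the 20 percent latency tolerance are all assumptions made for this example; a real tool would choose its own fields and thresholds.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Response:
    """Hypothetical record of one response captured from one software version."""
    status: int
    headers: Dict[str, str]
    body: str
    latency_ms: float


def diff_responses(old: Response, new: Response,
                   latency_tolerance: float = 0.20) -> List[str]:
    """Return human-readable differences between the old and new versions."""
    findings: List[str] = []
    if old.status != new.status:
        findings.append(f"status: {old.status} -> {new.status}")
    # Metadata: report headers that were added, removed, or changed.
    for name in sorted(set(old.headers) | set(new.headers)):
        before, after = old.headers.get(name), new.headers.get(name)
        if before != after:
            findings.append(f"header {name}: {before!r} -> {after!r}")
    # Content: flag any change in the response body.
    if old.body != new.body:
        findings.append("body differs")
    # Performance: flag only slowdowns beyond the assumed tolerance.
    if new.latency_ms > old.latency_ms * (1 + latency_tolerance):
        findings.append(f"latency: {old.latency_ms:.0f} ms -> {new.latency_ms:.0f} ms")
    return findings


old = Response(200, {"Content-Type": "text/html"}, "<h1>Hello</h1>", 40.0)
new = Response(200, {"Content-Type": "text/html", "Cache-Control": "no-store"},
               "<h1>Hello</h1>", 55.0)

report = diff_responses(old, new)
for line in report:
    print(line)
# header Cache-Control: None -> 'no-store'
# latency: 40 ms -> 55 ms
```

In this made-up case the body is identical, so a content-only check would pass, yet the report still surfaces a new header and a slowdown. Differences of that kind are what a side-by-side comparison is meant to catch before release.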

Better Software, Faster

Comparing differences between software versions prior to each release can uncover defects in quality and unknown security vulnerabilities, a critical step in software development and the quality assurance process. And the way to accomplish this is by testing software updates in production. The time and cost savings are more than worth it in the long run.
