
Six False Negatives In Daily Deployment Test


Deployment tests are essential, but all kinds of false negatives can make their results hard to trust. Here are six to watch for.


After a lot of effort and communication, the system deployment finally works! To guarantee a smooth deployment at any time, the next step is to enforce a daily deployment test.

Surprisingly, the daily deployment doesn't always succeed as we expect, even when there are no major changes. Interestingly, many of the failures are false negatives. So what are the obstacles, and how can we avoid them?


Permanent Link: http://dennyzhang.com/false_negative

What Does False Negative Mean?

Ideally, every test failure should be an improvement opportunity. But if you tend to respond to a certain failure with repetitive blind retries, we can call it a false negative. Why? It indicates that either you don't care about the failure or you have no way to improve it. Furthermore, false negatives have two bad consequences: they take time to check and retry, and they break or pollute normal test runs.

The success of the basic deployment logic is only the beginning. Besides the constant changes coming from the dev team, there are multiple things you need to pay close attention to if you're ambitious and want to deliver a reliable and smooth deployment.

Here are several typical false negatives, from my first-hand experience.

Outage Of External Services

We try to download files from external websites, but they're in maintenance mode. We pull source code from GitHub/Bitbucket for build and deployment, but the service is temporarily down or unreachable. The same happens when Node.js needs to fetch community packages, or when Java needs to download jar modules from a public Nexus server, and those external servers flap from time to time.

It's always a good practice to replicate files and serve them from servers under our own control, instead of from third-party websites. It also helps to detect all outbound traffic during deployment. Another improvement is to allow no hidden dependencies and make every dependency crystal clear.
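For the first point, here is a minimal Chef sketch: fetch the file from a mirror we control instead of the upstream site, so an outage of the third-party website no longer fails the run. The internal mirror URL is hypothetical.

# Download the script from an internal mirror under our control
# (mirror.internal.example.com is a hypothetical host)
remote_file '/opt/devops/bin/backup_dir.sh' do
  source 'https://mirror.internal.example.com/devops/backup_dir.sh'
  # checksum '<expected sha256>'   # optionally pin the exact file content
  mode '0755'
  retries 3
  action :create_if_missing
end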

Always Download Latest Version

You may be familiar with actions like the ones below.

# Install package (whatever version happens to be latest)
package 'XXX' do
  action :install
end

# Download raw file from GitHub (always the head of master)
remote_file '/opt/devops/bin/backup_dir.sh' do
  source 'https://raw.githubusercontent.com/' \
         'XXX/backup_dir/master/backup_dir.sh'
  mode '0755'
  retries 3
  action :create_if_missing
end

Quite often, changes in the latest version introduce incompatibilities. These surprise our deployment test, or cause issues that are hard to detect and diagnose.

It's better to use a stable tag/branch/version instead of the head revision. For our own code, this is easy to enforce. For community and open-source code, however, the story is different.
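Here is a hedged sketch of the earlier resources pinned to fixed versions. The version number and tag below are placeholders, not real releases.

# Install a known-good version instead of whatever is currently latest
package 'XXX' do
  version '1.2.3'   # placeholder: pin to a version we have tested
  action :install
end

# Download the raw file from a tagged revision instead of master
remote_file '/opt/devops/bin/backup_dir.sh' do
  source 'https://raw.githubusercontent.com/' \
         'XXX/backup_dir/v1.0/backup_dir.sh'   # placeholder tag
  mode '0755'
  retries 3
  action :create_if_missing
end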

Low Hardware Resources

To better utilize test machines, we may keep lots of simultaneous test jobs running all the time. The machine may run low on memory, and this will fail our tests even though our code has nothing to do with it.

Even worse, the OS may run into an OOM (out-of-memory) issue. This may crash critical services like Jenkins or the DB, which demands human intervention. Or the machine hits a kernel panic, which blocks us from SSHing in, and a reboot is our last resort.

To avoid this, we can add precheck logic before running new test jobs: when the OS is low on hardware resources, stop launching any more test jobs.
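A minimal precheck sketch, assuming a Linux test machine and an arbitrary 1 GB threshold:

#!/usr/bin/env ruby
# Refuse to launch another test job when the box is already low on memory,
# so the job fails fast with a clear reason instead of a random OOM later.
MIN_FREE_MB = 1024   # arbitrary example threshold

# MemAvailable is reported in kB on Linux
available_kb = File.read('/proc/meminfo')[/MemAvailable:\s+(\d+)/, 1].to_i
available_mb = available_kb / 1024

if available_mb < MIN_FREE_MB
  warn "Precheck failed: only #{available_mb} MB memory available (< #{MIN_FREE_MB} MB)"
  exit 1
end
puts "Precheck passed: #{available_mb} MB memory available"

Jenkins jobs can run this script first and simply skip or re-queue themselves when it exits non-zero.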

Resource Conflict

Two parallel jobs may fail when they both want exclusive access to the same resources. Typically there are two major conflicts between deployment test jobs: (1) running the same job in parallel, and (2) running different jobs that share resources. One common mitigation, sketched after the list below, is to guard each shared resource with a lock.

Common shared resources are:

  • Global environment settings, like JDK versions or global variables
  • Docker-specific resources: container names, mounted volumes, NAT TCP ports
  • SSH private key files used by different Jenkins jobs
  • etc.
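A minimal Ruby sketch of that locking approach, with an illustrative lock path:

# Serialize access to one shared resource (e.g. a NAT TCP port) across jobs.
# /var/lock/deploy-test-shared.lock is an illustrative path.
LOCK_PATH = '/var/lock/deploy-test-shared.lock'

File.open(LOCK_PATH, 'w') do |lock|
  unless lock.flock(File::LOCK_EX | File::LOCK_NB)
    warn 'Shared resource is busy: another job holds the lock. Retry later.'
    exit 1
  end
  # exclusive section: start the container, bind the port, run the test ...
end   # the lock is released when the block exits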

Run Test On Unclean Envs

The deployment test will be invalid, and the troubleshooting effort will be wasted, if the test env is not clean. For example, if "apt-get update" fails, "apt-get install" is doomed to fail as well. And if people have deliberately removed files or packages in advance, the deployment may also fail.

It's better to perform tests on envs with a fresh start.
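One hedged way to get that fresh start is to run each deployment test inside a throwaway container that is discarded afterwards. The image name and script path below are placeholders.

# Run the deployment test in a fresh container; --rm throws it away afterwards.
# ubuntu:16.04 and /opt/devops/test/deploy_test.sh are placeholders.
image  = 'ubuntu:16.04'
script = '/opt/devops/test/deploy_test.sh'

ok = system('docker', 'run', '--rm',
            '-v', "#{File.dirname(script)}:/test:ro",
            image, 'bash', "/test/#{File.basename(script)}")
exit(ok ? 0 : 1)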

Slow Service Start

The application may take several minutes to start while performing system initialization or waiting for a DB cluster to come up. We need to wait and check our assumptions before testing; otherwise, we will get a false alarm again. Tip: avoid a blind wait by checking the service's real status, as in the sketch below.
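A minimal "check instead of blind wait" sketch: poll an assumed health endpoint until it answers, with an upper bound so a dead service still fails fast. The URL, timeout, and interval are illustrative.

#!/usr/bin/env ruby
require 'net/http'

HEALTH_URL = URI('http://127.0.0.1:8080/health')   # illustrative endpoint
TIMEOUT    = 300   # give up after 5 minutes
INTERVAL   = 10    # poll every 10 seconds

deadline = Time.now + TIMEOUT
loop do
  begin
    response = Net::HTTP.get_response(HEALTH_URL)
    break if response.is_a?(Net::HTTPSuccess)   # the service is really up
  rescue StandardError
    # connection refused, timeout, etc.: the service is not up yet
  end
  if Time.now > deadline
    warn "Service did not become healthy within #{TIMEOUT}s"
    exit 1
  end
  sleep INTERVAL
end
puts 'Service is up; safe to start the deployment test.'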

Like our blog posts? Discuss with us on LinkedIn, WeChat, or via our newsletter.


