Zero Downtime Deployments With Containers

We all want our deployments to minimize downtime, but can we completely eliminate it? Read on to see how it can be accomplished with containers.

I've been writing a lot recently about building continuous delivery pipelines with various tools, and about the pattern as a whole. It's becoming more and more of a mainstream pattern that many development teams now follow. The benefits are clear: teams can deploy software faster, in smaller batches of changes, and thus more safely. When you deploy your services many times a day, one thing becomes a necessity in most cases: zero downtime deployments.

Zero Downtime?

Zero downtime means that your service stays responsive throughout the deployment process. Most services today are HTTP services, so in practice this means that no requests are dropped at any point during the deployment, even though a typical deployment replaces the software on the servers with a newer version.

Containers to the Rescue?

This is another good example of where containers can really make a difference. In my opinion, zero downtime deployments are much easier to achieve with containers, and especially with a container management platform. That said, this is fully doable without containers too, and lots of organisations pulled it off in the pre-container era.

One of the reasons why containers make this a lot easier is the fact that we can change a container running our software to a new container with a newer version of our software so easily. In many cases it's really two commands if done manually on the command line:

docker rm -f my-service
docker run -d --name my-service my_service:v2

Of course things are much more complex than that in real life. :)

Requirements for the Deployment Process

Let's look at the deployment process from the requirements point of view: what does it need to do for our software to stay up and running 100% of the time during the deployment?

Multiple Instances of the Software Running

It's pretty obvious that in 2018 there need to be many instances of the software running, not only for zero downtime deployments but also for general availability and scalability. Nowadays an instance usually means a container, or a set of containers, but it could mean something else too, such as a virtual machine if you want to use those as your deployment vessel.

Rolling Deployment

Running the software in many instances means that we must not bring everything down and then back up with the newer version. Instead we need to run a gradual upgrade, updating the software one instance at a time. This ensures there are always some instances serving requests while others are being updated.

Running the software in many instances also means there needs to be some sort of load balancer in front of the service. The load balancer distributes incoming requests among the online instances of the software. During the deployment process, something or someone needs to keep load balancing in sync with the deployment. In practice this means that an instance is taken out of the load balancing rotation when it is shut down, and put back only when it's running again with the newer version of the software. One thing is of the utmost importance: load balancing updates must be done in a way that keeps the load balancer itself online 100% of the time during all changes.

If you are using HAProxy as the load balancer, for example, this means leaving the "old" processes running with the old config until there are no connections left on them.

Graceful Shutdown of Service Instances

Perhaps the most important requirement is that your software works in sync with the deployment process. In practice we need to be able to tell our software that it's time to go down gracefully and stop accepting new requests. This also needs to be in sync with the load balancer: when our software is going down, load balancers should not send it any more traffic. We also need to be able to tell our automation (more on that next) how long to wait for ongoing requests to finish before terminating the old instance and spinning up the new version of the software.

100% Automation

There are many things that have to work together like a well-conducted orchestra during the deployment, so it's pretty obvious that the whole process must be 100% automated. When we're doing multiple deployments per day across tens of servers and tens of instances of our software components, there's no way a human can manage all of this.

On a high level the automated process looks something like this in pseudo-code:

trigger deployment  
for_each instance  
  send shutdown signal
  remove from loadbalancer
  wait for shutdown
  remove old version
  spin up new version
  wait for startup
  add to loadbalancer

One big reason to use containers is the fact that pretty much all container orchestrators have a built-in process of rolling deployment.
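
To make the pseudo-code above a bit more concrete, here is a minimal Go sketch of the same loop. The Platform interface, its method names and the recorder fake are all hypothetical, invented for illustration; they are not the API of any real orchestrator. The point is only the ordering: each instance goes through all seven steps before the next one is touched.

```go
package main

import "fmt"

// Platform is a hypothetical interface; each method mirrors one step of the
// pseudo-code. A real orchestrator exposes these operations in its own way.
type Platform interface {
	SendShutdownSignal(instance string) error
	RemoveFromLoadBalancer(instance string) error
	WaitForShutdown(instance string) error
	RemoveOldVersion(instance string) error
	SpinUpNewVersion(instance, version string) error
	WaitForStartup(instance string) error
	AddToLoadBalancer(instance string) error
}

// rollingDeploy updates one instance at a time. It stops at the first error,
// leaving the untouched instances running the old version.
func rollingDeploy(p Platform, instances []string, version string) error {
	for _, inst := range instances {
		steps := []func() error{
			func() error { return p.SendShutdownSignal(inst) },
			func() error { return p.RemoveFromLoadBalancer(inst) },
			func() error { return p.WaitForShutdown(inst) },
			func() error { return p.RemoveOldVersion(inst) },
			func() error { return p.SpinUpNewVersion(inst, version) },
			func() error { return p.WaitForStartup(inst) },
			func() error { return p.AddToLoadBalancer(inst) },
		}
		for _, step := range steps {
			if err := step(); err != nil {
				return fmt.Errorf("deploying %s: %w", inst, err)
			}
		}
	}
	return nil
}

// recorder is a fake Platform that only records the calls it receives.
type recorder struct{ events []string }

func (r *recorder) log(step, inst string) error {
	r.events = append(r.events, step+" "+inst)
	return nil
}

func (r *recorder) SendShutdownSignal(i string) error     { return r.log("send shutdown signal", i) }
func (r *recorder) RemoveFromLoadBalancer(i string) error { return r.log("remove from loadbalancer", i) }
func (r *recorder) WaitForShutdown(i string) error        { return r.log("wait for shutdown", i) }
func (r *recorder) RemoveOldVersion(i string) error       { return r.log("remove old version", i) }
func (r *recorder) SpinUpNewVersion(i, v string) error    { return r.log("spin up "+v, i) }
func (r *recorder) WaitForStartup(i string) error         { return r.log("wait for startup", i) }
func (r *recorder) AddToLoadBalancer(i string) error      { return r.log("add to loadbalancer", i) }

func main() {
	r := &recorder{}
	if err := rollingDeploy(r, []string{"stop-1", "stop-2", "stop-3"}, "v2"); err != nil {
		fmt.Println("deployment failed:", err)
		return
	}
	for _, e := range r.events {
		fmt.Println(e)
	}
}
```

Stopping at the first failed step is a design choice of this sketch: a failed update then costs at most one instance, and the rest keep serving the old version.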

Example Application

Let's look at how all this ties together with an example application. As usual, the whole app is available in a GitHub repo here

The App

The application is a really simple Go app that just serves two static HTTP routes, /hello and /ping. The most interesting bit is how we handle the shutdown process so that the app goes down gracefully and in a way that load balancers can also understand.

When the app is going down in a container, container runtimes send a SIGTERM signal to the processes. So in the app, we must catch this signal and somehow make the application go down gracefully.

// Put in a signal handler for SIGTERM
log.Println("Setting signal handler")
signals := make(chan os.Signal, 1)
signal.Notify(signals, os.Interrupt, syscall.SIGTERM)

// Block until a shutdown signal arrives
<-signals
log.Println("Got shutdown signal, waiting 10s to finish ongoing reqs")
// This will make the health check fail --> LBs will not give any more traffic
healthy = false

// Wait for the ongoing requests to finish
// Note: s.Shutdown actually does a graceful shutdown; here we just make things a bit
// more explicit for demonstration purposes
time.Sleep(10 * time.Second)
// Stop the server (s is assumed to be the app's *http.Server)
s.Shutdown(context.Background())

On the other hand, when we're going down, load balancers should not give us any more traffic, which allows us to drain the request queue. For this reason I made the /ping endpoint, which load balancers use to see whether the app is healthy. Upon receiving SIGTERM the app starts to report an unhealthy status on the /ping endpoint, and load balancers stop sending traffic to it.

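func ping(w http.ResponseWriter, r *http.Request) {
	if !healthy { // shutting down: answer 503 so load balancers stop sending traffic
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	fmt.Fprintf(w, "pong")
}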
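
For readers who want to see the two fragments in one place, here is a minimal, self-contained sketch of such an app. It is not the code from the repo. The assumptions are mine: the port 8080 (taken from the service configuration), the /hello response body, the pingStatus helper, an atomic.Bool (Go 1.19+) instead of a plain bool because the flag is shared between goroutines, and a 4 second Shutdown timeout chosen so that the 10 second drain plus shutdown stays under a 15 second grace period.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// healthy is read by request handlers and written by the signal goroutine,
// so it is an atomic.Bool rather than a plain bool.
var healthy atomic.Bool

func hello(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "hello")
}

// pingStatus maps the health flag to the status code returned by /ping.
func pingStatus(isHealthy bool) int {
	if isHealthy {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

func ping(w http.ResponseWriter, r *http.Request) {
	status := pingStatus(healthy.Load())
	w.WriteHeader(status)
	if status == http.StatusOK {
		fmt.Fprint(w, "pong")
	}
}

func main() {
	healthy.Store(true)

	mux := http.NewServeMux()
	mux.HandleFunc("/hello", hello)
	mux.HandleFunc("/ping", ping)
	s := &http.Server{Addr: ":8080", Handler: mux}

	signals := make(chan os.Signal, 1)
	signal.Notify(signals, os.Interrupt, syscall.SIGTERM)

	done := make(chan struct{})
	go func() {
		defer close(done)
		<-signals
		log.Println("Got shutdown signal, failing health checks")
		healthy.Store(false)
		// Give load balancers time to notice the failing health check.
		time.Sleep(10 * time.Second)
		// Stop accepting connections and wait for in-flight requests.
		ctx, cancel := context.WithTimeout(context.Background(), 4*time.Second)
		defer cancel()
		if err := s.Shutdown(ctx); err != nil {
			log.Printf("shutdown: %v", err)
		}
	}()

	if err := s.ListenAndServe(); err != http.ErrServerClosed {
		log.Fatalf("server error: %v", err)
	}
	<-done // ListenAndServe returns as soon as Shutdown starts; wait for it to finish
}
```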

The rolling deployment part is handled by the Kontena container platform with a configuration that ties in the application to both the deployment process and to load balancers automatically:

services:
  stop:
    image: jnummelin/graceful-stop:latest
    instances: 3
    links:
      - ingress-lb/lb
    deploy:
      wait_for_port: 8080
      min_health: 0.8
    health_check:
      protocol: http
      port: 8080
      uri: /ping
      initial_delay: 10
      timeout: 2
      interval: 10
    stop_grace_period: 15s
    environment:
      KONTENA_LB_VIRTUAL_HOSTS: stop.kontena.works

I'll walk you through the most interesting bits.

links: ingress-lb/lb

This tells Kontena to always configure load balancing for the application, and also to keep it in sync during the rolling deployment process.


min_health: 0.8

This sets how big a portion of the application instances (in practice, containers) to keep up and running during the deployment. In this case, 80% of my instances shall be kept running during the deployment. Since I've specified 3 instances, the rolling deployment swaps one container at a time for a new version: round(3 * (1 - 0.8)) = round(0.6) = 1.
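
The arithmetic above can be tried out for other instance counts with a few lines of Go. The formula is the one quoted in the text; the function name batchSize and the floor of one instance per batch are my assumptions, added so that a deployment always makes progress, and may not match how Kontena handles that edge case.

```go
package main

import (
	"fmt"
	"math"
)

// batchSize returns how many instances are swapped at a time for a given
// instance count and min_health, using round(instances * (1 - minHealth)).
func batchSize(instances int, minHealth float64) int {
	n := int(math.Round(float64(instances) * (1 - minHealth)))
	if n < 1 {
		n = 1 // assumption: never stall the deployment completely
	}
	return n
}

func main() {
	for _, instances := range []int{3, 5, 10} {
		fmt.Printf("instances=%d min_health=0.8 -> %d at a time\n",
			instances, batchSize(instances, 0.8))
	}
}
```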


wait_for_port: 8080

During deployment, Kontena waits for the application instance to start listening on the given port before moving on to the next instance. This allows sufficient time for the app to actually boot up and be able to serve traffic.
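
Waiting for a port is easy to picture as a small polling loop. The following is a sketch of the general idea only, not Kontena's implementation; the function names, the 500 ms dial timeout and the 200 ms polling interval are arbitrary choices of mine, and a throwaway local listener stands in for the application.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// waitForPort polls addr until a TCP connection succeeds or timeout passes.
func waitForPort(addr string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		conn, err := net.DialTimeout("tcp", addr, 500*time.Millisecond)
		if err == nil {
			conn.Close()
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("%s not listening after %s: %w", addr, timeout, err)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

// startListener opens a throwaway local listener that stands in for the app.
func startListener() (string, func(), error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	return ln.Addr().String(), func() { ln.Close() }, nil
}

func main() {
	addr, stop, err := startListener()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer stop()

	if err := waitForPort(addr, 5*time.Second); err != nil {
		fmt.Println("instance never came up:", err)
		return
	}
	fmt.Println("port is open, continuing with the next instance")
}
```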


health_check

The defined health check also makes the load balancers use this endpoint for checking the application status. When the app instance is going down, the /ping endpoint responds to the load balancers with 503 Service Unavailable, so the stopping instance gets no new traffic and can drain its requests.
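
From the load balancer's side, the check boils down to "probe the endpoint, and treat anything but a success as unhealthy". Below is a rough sketch of that decision, not the actual logic of any particular load balancer. The names inRotation and newDemoApp are made up, the rule that only 2xx counts as healthy is my assumption, and a local test server plays the role of the app.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// inRotation probes the health endpoint and reports whether the instance
// should keep receiving traffic. Errors and non-2xx answers count as unhealthy.
func inRotation(url string, timeout time.Duration) bool {
	client := http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 300
}

// newDemoApp starts a local test server whose /ping follows a health flag.
func newDemoApp() (url string, setHealthy func(bool), stop func()) {
	var healthy atomic.Bool
	healthy.Store(true)
	mux := http.NewServeMux()
	mux.HandleFunc("/ping", func(w http.ResponseWriter, r *http.Request) {
		if !healthy.Load() {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		fmt.Fprint(w, "pong")
	})
	srv := httptest.NewServer(mux)
	return srv.URL + "/ping", healthy.Store, srv.Close
}

func main() {
	url, setHealthy, stop := newDemoApp()
	defer stop()

	timeout := 2 * time.Second // mirrors "timeout: 2" in the health check config
	fmt.Println("before SIGTERM, in rotation:", inRotation(url, timeout))
	setHealthy(false) // what the app does when it receives SIGTERM
	fmt.Println("after SIGTERM, in rotation: ", inRotation(url, timeout))
}
```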


stop_grace_period: 15s

How long to wait between sending the app SIGTERM and SIGKILL, i.e. how much time to allow for draining all the requests.
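
The SIGTERM, wait, SIGKILL sequence can be sketched for a plain child process as follows. This only illustrates the idea of a grace period; container runtimes implement it in their own way. It assumes a Unix-like system with sh and sleep available, and the helper names startShell and stopWithGrace are made up for the example.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// startShell runs script with sh -c and gives the shell a moment to set up.
func startShell(script string) (*exec.Cmd, error) {
	cmd := exec.Command("sh", "-c", script)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	time.Sleep(500 * time.Millisecond)
	return cmd, nil
}

// stopWithGrace sends SIGTERM, waits up to grace for the process to exit,
// and falls back to SIGKILL. It reports whether the kill was needed.
func stopWithGrace(cmd *exec.Cmd, grace time.Duration) (killed bool, err error) {
	if err := cmd.Process.Signal(syscall.SIGTERM); err != nil {
		return false, err
	}
	exited := make(chan struct{})
	go func() {
		cmd.Wait() // the exit status does not matter here
		close(exited)
	}()
	select {
	case <-exited:
		return false, nil
	case <-time.After(grace):
		if err := cmd.Process.Kill(); err != nil {
			return false, err
		}
		<-exited
		return true, nil
	}
}

func main() {
	scripts := []string{
		"exec sleep 30",               // exits on SIGTERM
		`trap "" TERM; exec sleep 30`, // ignores SIGTERM
	}
	for _, script := range scripts {
		cmd, err := startShell(script)
		if err != nil {
			fmt.Println("start failed:", err)
			return
		}
		killed, err := stopWithGrace(cmd, 2*time.Second)
		fmt.Printf("%q needed SIGKILL: %v (err: %v)\n", script, killed, err)
	}
}
```

An app that handles SIGTERM and finishes within the grace period never sees the second signal, which is why the example app's drain time has to fit inside stop_grace_period.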


So now, with our test app handling signals properly and with a proper deployment configuration for Kontena, we should see 0 failed requests during a deployment. Let's test this:

Forcing a Deployment

We must be able to force a deployment, and with it a change of containers, so that we can observe the behavior. Luckily we have the needed commands available out of the box:

$ kontena service deploy --force graceful/stop
 [done] Deploying service graceful/stop      
⊛ Deployed instance demo-grid/graceful/stop-1 to node little-frost-55
⊛ Deployed instance demo-grid/graceful/stop-2 to node late-sound-19
⊛ Deployed instance demo-grid/graceful/stop-3 to node twilight-lake-92

The --force flag instructs Kontena to replace all the containers with new ones, regardless of whether the image or service configuration has changed.

To see what happened during the deployment, we can look at the service's event log:

2018-03-20T08:38:58.697Z  deploy               service demo-grid/graceful/stop deploy started
2018-03-20T08:38:59.107Z  create_instance      pulling image jnummelin/graceful-stop:latest for graceful/stop-1 (little-frost-55)
2018-03-20T08:39:00.243Z  create_instance      pulled image jnummelin/graceful-stop:latest for graceful/stop-1 (little-frost-55)
2018-03-20T08:39:10.734Z  create_instance      removed previous version of service graceful/stop-1 instance (little-frost-55)
2018-03-20T08:39:10.894Z  create_instance      service graceful/stop-1 instance created (little-frost-55)
2018-03-20T08:39:11.361Z  create_instance      service graceful/stop-1 instance started (little-frost-55)
2018-03-20T08:39:14.421Z  create_instance      pulling image jnummelin/graceful-stop:latest for graceful/stop-2 (late-sound-19)
2018-03-20T08:39:15.651Z  create_instance      pulled image jnummelin/graceful-stop:latest for graceful/stop-2 (late-sound-19)
2018-03-20T08:39:26.035Z  create_instance      removed previous version of service graceful/stop-2 instance (late-sound-19)
2018-03-20T08:39:26.182Z  create_instance      service graceful/stop-2 instance created (late-sound-19)
2018-03-20T08:39:26.642Z  create_instance      service graceful/stop-2 instance started (late-sound-19)
2018-03-20T08:39:29.784Z  create_instance      pulling image jnummelin/graceful-stop:latest for graceful/stop-3 (twilight-lake-92)
2018-03-20T08:39:31.029Z  create_instance      pulled image jnummelin/graceful-stop:latest for graceful/stop-3 (twilight-lake-92)
2018-03-20T08:39:41.531Z  create_instance      removed previous version of service graceful/stop-3 instance (twilight-lake-92)
2018-03-20T08:39:41.705Z  create_instance      service graceful/stop-3 instance created (twilight-lake-92)
2018-03-20T08:39:42.052Z  create_instance      service graceful/stop-3 instance started (twilight-lake-92)
2018-03-20T08:39:44.578Z  deploy               service demo-grid/graceful/stop deployed

What we see is the changing of the containers as expected.

At the same time, while running the deployment, we want to make sure we do not drop any requests. For that, I use a tool called ApacheBench.

$ docker run --rm -ti --net host jordi/ab ab -n 10000 -c 100 http://stop.kontena.works/hello
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking stop.kontena.works (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:        
Server Hostname:        stop.kontena.works
Server Port:            80

Document Path:          /hello
Document Length:        12 bytes

Concurrency Level:      100
Time taken for tests:   60.757 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1290000 bytes
HTML transferred:       120000 bytes
Requests per second:    164.59 [#/sec] (mean)
Time per request:       607.570 [ms] (mean)
Time per request:       6.076 [ms] (mean, across all concurrent requests)
Transfer rate:          20.73 [Kbytes/sec] received

The tool is running 100 requests at a time, 10,000 requests in total.
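
If ApacheBench is not at hand, the same kind of check (N requests, C at a time, count the failures) can be approximated in a few lines of Go. This is a rough sketch and not a replacement for ab: it measures no timings, the names countFailures and newTarget are invented, treating any non-2xx answer as a failure is my assumption, and the demo runs against a local test server rather than the real service.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
)

// countFailures sends total GET requests to url using the given number of
// concurrent workers and returns how many failed (error or non-2xx status).
func countFailures(url string, total, concurrency int) int64 {
	if concurrency < 1 {
		concurrency = 1
	}
	var failed int64
	jobs := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				resp, err := http.Get(url)
				if err != nil {
					atomic.AddInt64(&failed, 1)
					continue
				}
				io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
				resp.Body.Close()
				if resp.StatusCode < 200 || resp.StatusCode > 299 {
					atomic.AddInt64(&failed, 1)
				}
			}
		}()
	}
	for i := 0; i < total; i++ {
		jobs <- struct{}{}
	}
	close(jobs)
	wg.Wait()
	return atomic.LoadInt64(&failed)
}

// newTarget starts a local test server that answers every request with status.
func newTarget(status int) (url string, stop func()) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
	}))
	return srv.URL, srv.Close
}

func main() {
	url, stop := newTarget(http.StatusOK)
	defer stop()

	failed := countFailures(url+"/hello", 200, 10)
	fmt.Printf("Complete requests: 200\nFailed requests:   %d\n", failed)
}
```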

As seen in the results, 0 requests have failed.

Mission accomplished.


Getting your application to zero downtime deployments is a goal worth pursuing. It really enables you to do deployments multiple times a day without sacrificing your end-users' experience. To get you there, containers and container management platforms can make things a lot easier for you. You still need to make the app work in sync with the deployment process, but it's actually easier than many think.

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
