Over a million developers have joined DZone.

Load Balancing Apps With Nginx Plus in a Coreos Cluster

Deploy a lightweight load balancer with NGINX Plus on CoreOS, Fleet, and Docker.

Download Forrester’s “Vendor Landscape, Application Performance Management” report that examines the evolving role of APM as a key driver of customer satisfaction and business success, brought to you in partnership with BMC.

CoreOS is a Linux OS that is designed to run applications across a cluster of machines. There can be multiple running copies of a web application (each in a separate Docker or rkt container), distributed across a cluster. For such a deployment the configuration of the load balancer needs to change frequently in reaction to changes in the cluster, including:

  • A change in the number of running containers, when (for example) you scale the application up or down
  • A change in the IP addresses of the containers, when (for example) one of the machines goes down due to failure or some other reason, and the affected containers get started on another machine

NGINX Plus provides an HTTP on-the-fly reconfiguration API, which lets you update the configuration of the load balancer dynamically instead of editing and reloading its configuration files.

In this blog post we show how to use NGINX Plus to load balance an application running in a CoreOS cluster. We assume that you’re familiar with Docker, CoreOS, and the tools it comes with, fleet and etcd.

NGINX Plus requires a subscription. If you don’t have one, you can sign up for a free 30-day trial.


Here is an outline of what we’ll do:

  1. Create template unit files for a simple backend application. The distributed init tool for CoreOS, fleet, uses these files to start up and configure instances (which it calls units) of a service (application). Our backend application consists of two services:
    • The backend service, which is multiple instances of an NGINX web server serving static pages
    • The service-discovery service for the backend service; it registers all the backend units in etcd
  2. Create template unit files and other elements required for load balancing our application.
    • The load-balancer service, a single instance of NGINX Plus, which distributes requests among the instances of the backend service
    • The service-discovery service for NGINX Plus; it updates NGINX Plus configuration when backend units are added to or removed from etcd
    • An NGINX Plus configuration file
    • A Docker image of NGINX Plus
  3. Create a cluster of four virtual machines (VMs) running CoreOS.
  4. Use fleet to launch the four services listed in Steps 1 and 2.
  5. Scale the backend application and see how the load-balanced NGINX Plus upstream group gets reconfigured.
  6. Bring one of the VMs down to see how NGINX Plus active health checks work.

To make it easier for you to get the cluster and services up and running, we’ve put all the required templates and configuration files in our coreos-demo repository on GitHub.

As we’ll detail later, we’re using Vagrant to create our CoreOS cluster. The advantage of Vagrant is that a single vagrant up command creates and starts up all VMs, and distributes the templates and other files to them. You can deploy your cluster in an environment other than Vagrant, but in that case you have to make sure that the files get distributed as necessary. For details, see Setting Up the Cluster.

Note: We tested the solution described in the blog post against CoreOS alpha version 949.0.0, Vagrant 1.7.4, and VirtualBox 5.0.10, all running on an OS X El Capitan machine.

About Dynamic Reconfiguration of NGINX and NGINX Plus

In addition, to the on-the-fly reconfiguration API, we’re using in this blog, NGINX Plus supports
DNS-based dynamic reconfiguration. You can configure NGINX Plus to periodically re-resolve DNS names for load-balanced backend servers. We chose not to use this option for this example because it requires extra steps: setting up a DNS server and registering the backends within it.

The open source NGINX software doesn’t support these two dynamic reconfiguration options. To reconfigure NGINX, you modify and reload its configuration files. You can automate the process with tools such as confd. For an example, see How To Use Confd and Etcd to Dynamically Reconfigure Services in CoreOS at DigitalOcean.

Template Unit Files for the Services

In the following sections, we analyze the template unit files for our four services, explaining the meaning of the various instructions (options, in fleet’s terminology). The template files must reside on at least one of the VMs in the CoreOS cluster (the one where we issue fleetctl commands, as described in Starting the Services).

Template for the Backend Service

The backend service consists of multiple instances of the NGINX web server. We’re using the nginxdemos/hello Docker image, which returns a simple web page with the IP address and port of the web server, and the hostname of the container.

The template unit file for the backend service, backend@.service, has the following contents. We’ll use this file to start multiple units of the service.

When fleet starts a backend unit based on this template, the ExecStart option launches a container with the nginxdemos/hello image on one of the VMs; port 8080 on the VM is mapped to port 80 on the container. The Conflicts option tells fleet not to schedule more than one unit of the backend service on the same VM.


Description=Backend Service




ExecStartPre=-/usr/bin/docker kill backend

ExecStartPre=-/usr/bin/docker rm backend

ExecStartPre=/usr/bin/docker pull nginxdemos/hello

ExecStart=/usr/bin/docker run --name backend -p 8080:80 nginxdemos/hello

ExecStop=/usr/bin/docker stop backend


Template for the Backend Service’s Service‑Discovery Service

The service‑discovery service registers units of the backend service in etcd. Its template unit file, backend‑discovery@.service, has the contents that appear below.

Every unit of the backend service has a corresponding unit of the service‑discovery service running on the same VM; this setup is referred to as the sidekick model. The service‑discovery unit records the IP address of the VM it’s running on in an etcd key.

The purpose of the options in the template unit file is as follows:

  • The EnvironmentFile option imports the CoreOS environment variables from the /etc/environment file into the unit file. In our case we use the COREOS_PRIVATE_IPV4 variable to capture the VM’s private IP address.
  • The ExecStart option specifies that the unit launches an inline shell script as it starts. In the script, the while command sets up an infinite loop. Every 100 seconds the etcdctl tool sets a key in etcd, consisting of the private IP address of the CoreOS VM and port 8080, which is mapped to port 80 on the backend container running on the same VM. The key itself contains all the information we need, so we can set a random value for the key (the string lazy-unicorn in our case). Setting a key in etcd effectively registers a backend unit.The --ttl argument sets a time-to-live (TTL) value of 120 seconds, after which the key expires. Since we set the key every 100 seconds, it expires only in abnormal circumstances, for example if the VM goes down.
  • The MachineOf option tells fleet to schedule the service‑discovery unit on the node that hosts the backend service unit with the same instance name (value of the %i unit variable).
  • The BindsTo option (at the top of the file) links the two units such that the service‑discovery unit stops when the backend unit stops.
  • The ExecStop option specifies that when the units stop the etcdctl command is run to remove the key from etcd.

Description=Backend Discovery Service



ExecStart=/bin/sh -c "while true; do etcdctl set /services/backend/${COREOS_PRIVATE_IPV4}:8080 lazy-unicorn --ttl 120; sleep 100; done"

ExecStop=/usr/bin/etcdctl rm /services/backend/${COREOS_PRIVATE_IPV4}:8080


Template for the Load-Balancer Service

The unit template file that we’ll use to start a load-balancer service unit, loadbalancer@.service, has the following contents.

When we start a unit based on this file, an NGINX Plus container starts up on one of the VMs, with ports 80 and 8081 on the VM mapped to the same ports on the container. The Conflicts options tell fleet not to schedule the unit on a VM where either another load-balancer unit or a backend unit is already running.


Description=Load Balancer Service




ExecStartPre=-/usr/bin/docker kill loadbalancer

ExecStartPre=-/usr/bin/docker rm loadbalancer

ExecStart=/usr/bin/docker run --name loadbalancer -p 80:80 -p 8081:8081 nginxplus-coreos

ExecStop=/usr/bin/docker stop loadbalancer



Template for the Load-Balancer Service’s Service‑Discovery Service

To automate the addition and removal of backend servers in the upstream group being load balanced by NGINX Plus, we create a service‑discovery service for the load-balancer service. It does the following:

  1. At startup, gets the IP address and port number from etcd for every container running a unit of the backend service, and uses the NGINX Plus on-the-fly reconfiguration API to add the backend units to the upstream group.
  2. Invokes the etcdctl exec‑watch command, which in turn runs an inline script every time there is a monitored change in etcd. The changes we are monitoring are the addition, deletion, or expiration of a backend key in the /services/backend etcd directory. The script uses the on-the-fly reconfiguration API to add or remove the backend corresponding to the key from the NGINX Plus upstream group.

The template unit file for the load-balancer service’s service‑discovery service, loadbalancer-discovery@.service, has the contents that appear below. The purpose of the options is as follows:

  • As with the service‑discovery service for the backend service, the MachineOf option tells fleet to schedule this unit on the node that hosts the load-balancer service unit with the same instance name (value of the %i variable), and the BindsTo option links the two units such that the service‑discovery unit stops when the load‑balancer unit stops.
  • The ExecStartPre option executes its inline script right before the ExecStart option is invoked. This script gets the keys of all backend units from etcd. Then it runs a curl command for each backend unit, invoking the on-the-fly reconfiguration API to add the unit to the NGINX Plus upstream group. This way NGINX Plus is configured immediately at startup.
  • The ExecStart option runs the etcdctl exec‑watch command, which invokes an inline script every time there is a change in the /services/backend directory of etcd. The change notification is passed to the script in the $ETCD_WATCH_ACTION environment variable and the key is passed in the $ETCD_WATCH_KEY variable. The action performed depends on the value of $ETCD_WATCH_ACTION:
    • The value set indicates that the key was just set in etcd, so we add the backend unit corresponding to the passed-in key to the upstream group. Before adding it, we check if it is already listed in the group, because the API lets you add the same server multiple times, essentially increasing its weight in the load-balancing algorithm. (You can list a server multiple times in an upstream group in the configuration file for this reason as well, but the better way to increase a server’s weight – either in the API call or the configuration file – is to include the weight parameter on its server directive.)
    • The value delete or expire signals the removal of the backend unit from the upstream group. To identify the unit to remove, we have to include its ID as an argument in the API call. To obtain it, we first make an API call that lists all the units and their IDs, and extract the ID of the relevant unit from the response.

Description=Load Balancer Discovery Service



ExecStartPre=/bin/bash -c 'NGINX_ENDPOINT=; \

SERVERS=$(etcdctl ls /services/backend --recursive | awk -F/ '\''{print $NF}'\''); \

for s in $SERVERS; \

do \

 curl "$NGINX_ENDPOINT/upstream_conf?upstream=backend&add=&server=$s"; \

ExecStart=/usr/bin/etcdctl exec-watch --recursive /services/backend -- /bin/bash -c 'NGINX_ENDPOINT=; \

SERVER=$(echo $ETCD_WATCH_KEY | awk -F/ '\''{print $NF}'\''); \


function get_server_id () { \

 ID=$(curl -s "$NGINX_ENDPOINT/upstream_conf?upstream=backend" | awk -v server="$SERVER;" '\''$2==server { print $4 }'\''); \

 echo $ID; \

}; \


if [ $ETCD_WATCH_ACTION == '\''set'\'' ]; \

then \

 echo "Adding $SERVER server"; \

 ID=$(get_server_id); \

 if [ -z $ID ]; \

 then \

 curl -s "$NGINX_ENDPOINT/upstream_conf?upstream=backend&add=&server=$SERVER"; \

 fi \

elif [ $ETCD_WATCH_ACTION == '\''delete'\'' ] || [ $ETCD_WATCH_ACTION == '\''expire'\'' ]; \

then \

 echo "Removing $SERVER server"; \

 ID=$(get_server_id); \

 echo "Server has $ID"; \

 curl -s "$NGINX_ENDPOINT/upstream_conf?upstream=backend&remove=&$ID"; \




The NGINX Plus Configuration File and Docker Image

Before we start the load-balancer service, we need to add the NGINX Plus configuration file to the Docker image of NGINX Plus that we’re building.

NGINX Plus Configuration File

The NGINX Plus configuration file, backend.conf, has the contents that appear below. We’ll place it in the /etc/nginx/conf.d folder in our NGINX Plus Docker image. By default, NGINX Plus reads in all configuration files from that folder.

The upstream directive defines a group of backend servers to be load balanced by NGINX Plus, called backend. We don’t declare any servers for now – we’ll use the on-the-fly reconfiguration API to add them.

Next we declare two virtual servers:

  • The first virtual server listens on port 80 for HTTP requests from clients.
    • The status_zone directive enables collection of live activity monitoring data by the NGINX Plus Status module.
    • The proxy_pass directive tells NGINX Plus to distribute all incoming requests (represented by the / location) among the servers in the backend upstream group.
    • For health checks we create the named location @healthcheck. The internal directive means that regular incoming requests don’t hit this location. The separate location block enables us to handle health checks differently from the regular traffic captured by the / location, even though both health checks and regular traffic are sent to the same upstream group.Specifically, we want to detect the failure of an upstream server very quickly, so we set proxy_connect_timeout and proxy_read_timeout to allow only 1 second each for establishing the connection to the server and for the server to respond to a request, respectively. (The default value for both directives, 60 seconds, applies to the / location.) We also set 1 second as the interval between health checks. We use the default health check, which succeeds if the status code in the server’s response to a request for / is 2xx or 3xx.
  • The second virtual server listens on port 8081 for administrative traffic.
    • The upstream_conf directive in the /upstream_conf location enables us to use the on-the-fly reconfiguration API on the VM.
    • The status directive in the /status location makes live activity monitoring data available there. The built-in NGINX Plus dashboard is accessible at /status.html.
upstream backend {

    zone backend 64k;

server {

 listen 80;
status_zone backend;
location / {

 proxy_pass http://backend;

location @healthcheck {


 proxy_pass http://backend;

 proxy_connect_timeout 1s;

 proxy_read_timeout 1s;

 health_check interval=1s;


server {

 listen 8081;
root /usr/share/nginx/html;
location /upstream_conf {


location /status {


location = /status.html {



Dockerfile for Building the Docker Image

We’ll create a Docker image of NGINX Plus from the following Dockerfile.

FROM ubuntu:14.04
MAINTAINER NGINX Docker Maintainers "docker-maint@nginx.com"
# Set the debconf frontend to Noninteractive

RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
RUN apt-get update && apt-get install -y -q wget apt-transport-https
# Download certificate and key from the customer portal (https://cs.nginx.com)

# and copy to the build context

ADD nginx-repo.crt /etc/ssl/nginx/

ADD nginx-repo.key /etc/ssl/nginx/
# Get other files required for installation

RUN wget -q -O /etc/ssl/nginx/CA.crt https://cs.nginx.com/static/files/CA.crt

RUN wget -q -O - http://nginx.org/keys/nginx_signing.key | apt-key add -

RUN wget -q -O /etc/apt/apt.conf.d/90nginx https://cs.nginx.com/static/files/90nginx
RUN printf "deb https://plus-pkgs.nginx.com/ubuntu `lsb_release -cs` nginx-plus\n" >/etc/apt/sources.list.d/nginx-plus.list
# Install NGINX Plus

RUN apt-get update && apt-get install -y nginx-plus
# forward request logs to Docker log collector

RUN ln -sf /dev/stdout /var/log/nginx/access.log

RUN ln -sf /dev/stderr /var/log/nginx/error.log
# copy the config files

RUN rm /etc/nginx/conf.d/*

COPY backend.conf /etc/nginx/conf.d/
EXPOSE 80 8081
CMD ["nginx", "-g", "daemon off;"]

Creating the NGINX Plus Docker Image

For complete details on how to build the NGINX Plus Docker image and run NGINX and NGINX Plus in containers, see the Deploying NGINX and NGINX Plus with Docker blog post.

In brief, the steps for creating the Docker image are:

  1. Place the Dockerfile, the NGINX Plus configuration file (backend.conf), and the certificate and private key for your subscription (nginx-repo.crt and nginx-repo.key) in a directory on the local machine.
  2. Run the following command in that directory to build the image, naming it nginxplus-coreos:
    $ docker build –t nginxplus-coreos .

The image needs to be available on all our CoreOS VMs. There are a couple ways to make that happen. We can set up a private Docker registry, or for the purpose of a quick test we can use Docker commands to load the image onto the VMs. Run the save command to save the newly built image to a file, use scp to transfer the file to all VMs, and run the load command on each VM to load the image.

Setting Up the Cluster

With the template unit files and the NGINX Plus Docker image in hand, we can set up our cluster of four VMs running CoreOS, and start our services.

We are using Vagrant to create the cluster and recommend you do the same. The coreos-demo for this blog post includes complete instructions for setting up a cluster, along with complete code and configuration examples.

The advantage of Vagrant is that a single vagrant up command creates and starts up all VMs, and distributes the templates and other files to them as necessary. You can deploy your cluster in an environment other than Vagrant, but in that case, you have to make sure that the files get distributed as necessary.

After we create the cluster, we choose one of the VMs, run ssh to connect to it, and in the new shell run the following command to check that all VMs are up:

$ fleetctl list-machines

MACHINE        IP           METADATA

1647c9ff...    -

23ed990b...    -

5004d92a...    -

9af60101...    - 

We’ll run commands on this VM for the rest of the blog post, except when we mention otherwise. When we run fleetctl start commands to start our services, we assume that template unit files are located in the current working directory.

Starting the Services

To start up the services, perform these steps on the VM you chose in Setting Up the Cluster:

  1. Start three units of the backend service, allowing fleet to choose the VM on which to launch each container:
    $ fleetctl start backend@1
    Unit backend@1.service inactive
    Unit backend@1.service launched on 1647c9ff.../
    $ fleetctl start backend@2
    Unit backend@2.service inactive
    Unit backend@2.service launched on 23ed990b.../
    $ fleetctl start backend@3
    Unit backend@3.service inactive
    Unit backend@3.service launched on 5004d92a.../
  2. Check that all units are running:
    $ fleetctl list-units
    UNIT                 MACHINE                ACTIVE    SUB
    backend@1.service    1647c9ff.../    active    running
    backend@2.service    23ed990b.../    active    running
    backend@3.service    5004d92a.../    active    running
  3. Start the service-discovery units for the backend application:
    $ fleetctl start backend-discovery@1
    Unit backend-discovery@1.service inactive
    Unit backend-discovery@1.service launched on 1647c9ff.../
    $ fleetctl start backend-discovery@2
    Unit backend-discovery@2.service inactive
    Unit backend-discovery@2.service launched on 23ed990b.../
    $ fleetctl start backend-discovery@3
    Unit backend-discovery@3.service inactive
    Unit backend-discovery@3.service launched on 5004d92a.../
  4. Verify that each service-discovery unit has registered its corresponding backend unit in etcd:
    $ etcdctl ls /services/backend
    The output confirms that three keys were created in the /services/backend directory of etcd.
  5. Start the load-balancer unit:
    $ fleetctl start loadbalancer@1
    Unit loadbalancer@1.service inactive
    Unit loadbalancer@1.service launched on 9af60101.../
    The load balancer starts on the fourth VM, with IP address, because the Conflicts option in the load-balancer template unit file tells fleet not to start it on a VM where a unit of the backend service is already running.
  6. Check that NGINX Plus is running on the fourth VM:
    $ curl
    <head><title>502 Bad Gateway</title></head>
    <body bgcolor="white">
    <center><h1>502 Bad Gateway</h1></center>
    As expected, the response is 502 Bad Gateway, which confirms that NGINX Plus is running but can’t return any content because we haven’t added any backend servers to the upstream group yet.
  7. Start the service-discovery service for the load-balancer service:
    $ fleetctl start loadbalancer-discovery@1
    Unit loadbalancer-discovery@1.service inactive on 9af60101.../
    Unit loadbalancer-discovery@1.service launched on 9af60101.../
    The service-discovery service starts on the same VM as the load-balancer service, as specified by the MachineOf option in its template file.

As we described in Template for the Load-Balancer Service’s Service‑Discovery Service, this service‑discovery service automatically detects changes in the set of etcd keys, which correspond to backend units. It uses the NGINX Plus on-the-fly reconfiguration API to add the backend units to the upstream group being load balanced by NGINX Plus. In this case, it discovers and adds the three backend units we started in Step 1.

We repeat Step 6 and fetch a web page from the VM where the load-balancer unit is running. This time the request is passed to one of the backend units and its webpage is returned:

$ curl

<!DOCTYPE html>



<title>Hello from NGINX!</title>


    body {

        width: 35em;

        margin: 0 auto;

        font-family: Tahoma, Verdana, Arial, sans-serif;






<h2>My hostname is abcf317089f2</h2>

<h2>My address is</h2>



Another way to confirm that NGINX Plus is load balancing three backend units is to open the live activity monitoring dashboard on the VM where the load-balancer unit is running; in our case the URL is When we click on the Upstreams tab, we see three servers in the backend upstream group.

Scaling the Web Application

Now let’s see how quickly the load balancer gets reconfigured after we destroy or start a backend unit, effectively scaling our application up or down.

First we destroy one of the backend units:

$ fleetctl destroy backend@1

Destroyed backend@1.service

Returning to the dashboard, we now see only two servers in the backend upstream group.

To start the backend unit again, we run the fleetctl start backend@1 command, and again see three servers in the Upstreams tab on the dashboard.

Using Active Health Checks to Detect Outages

Let’s see how health checks quickly exclude a faulty server from the load-balanced group. We bring one of the VMs down, simulating an outage, and observe how the unit that was running on the failed VM gets rescheduled on a healthy one.

First let’s bring the number of running backend units down to two by destroying the third unit and its service‑discovery unit:

$ fleetctl destroy backend@3 && fleetctl destroy backend-discovery@3

Destroyed backend@3.service

Destroyed backend-discovery@3.service

Now we bring down VM number 2 in our cluster, where the backend@2 unit is running. To do that we suspend the VM. The method for suspending a VM or machine depends on the environment in which you’re running your CoreOS cluster, but in our Vagrant environment we run the following command from the host VM:

$ vagrant suspend core-02

The dashboard shows that the backend server on VM number 2 is marked failed in red now, meaning that NGINX Plus has marked it unhealthy as a result of a failed health check and is not passing requests to it.

If we wait about 2 minutes (the TTL for etcd keys specified in the Template for the Backend Service’s Service‑Discovery Service), the key expires and the server is removed from the upstream group. Soon after we’ll see a newly launched backend unit when fleet reschedules the unit from the failed VM on a healthy VM.


The on-the-fly reconfiguration API lets you easily integrate NGINX Plus with any service discovery mechanism, minimizing the need to change and reload configuration files, and enabling NGINX Plus to react quickly to changes in the environment. With health checks, you quickly detect a failed server and stop passing requests to it. Live activity monitoring provides you with complete information about the state of NGINX Plus.

Try out the automated reconfiguration of NGINX Plus upstream groups using etcd in a CoreOS cluster for yourself.

See Forrester’s Report, “Vendor Landscape, Application Performance Management” to identify the right vendor to help IT deliver better service at a lower cost, brought to you in partnership with BMC.

nginx,load balancing,coreos,docker

Published at DZone with permission of Patrick Nommensen, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}