
Docker Swarm Part II: Rescheduling Redis


In this second of three examples of deploying distributed databases using Docker Swarm, the subject is Redis.


Welcome back! If you haven’t already, check out Part I of the series to learn how to set up the environment we’ll be using.

Redis Server With Swarm Rescheduling On-Node-Failure

As you may have noticed in Part I, we deployed Swarm with the --experimental flag. This enables the feature for rescheduling containers on node failure, available as of Swarm 1.1.0.

This portion of the post will focus on deploying a Redis container and testing out the current state of the experimental rescheduling feature.

Note: Rescheduling is still very experimental and has bugs. We will walk through an example and point out the known issues as we go.

If you want your container to be rescheduled when a Swarm host fails, you need to deploy that container with certain flags. One way is the flag combination --restart=always -e reschedule:on-node-failure; another is a label such as -l 'com.docker.swarm.reschedule-policy=["on-node-failure"]'. The example below uses the environment-variable method.
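
For reference, a deployment using the label method instead might look like the sketch below; the rest of this walkthrough sticks with the environment-variable flag.

# sketch of the label-based alternative (not used in the steps below)
$ docker run -d --restart=always \
    -l 'com.docker.swarm.reschedule-policy=["on-node-failure"]' \
    redis redis-server --appendonly yes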

First, let’s deploy a Redis container with the rescheduling flags and a volume managed by Flocker.

$ docker volume create -d flocker --name testfailover -o size=10G

# note: overlay-net was created in Part I.
# note: `--appendonly yes` tells Redis to persist data to disk.
$ docker run -d --net=overlay-net --name=redis-server \
    --volume-driver=flocker -v testfailover:/data \
    --restart=always -e reschedule:on-node-failure \
    redis redis-server --appendonly yes
465f490d8a80bb53af4189dfec0c595490ebb454f91ded65b9da2edcb4264c2d


Next, SSH into the Docker host where the Redis container is running and take a look at the contents of the appendonly.aof file we instructed Redis to use for persistence. The file should be located on the Flocker volume mount-point for the container and contain no data.
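
If you’re not sure which node Swarm placed the container on, note that Swarm prefixes each container name with its node name, so a filtered docker ps against the manager will tell you. The node name below is from our cluster and will differ in yours:

$ docker ps --filter "name=redis-server" --format "{{.Names}}"
ip-10-0-57-22/redis-server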

$ cat /flocker/9a0d5942-882c-4545-8314-4693a93fde19/appendonly.aof
# there should be NO data here yet :)


Next, let’s connect to the Redis server and add some key/value pairs. Afterward, look at the contents of the appendonly.aof file again to confirm that Redis is persisting the data correctly.

$ docker run -it --net=overlay-net --rm redis sh -c 'exec redis-cli -h "redis-server" -p "6379"'
redis-server:6379>
redis-server:6379> SET mykey "Hello"
OK
redis-server:6379> GET mykey
"Hello"


View the data within our Flocker volume to verify that Redis is working correctly.

$ cat /flocker/9a0d5942-882c-4545-8314-4693a93fde19/appendonly.aof
*2
$6
SELECT
$1
0
*3
$3
SET
$5
mykey
$5
Hello

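If you’re wondering about the format: the append-only file simply logs each write in the Redis wire protocol, where a line like *3 introduces a command with three arguments and each $N gives the byte length of the argument that follows. The listing above therefore encodes SELECT 0 followed by SET mykey Hello.
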

Testing Failover

Now we want to test the failover scenario, making sure our Flocker volume moves the data stored in Redis to the new Docker host where Swarm reschedules the container.

To do this, let’s point our Docker client at the Swarm manager and monitor the event stream using the docker events command.
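
For example, assuming the Swarm manager from Part I listens on TCP port 3376 (an assumption; substitute the address and port from your own setup):

# point the client at the Swarm manager, then watch events
$ export DOCKER_HOST=tcp://<swarm-manager-ip>:3376
$ docker events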

To initiate the test, run shutdown -h now on the Docker host that is running the Redis container to simulate a node failure. You should see events (below) that correlate to the node and the container dying.
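
That is, on the node currently hosting redis-server (ip-10-0-57-22 in our case):

# run on the Docker host where the Redis container is running
$ sudo shutdown -h now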

What the events tell us is that the container and its resources (network, volume) need to be removed, disconnected or unmounted because the host is failing. The events you see below are:

  • Container Kill
  • Container Die
  • Network Disconnect
  • Swarm Engine Disconnect
  • Volume Unmount
  • Container Stop   
2016-03-08T21:25:45.832049184Z container kill f3d724a37bdf040baac5a06616e39956610d5409acf9ed62508d43b84f79410d (com.docker.swarm.id=48d1d26490c4943fdb98e6f08be9b62003cd5105e06d9bad94dde2e5913c374b, com.docker.swarm.reschedule-policies=["on-node-failure"], image=redis, name=redis-server, node.addr=10.0.57.22:2375, node.id=256G:KRWJ:D5D4:IZHE:TJUO:FHAO:6HII:ET3F:EULJ:NYFT:LBIX:4HBS, node.ip=10.0.57.22, node.name=ip-10-0-57-22, signal=15)

2016-03-08T21:25:45.975166262Z container die f3d724a37bdf040baac5a06616e39956610d5409acf9ed62508d43b84f79410d (com.docker.swarm.id=48d1d26490c4943fdb98e6f08be9b62003cd5105e06d9bad94dde2e5913c374b, com.docker.swarm.reschedule-policies=["on-node-failure"], image=redis, name=redis-server, node.addr=10.0.57.22:2375, node.id=256G:KRWJ:D5D4:IZHE:TJUO:FHAO:6HII:ET3F:EULJ:NYFT:LBIX:4HBS, node.ip=10.0.57.22, node.name=ip-10-0-57-22)


2016-03-08T21:25:46.422580116Z network disconnect 26c449979ac794350e3a3939742d770446494a9d17adc44650e298297b70704c (container=f3d724a37bdf040baac5a06616e39956610d5409acf9ed62508d43b84f79410d, name=overlay-net, node.addr=10.0.57.22:2375, node.id=256G:KRWJ:D5D4:IZHE:TJUO:FHAO:6HII:ET3F:EULJ:NYFT:LBIX:4HBS, node.ip=10.0.57.22, node.name=ip-10-0-57-22, type=overlay)

2016-03-08T21:25:49.818851785Z swarm engine_disconnect  (node.addr=10.0.57.22:2375, node.id=256G:KRWJ:D5D4:IZHE:TJUO:FHAO:6HII:ET3F:EULJ:NYFT:LBIX:4HBS, node.ip=10.0.57.22, node.name=ip-10-0-57-22)

2016-03-08T21:25:46.447565262Z volume unmount testfailover (container=f3d724a37bdf040baac5a06616e39956610d5409acf9ed62508d43b84f79410d, driver=flocker, node.addr=10.0.57.22:2375, node.id=256G:KRWJ:D5D4:IZHE:TJUO:FHAO:6HII:ET3F:EULJ:NYFT:LBIX:4HBS, node.ip=10.0.57.22, node.name=ip-10-0-57-22)

2016-03-08T21:25:46.448129059Z container stop f3d724a37bdf040baac5a06616e39956610d5409acf9ed62508d43b84f79410d (com.docker.swarm.id=48d1d26490c4943fdb98e6f08be9b62003cd5105e06d9bad94dde2e5913c374b, com.docker.swarm.reschedule-policies=["on-node-failure"], image=redis, name=redis-server, node.addr=10.0.57.22:2375, node.id=256G:KRWJ:D5D4:IZHE:TJUO:FHAO:6HII:ET3F:EULJ:NYFT:LBIX:4HBS, node.ip=10.0.57.22, node.name=ip-10-0-57-22)


Then, some time after the Docker host dies, you should eventually see an event showing the same container being rescheduled (created again). This is where there is still some work to be done: as of Swarm 1.1.3, we noticed in our testing that Swarm has an issue running Start on the container after it has been Created on the new Docker host.

You should see the Create event logged while watching docker events, and this does initiate the re-creation of the container and the movement of the Flocker volume it was using.

We found that you may need to manually Start the container on the new host after it has been rescheduled.

Note: Some of the issues with the container creating but not starting and others are tracked in this Docker Swarm issue.

Below is the event we see when the container is rescheduled and created on a new Docker host automatically. Notice that the node address differs from the earlier messages; this is because the container has been rescheduled onto a new Docker host.

# The node.addr was `node.addr=10.0.57.22` before being rescheduled
2016-03-08T21:27:32.010368912Z container create 1cee6bff97a4d86995dd9b126130d40be9b02c6d373e6c0faa32c05110f98475 (com.docker.swarm.id=48d1d26490c4943fdb98e6f08be9b62003cd5105e06d9bad94dde2e5913c374b, com.docker.swarm.reschedule-policies=["on-node-failure"], image=redis, name=redis-server, node.addr=10.0.195.84:2375, node.id=DWJS:ES2T:EH6C:TLMP:VQMU:4UHP:IBEX:WTVE:VTFO:E5IZ:UBVJ:ITWW, node.ip=10.0.195.84, node.name=ip-10-0-195-84)


Review

Here is what happened so far:

[Diagram: the Redis container is killed on the failed node, its Flocker volume is unmounted, and Swarm recreates the container, with the volume reattached, on a new node.]


If we run a docker ps -a against Swarm, we can see the Redis container in the Created state. So, in this case, we can start it manually, and Redis is back up and running on a new node!

$ docker ps -a
CONTAINER ID IMAGE  COMMAND       CREATED       STATUS  PORTS  NAMES
1cee6bff97a4 redis  "/entryp.."   4 minutes ago                ip-10-0-195-84/redis-server

root@ip-10-0-204-4:~# docker start redis-server
redis-server


Let’s connect to the Redis server and make sure the data we added still remains.

$ docker run -it --net=overlay-net --rm redis sh -c 'exec redis-cli -h "redis-server" -p "6379"'
redis-server:6379> GET mykey
"Hello"


The data is still there! That said, given the current state of rescheduling, it’s not recommended to rely on it in production yet.

During our tests, we came across users who reported that the container did start on its own, users who said rescheduling didn’t work at all, and users who wound up with two identical containers when the failed Docker host came back online.

Either way, there are certainly kinks to work out, and it’s part of the community’s job to help test, report, and fix these issues so the feature can work reliably. We will update this post along the way to show you how rescheduling improves over time!

Happy Swarming! Be sure to check out Part III!

We’d love to hear your feedback!


Published at DZone with permission of Ryan Wallner, DZone MVB.