
How to Run a MongoDB Replica Set on Kubernetes PetSet or StatefulSet


Kubernetes 1.5 and the StatefulSet feature (previously named PetSet) can be used to run a containerized replica set of MongoDB 3.2 database. Come learn how!


Running and managing stateful applications or databases such as MongoDB, Redis, and MySQL with Docker containers is no simple task. A stateful application must retain its data after a container has been shut down or migrated to a new node (for example, when a failover or scaling operation shuts the container down and re-creates it on a different host).

By default, Docker containers use their root disk as ephemeral storage: a chunk of disk space from the filesystem of the host that runs the container. This disk space can’t be shared with other processes, nor can it be easily migrated to a new host. You can save the changes made within a container using the “docker commit” command (which creates a new Docker image that includes your modified data), but that isn’t a practical way to persist content.

The “docker volume” feature, on the other hand, lets you run a container with a dedicated volume mounted. Depending on the storage plugin you use, this volume is either another chunk of space on the host machine (but this time persistent and independent of the container lifecycle; it isn’t deleted when the container is removed), network storage, or a shared filesystem mount.

For production-grade containerized stateful application management, you can take advantage of tools such as “flocker” and “convoy.” To avoid manually configuring these for each Docker host in your cluster, you can use Kubernetes “Persistent Volumes,” which abstract the underlying storage layer — be it AWS EBS volumes, GCE persistent disk, Azure disk, Ceph, OpenStack Cinder, or other supported systems.

In this tutorial, we will explain how to run a containerized replica set of the MongoDB 3.2 database using Kubernetes 1.5 and its StatefulSet feature (previously named PetSet). The StatefulSet feature assigns persistent DNS names to pods and lets us re-attach the required storage volume to whichever machine a pod migrates to, at any time.

Note: To proceed with the tutorial, a competency with Kubernetes basics and terminology (like pods, config maps, and services) is required.

The StatefulSet feature is used together with a dedicated “service” that points to each of its member pods. This service should be “headless,” meaning it doesn’t get a ClusterIP for load balancing; it exists only to provide static DNS names for the pods that will be launched. The service name is referenced in the “spec: serviceName:” section of the StatefulSet configuration file, and it causes enumerated DNS records to be created in the format “name-0,” “name-1,” “name-2,” and so on. Conveniently, Kubernetes service discovery lets any pod reach services in the same namespace simply by querying the service name. So if a pod is launched and detects its own hostname as “mongodb-4,” it knows exactly where to look for the master, which is “mongodb-0.”
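For illustration, with the release installed later in this tutorial (pods named mymongo-mongodb-replicas-0/1/2, a headless service assumed to carry the same base name, and the default namespace), the per-pod DNS entries would look roughly like this:

```
mymongo-mongodb-replicas-0.mymongo-mongodb-replicas.default.svc.cluster.local
mymongo-mongodb-replicas-1.mymongo-mongodb-replicas.default.svc.cluster.local
mymongo-mongodb-replicas-2.mymongo-mongodb-replicas.default.svc.cluster.local
```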

In a StatefulSet, pods are launched strictly one after another: only when the previous pod has initialized successfully is the next one started. This way you can plan your deployment with confidence, with “name-0” being the first launched pod. “Name-0” will bootstrap the cluster, replica set, and so on, depending on the application you run.

In MongoDB, the master node will initialize the replica set. Pods named “name-1,” “name-2,” and so on will recognize that a replica set has already been created and will connect to the existing nodes. It’s worth noting that with applications like Consul, MongoDB, and Redis, it can be hard to know which node is the current master, because these applications periodically re-elect a master/primary node, not only during failover. For example, even after “rs.add(hostname)” (the MongoDB shell command that adds a new member to a replica set), your next launched pod member can’t be sure that “mongodb-0” is still the primary. By the time the “mongodb-4” pod starts, an internal re-election may have made any of the previous nodes the new primary.

All the above should help us understand what’s going on in the following example bash init scripts.

We’re going to use a helm chart (Kubernetes package) as an example of deploying a StatefulSet with three MongoDB replica set members. To install the helm package manager and its server-side component Tiller, please follow this official install guide.

If you'd prefer to skip reading the guide, just run this on the same machine where you have kubectl properly configured (helm uses kubectl configuration to connect to Kubernetes cluster):

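With the Helm 2 tooling current at the time of writing, the commands look roughly like this (the install-script URL and install method may have changed since, so treat this as a sketch):

```bash
# Download and run the Helm client install script
curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get > get_helm.sh
chmod +x get_helm.sh
./get_helm.sh

# Deploy Tiller into the cluster that your current kubectl context points to
helm init
```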

The first step is to download and install helm. Next, install Tiller in your cluster (helm reads the $HOME/.kube/config file to know which cluster to install into and, like kubectl, uses that config file to access the Kubernetes API).

If both steps complete successfully, you will be presented with the message “Tiller (the helm server-side component) has been installed into your Kubernetes Cluster.” We can then proceed to installing the MongoDB cluster.

At the time of writing, the MongoDB StatefulSet helm chart lives in the “incubator” repository, meaning it hasn’t yet been promoted to the “stable” repo. We’ll use this chart as an example to understand how StatefulSet works and how we can modify it to fit our needs, or later run any other type of database using the same techniques.

Check which packages are visible to you with the “helm search” command; notice that you can only see “stable/something” packages. Enable the “incubator” helm charts repository:

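A sketch of the command, assuming the incubator repository URL that was standard at the time:

```bash
# Register the incubator charts repository, then search again
helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com/
helm search mongodb
```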

You will see “incubator has been added to your repositories,” and by running “helm search” again you can verify that new “incubator/something” packages are now visible.

If you want to change the default values provided with this package, download the values.yaml file or simply copy its content and replace any values, like Storage: "10Gi", Memory: "512Mi", or Replicas: 3.

Then, during install, point the command at your modified file with -f values.yaml.
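As a minimal sketch, a trimmed values.yaml override could look like this (the keys follow the defaults quoted above; check the chart’s own values.yaml for the authoritative names):

```yaml
# values.yaml — override only what you need
Replicas: 3
Storage: "10Gi"
Memory: "512Mi"
```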

Now you are ready to launch MongoDB replica set with this command:

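A sketch of the install command; the release name “mymongo” is an assumption chosen to match the resource names shown below:

```bash
# Install the incubator chart as release "mymongo"
helm install --name mymongo incubator/mongodb-replicaset
```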

(Add -f values.yaml to the command if you have modified the default values.)

After a few seconds, refer to your Kubernetes dashboard; you should see the following resources created:

StatefulSet named mymongo-mongodb-replicas.


“Persistent Volume Claims” and three volumes.

Three pods named “mymongo-mongodb-replicas-0/1/2”.

Next, refer to your StatefulSets again. It should be lit green now because all three pods are initialized.

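If you prefer kubectl to the dashboard, the same resources can be checked with something along these lines:

```bash
# List the StatefulSet, its persistent volume claims, and the pods it created
kubectl get statefulsets,pvc,pods
```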

AWS notice: If you are running on AWS, the default Kubernetes 1.5 StorageClass will provision EBS volumes in different availability zones. You will see an error such as “pod (mymongo-mongodb-re-0) failed to fit in any node fit failure summary on nodes : NoVolumeZoneConflict (2), PodToleratesNodeTaints (1)”.

If this happens, delete the EBS volumes in the "wrong" AZs and create a StorageClass constrained to the particular availability zone where your cluster has its worker nodes, using the following example. Create a file named new-aws-storage-class.yml with this content:

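A minimal sketch of such a file; the “generic” class name and the us-west-2a zone match the ones referenced below, and gp2 is simply a common default EBS volume type:

```yaml
# new-aws-storage-class.yml
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: generic
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-west-2a
```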

Submit this to Kubernetes with kubectl create -f new-aws-storage-class.yml, and you should see the response “storageclass "generic" created.”

Now persistent volume claims will dynamically create PVs in us-west-2a only (replace us-west-2a in this file with the AZ that fits your cluster setup).

Following the package author’s advice, we can find which pod is our primary replica using this bash command:

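A loop along these lines does the job (the pod names assume the “mymongo” release with three replicas):

```bash
# Ask every member whether it currently considers itself the primary
for i in 0 1 2; do
  kubectl exec mymongo-mongodb-replicas-$i -- mongo --eval="printjson(rs.isMaster())"
done
```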

Then look at which pod shows ismaster: true and copy its name (the JSON output shows full service DNS names, so the bare pod name is the part to the left of the first dot).

In my case, it’s still the mymongo-mongodb-replicas-0 pod. We can write a value into the master pod’s mongo by executing this command:

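For example (the collection name and document are arbitrary test data, and the pod name assumes pod 0 is still the primary):

```bash
# Insert a test document through the primary pod
kubectl exec mymongo-mongodb-replicas-0 -- mongo --eval='printjson(db.test.insert({key1: "value1"}))'
```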

If everything is working, you should see { "nInserted" : 1 }. To read this value from any slave pod, execute:

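A sketch, assuming the test document from the previous step:

```bash
# slaveOk() permits reads on a secondary; then print every document in the collection
kubectl exec mymongo-mongodb-replicas-1 -- mongo --eval='rs.slaveOk(); db.test.find().forEach(printjson)'
```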

You will see something like this:

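Roughly the following, assuming the test document inserted above (the ObjectId will differ):

```
{ "_id" : ObjectId("..."), "key1" : "value1" }
```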

These basic verification steps prove that replication is working and that the value we inserted on the primary node can be fetched from any of the slave replicas.

Also, you can log into the interactive MongoDB shell by executing the following:

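For example, against the first pod:

```bash
# Open an interactive mongo shell inside a replica set member
kubectl exec -it mymongo-mongodb-replicas-0 -- mongo
```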

This allows you to perform arbitrary actions on the database, and you can safely exit the shell without worrying that your container will stop on exit (as it would if you had opened the shell with “docker attach”).

Let’s have a quick look at the components the Helm chart used to create this MongoDB StatefulSet and the persistent volumes for storing data.

Headless Service YAML Manifest

A service with neither a ClusterIP nor a NodePort specified doesn’t attempt to load-balance traffic to the underlying pods. Its sole purpose is to trigger the creation of DNS names for the pods.

Please notice the annotation used:

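An abbreviated sketch of such a headless service; the service name and selector label are assumptions, and the annotation shown is the one commonly used for this purpose in Kubernetes 1.5-era charts:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mymongo-mongodb-replicas        # assumed to match the StatefulSet's serviceName
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  clusterIP: None                       # headless: DNS records only, no load balancing
  ports:
    - port: 27017
  selector:
    app: mongodb-replicaset             # assumed pod label
```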

It causes endpoints to be created for a pod regardless of its readiness state, which is exactly what we need.

Why? Because we use our own initialization mechanism in MongoDB when a pod starts, and each replica set member must be able to reach others during initialization, even if they’re not yet “healthy” (ready to serve requests and traffic).

MongoDB Daemon Configuration Declared in ConfigMap YAML

All extra settings and fine-tuning of mongod behavior go here. This file is rendered into every pod and used as its config file. The file in the Helm chart we used is very minimalistic and contains no special performance tuning or anything else you might need in a production deployment, so feel free to modify it to fit your needs. The simplest method is to git clone the chart’s repository, modify the needed files and definitions, and run “helm package mongodb-replicaset,” which archives your modified “mongodb-replicaset” folder into a “.tgz” file to use later with “helm install --name your-release-name mongodb-replicaset-0.1.3.tgz”. If you don’t specify --name, you’ll have to live with the random release name helm generates for your set, like “curious-penguin” or the like. You can read more about helm here (highly recommended).
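The customization workflow, sketched as commands (the repository URL is an assumption based on where the community charts lived at the time):

```bash
# Clone the charts repository, edit the chart, package it, and install your build
git clone https://github.com/kubernetes/charts.git
cd charts/incubator
# ...edit mongodb-replicaset/values.yaml, templates, and the ConfigMap as needed...
helm package mongodb-replicaset
helm install --name your-release-name mongodb-replicaset-0.1.3.tgz
```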

StatefulSet YAML Manifest

The StatefulSet YAML manifest includes the Persistent Volume Claims template. It is the most complex resource declaration and has init containers defined in this section:

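An abbreviated sketch of the two init containers (shown with the modern initContainers field for readability; the Kubernetes 1.5-era chart declared them through a pod init-containers annotation, and the image names here are placeholders, not the chart’s exact values):

```yaml
initContainers:
  - name: install
    image: <install-image>              # small image shipping install.sh, on-start.sh, peer-finder
    args: ["--work-dir=/work-dir"]
    volumeMounts:
      - name: workdir
        mountPath: /work-dir
      - name: config
        mountPath: /config
  - name: bootstrap
    image: mongo:3.2                    # official MongoDB image
    command: ["/work-dir/peer-finder"]
    args: ["-on-start=/work-dir/on-start.sh", "-service=mymongo-mongodb-replicas"]
    volumeMounts:
      - name: workdir
        mountPath: /work-dir
      - name: config
        mountPath: /config
      - name: datadir                   # the persistent volume from volumeClaimTemplates
        mountPath: /data/db
```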

The two containers, named “install” and “bootstrap,” are started before the main pod container. The first one, “install,” loads a small image that holds special files such as “install.sh,” “on-start.sh,” and “peer-finder.” This container does two important things: it mounts two of your newly defined volumes (one is “config,” which is created from the ConfigMap and holds only the config file; the second is a temporary mount named “work-dir”), and it copies the needed init files into “work-dir.” You can rebuild that image and add anything else your stateful application might need. Note that “work-dir” is not the persistent volume yet; it’s just a place holding a few init files that the pod runs during the next steps.

The second init container, “bootstrap,” uses the official MongoDB 3.2 image (by the time you read this it might be a newer MongoDB version, but because we do the init steps in separate containers, we don’t need to modify the real mongo image; we add our extra files via the “work-dir” and “config” mounts). It mounts the main persistent volume (defined later, in the volumeClaimTemplates section of the StatefulSet manifest) at the /data/db path and runs “peer-finder,” a simple tool that fetches the other peers’ endpoints from the Kubernetes service API (you can find it here). on-start.sh contains the logic that detects whether the MongoDB replica set has already been initialized by other peers and joins the current pod to it. If it detects no replica set, it sets itself as master and initializes one, so the others will join.
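A sketch of the volumeClaimTemplates section; the claim name “datadir” is an assumption, the 10Gi request matches the chart’s default Storage value mentioned earlier, and the annotation is how a storage class was requested in Kubernetes 1.5:

```yaml
volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.beta.kubernetes.io/storage-class: generic   # e.g. the AWS class created above
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```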

Before we can use this in production, the next step is to write more data into our newly created replica set and verify the failover mechanism in case one of our Kubernetes nodes goes down unexpectedly.


Topics:
kubernetes ,container management ,cluster ,docker ,mongodb ,tutorial ,database

Published at DZone with permission of Oleg Chunikhin. See the original article here.
