ZooKeeper on Kubernetes

By Ioannis Canellos · Nov. 03, 14


For the last couple of weeks I've been playing around with docker and kubernetes. If you are not familiar with kubernetes, let's just say for now that it's an open source container cluster management implementation, which I find really, really awesome.

One of the first things I wanted to try out was running an Apache ZooKeeper ensemble inside kubernetes and I thought that it would be nice to share the experience.

For my experiments I used Docker v1.3.0 and OpenShift V3, which I built from source and which includes Kubernetes.

ZooKeeper on Docker

Managing a ZooKeeper ensemble is definitely not a trivial task. You usually need to configure an odd number of servers and all of the servers need to be aware of each other. This is a PITA on its own, but it gets even more painful when you are working with something as static as docker images. The main difficulty could be expressed as:
"How can you create multiple containers out of the same image and have them point to each other?"
One approach would be to use docker volumes and provide the configuration externally. This means creating the configuration for each container, storing it somewhere on the docker host, and then passing it to each container as a volume at creation time.

I've never tried that myself, so I can't tell if it's a good or bad practice. I can see some benefits, but I can also see that this is something I am not really excited about. It could look like this:

docker run -p 2181:2181 -v /path/to/my/conf:/opt/zookeeper/conf my/zookeeper

Another approach would be to pass all the required information as environment variables to the container at creation time, and then create a wrapper script which reads the environment variables, modifies the configuration files accordingly and launches zookeeper.
This is definitely easier to use, but it's not flexible enough to perform other types of tuning without rebuilding the image itself.
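
As a rough sketch of the idea, creating a container for server 1 could then look something like this (the values are placeholders; the exact variable names my image expects are described below):

docker run -p 2181:2181 \
  -e SERVER_ID=1 \
  -e ZK_PEER_1_SERVICE_HOST=server1.example.com \
  -e ZK_PEER_1_SERVICE_PORT=2888 \
  -e ZK_ELECTION_1_SERVICE_PORT=3888 \
  my/zookeeper
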
Last but not least, one could combine the two approaches and do something like:
  • Make it possible to provide the base configuration externally using volumes.
  • Use env and scripting to just configure the ensemble.
There are plenty of images out there that take one approach or the other. I am more fond of the environment variables approach, and since I needed something that would follow some of the kubernetes naming conventions, I decided to hack together an image of my own using env variables.

Creating a custom image for ZooKeeper

I will just focus on the configuration that is required for the ensemble. To configure a ZooKeeper ensemble, one has to assign each server a numeric id and then add to its configuration one entry per zookeeper server, containing the ip of the server, its peer port and its election port.

The server id is added in a file called myid under the dataDir. The rest of the configuration looks like:

server.1=server1.example.com:2888:3888
server.2=server2.example.com:2888:3888
server.3=server3.example.com:2888:3888
...
server.current=[bind address]:[peer binding port]:[election binding port]

Note that if the server id is X, the server.X entry needs to contain the bind ip and ports, not the connection ip and ports.

So what we actually need to pass to the container as environment variables are the following:


  1. The server id.
  2. For each server in the ensemble:
    1. The hostname or ip
    2. The peer port
    3. The election port

If these are set, then the script that updates the configuration could look like:

if [ ! -z "$SERVER_ID" ]; then
  # Write the numeric server id to the myid file under the data dir.
  echo "$SERVER_ID" > /opt/zookeeper/data/myid
  # Find the servers exposed in env.
  for i in {1..15}; do

    HOST=`envValue ZK_PEER_${i}_SERVICE_HOST`
    PEER=`envValue ZK_PEER_${i}_SERVICE_PORT`
    ELECTION=`envValue ZK_ELECTION_${i}_SERVICE_PORT`

    if [ "$SERVER_ID" = "$i" ]; then
      # The current server always binds on all interfaces.
      echo "server.$i=0.0.0.0:2888:3888" >> conf/zoo.cfg
    elif [ -z "$HOST" ] || [ -z "$PEER" ] || [ -z "$ELECTION" ]; then
      # If a server is not fully defined, stop the loop here.
      break
    else
      echo "server.$i=$HOST:$PEER:$ELECTION" >> conf/zoo.cfg
    fi

  done
fi
For simplicity, the helper function that reads the keys and values from env is excluded.
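
If you are wondering what that helper does, a minimal version could be little more than bash indirect expansion. This is just a sketch of the idea, not the exact function the image ships with:

# Hypothetical helper: print the value of the environment variable whose name is passed as $1.
envValue() {
  local key=$1
  # ${!key} expands to the value of the variable named by $key (empty if unset).
  echo "${!key}"
}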

The complete image and helper scripts to launch zookeeper ensembles of variable size can be found in the fabric8io repository.


ZooKeeper on Kubernetes

The docker image above can be used directly with docker, provided that you take care of the environment variables. Now I am going to describe how this image can be used with kubernetes. But first, a little rambling...

What I really like about using kubernetes with ZooKeeper is that kubernetes will recreate the container if it dies or its health check fails. For ZooKeeper this also means that if a container hosting an ensemble server dies, it will get replaced by a new one, which keeps a quorum of ZooKeeper servers constantly available.

I also like that you don't need to worry about the connection string that the clients will use if containers come and go. You can use kubernetes services to load balance across all the available servers, and you can even expose that outside of kubernetes.

Creating a Kubernetes config for ZooKeeper

I'll try to explain how you can create a 3-server ZooKeeper ensemble in Kubernetes.

What we need is 3 docker containers all running ZooKeeper with the right environment variables:

{
    "image": "fabric8/zookeeper",
    "name": "zookeeper-server-1",
    "env": [
        {
            "name": "ZK_SERVER_ID",
            "value": "1"
        }
    ],
    "ports": [
        {
            "name": "zookeeper-client-port",
            "containerPort": 2181,
            "protocol": "TCP"
        },
        {
            "name": "zookeeper-peer-port",
            "containerPort": 2888,
            "protocol": "TCP"
        },
        {
            "name": "zookeeper-election-port",
            "containerPort": 3888,
            "protocol": "TCP"
        }
    ]
}
The env needs to specify all the parameters discussed previously.

So, along with ZK_SERVER_ID, we need to add the following:

  • ZK_PEER_1_SERVICE_HOST
  • ZK_PEER_1_SERVICE_PORT
  • ZK_ELECTION_1_SERVICE_PORT
  • ZK_PEER_2_SERVICE_HOST
  • ZK_PEER_2_SERVICE_PORT
  • ZK_ELECTION_2_SERVICE_PORT
  • ZK_PEER_3_SERVICE_HOST
  • ZK_PEER_3_SERVICE_PORT
  • ZK_ELECTION_3_SERVICE_PORT
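
If we wired everything up manually, the env block of the container for server 1 would grow to something along these lines (the host and port values are just placeholders):

"env": [
    { "name": "ZK_SERVER_ID", "value": "1" },
    { "name": "ZK_PEER_1_SERVICE_HOST", "value": "10.0.0.1" },
    { "name": "ZK_PEER_1_SERVICE_PORT", "value": "2888" },
    { "name": "ZK_ELECTION_1_SERVICE_PORT", "value": "3888" },
    { "name": "ZK_PEER_2_SERVICE_HOST", "value": "10.0.0.2" },
    ...
]
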
An alternative approach, instead of adding all this manual configuration, is to expose the peer and election ports as kubernetes services. I tend to favor the latter approach, as it can make things simpler when working with multiple hosts. It's also a nice exercise for learning kubernetes. (Kubernetes injects the host and port of each service into containers as environment variables, so a service named zk-peer-1 shows up as ZK_PEER_1_SERVICE_HOST and ZK_PEER_1_SERVICE_PORT, which is exactly what the startup script above looks for.)

So how do we configure those services?

To configure them we need to know:

  • the name of the port
  • the kubernetes pods that provide the service

The name of the port is already defined in the previous snippet, so we just need to find out how to select the pods. For this use case, it makes sense to have a different pod for each zookeeper server container. So we just need a label on each pod that designates that it is a zookeeper server pod, and also a label that designates the zookeeper server id.

"labels": {
    "name": "zookeeper-pod",
    "server":  1
}
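
For context, here is roughly how the container snippet from earlier and these labels fit together in a pod definition. I am sketching this against the pre-1.0 v1beta1 API that was current at the time, so treat the field names as illustrative rather than something to copy verbatim:

{
    "id": "zookeeper-server-1",
    "kind": "Pod",
    "apiVersion": "v1beta1",
    "labels": {
        "name": "zookeeper-pod",
        "server": 1
    },
    "desiredState": {
        "manifest": {
            "version": "v1beta1",
            "id": "zookeeper-server-1",
            "containers": [
                {
                    "image": "fabric8/zookeeper",
                    "name": "zookeeper-server-1",
                    "env": [ ... ],
                    "ports": [ ... ]
                }
            ]
        }
    }
}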

Something like the above could work. Now we are ready to define the service. I will just show how we can expose the peer port of the server with id 1 as a service. The rest can be done in a similar fashion:

{
    "apiVersion": "v1beta1",
    "creationTimestamp": null,
    "id": "zk-peer-1",
    "kind": "Service",
    "port": 2888,
    "containerPort": "zookeeper-peer-port",
    "selector": {
        "name": "zookeeper-pod",
        "server": 1
    }
}

The basic idea is that in the service definition, you create a selector which can be used to query/filter pods. Then you define the name of the port to expose and this is pretty much it. Just to clarify, we need a service definition just like the one above per zookeeper server container. And of course we need to do the same for the election port.
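
Depending on the client tooling you have at hand (kubectl, or the older kubecfg), each of these JSON definitions gets submitted to the cluster from a file; with kubectl that would look something like this (the file names are simply whatever you saved the definitions as):

kubectl create -f zk-peer-1-service.json
kubectl create -f zk-election-1-service.json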

Finally, we can define another kind of service, for the client connection port. This time we are not going to specify the server id in the selector, which means that all 3 servers will be selected. In this case kubernetes will load balance across all ZooKeeper servers. Since ZooKeeper provides a single system image (it doesn't matter which server you are connected to), this is pretty handy.

{
    "apiVersion": "v1beta1",
    "creationTimestamp": null,
    "id": "zk-client",
    "kind": "Service",
    "port": 2181,
    "createExternalLoadBalancer": "true",
    "containerPort": "zookeeper-client-port",
    "selector": {
        "name": "zookeeper-pod"
    }
}
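
From inside the cluster, clients can then reach the ensemble through this service. As a small illustration (assuming the stock ZooKeeper CLI and the environment variables that kubernetes injects for the zk-client service):

# Connect the ZooKeeper command line client through the load-balanced client service.
zkCli.sh -server $ZK_CLIENT_SERVICE_HOST:$ZK_CLIENT_SERVICE_PORT
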
I hope you found it useful. There is definitely room for improvement so feel free to leave comments.

Published at DZone with permission of Ioannis Canellos, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
