Over a million developers have joined DZone.

How to Setup Kafka Cluster

DZone's Guide to

How to Setup Kafka Cluster

In this article, we will discuss how to set up a Kafka cluster with 3 nodes on a single machine. After we're done, you'll be able to make as many nodes as you need!

· Big Data Zone ·
Free Resource

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

In my previous article, I discussed how to setup Kafka with a single node. But, for better reliability and high availability of the Kafka service, we should set it up in cluster mode. In this article, we will discuss how to set up a Kafka cluster with 3 nodes on a single machine. At the end of this article, you will be able to set up a Kafka cluster with as many nodes as you want on different machines.

Download Kafka from Apache's site. Extract the zip file. Make two copies of the extracted folder. Add the suffix _1, _2, _3 to these folders name. So if your extracted folder name was kafka_2.11-1.1.0, now you will have the folders kafka_2.11-1.1.0_1, kafka_2.11-1.1.0_2, kafka_2.11-1.1.0_3. Go to the kafka_2.11-1.1.0_1 folder.

  1. Create a folder named logs. This is where Kafka logs will be stored.

  2. Go to the config directory and open the server.properties file. This file contains Kafka broker configurations.

  3. Set broker.id to 1. This is the id of the broker in a cluster. It must be unique for each broker.

  4. Uncomment the listener's configuration and set it to PLAINTEXT://localhost:9091. This configuration means the broker will be listening on port 9091 for connection requests.

  5. Set the log.dirs configuration with the logs folder path that we created in step 1.

  6. In the zookeeper.connect configuration, set the Zookeeper address. If Zookeeper is running in a cluster, then give the address as a comma-separated list, i.e.:localhost:2181,localhost:2182
  7. Above are the basic configurations that need to be set up for the development environment but if you want to change some advanced configurations, like retention policy, etc., then I have discussed that in another DZone article, here.

Our first broker configuration is ready. For the other two folders or brokers, follow the same steps with the following changes.

  • In step 3, change broker.id to 2 and 3, respectively.

  • In step 4, change the ports used to 9092 and 9093, respectively (you can provide any port number that is available).

Now our configuration is ready for all brokers. Go to the home directory of each Kafka folder and run the command ./bin/kafka-server-start.sh config/server.properties. 

Execute the command (all as one line):

 ./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor

3 --partitions 50 --topic demo 

This command will create a topic demo with 50 partitions with a replication factor of three for each partition. A replication factor of three means that for a partition there will be one leader and two followers. When a message or record is sent in the leader it is copied in followers.

Execute this command:

./bin/kafka-topics.sh --describe --topic Hello-Kafka --zookeeper localhost:2181 This allows you to see which broker is the leader or follower for which partition. The output looks something like what I've given below.

Topic: demoPartition: 0Leader: 2Replicas: 2,3,1Isr: 2,3,1
Topic: demoPartition: 1Leader: 3Replicas: 3,1,2Isr: 3,1,2
Topic: demoPartition: 2Leader: 1Replicas: 1,2,3Isr: 1,2,3
Topic: demoPartition: 3Leader: 2Replicas: 2,1,3Isr: 2,1,3
Topic: demoPartition: 4Leader: 3Replicas: 3,2,1Isr: 3,2,1
Topic: demoPartition: 5Leader: 1Replicas: 1,3,2Isr: 1,3,2
Topic: demoPartition: 6Leader: 2Replicas: 2,3,1Isr: 2,3,1

 For Partition 0, Broker 2 is the leader and for partition 1, Broker 3 is the leader. ISR means in sync replicas.

Now that we have set up Kafka cluster of three brokers you can setup a cluster with as many brokers as you want by making a few changes. 

Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.

kafka ,apache kafka ,big data ,kafka cluster

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}