Apache Kafka - Resiliency, Fault Tolerance, and High Availability


Prevent data loss within Apache Kafka with ZooKeeper.


Apache Kafka is a distributed system, and distributed systems are subject to multiple types of faults. Some of the classic cases are:

  1. A broker stops working, becomes unresponsive, and cannot be accessed.
  2. A disk that stores data fails, and the data on it can no longer be accessed.
  3. In a cluster with multiple brokers, each broker may be the leader for one or more partitions. If one of those brokers fails or becomes inaccessible, the partitions it leads become unavailable, and their data can be lost.

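In these scenarios, ZooKeeper comes to the rescue. Every broker keeps a session with ZooKeeper. The moment that session expires, ZooKeeper notifies the cluster's controller (itself a broker elected through ZooKeeper), and the following actions are taken:

  1. For each partition the failed broker was leading, another broker that holds an up-to-date replica is chosen to take the failed broker's place as leader.
  2. The metadata used for work distribution is updated so that producers and consumers send to and fetch from the new leaders, and processing continues.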

Once these two steps are complete, publishing and consuming messages continue as normal. The remaining problem is the data held by the failed broker: unless it has been replicated to another broker, that data is lost.
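The failover described above can be modelled in a few lines. The sketch below is a simplified illustration, not Kafka's controller code: given a partition's replica list, its in-sync replicas, and the broker that just failed, it picks the first surviving replica (in assignment order) that is still in sync, which is roughly the rule the controller applies. The function name `elect_leader` and the broker IDs are hypothetical.

```shell
#!/bin/sh
# Simplified model of leader failover for ONE partition.
# Illustration only, not Kafka's controller implementation.
#   $1  replica list, e.g. "1,2,3" (hypothetical broker ids)
#   $2  current in-sync replicas, e.g. "1,2,3"
#   $3  id of the broker that just failed
# Prints the first surviving replica that is also in sync.
elect_leader() {
  _failed=$3
  for _id in $(echo "$1" | tr ',' ' '); do
    [ "$_id" = "$_failed" ] && continue      # the failed broker cannot lead
    case ",$2," in
      *",$_id,"*) echo "$_id"; return 0 ;;   # in sync: safe to promote
    esac
  done
  echo "no in-sync replica left: partition is offline" >&2
  return 1
}

elect_leader 1,2,3 1,2,3 1   # prints 2
```

If the failed broker was the only in-sync replica, for example `elect_leader 1,2,3 1 1`, the function finds no candidate and returns a non-zero status: that is the data-loss case described above.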

Kafka provides a configuration property to handle this scenario: the Replication Factor. It makes sure that the data of each partition is stored on more than one broker, so that the faults listed above do not have to result in data loss. Choosing the value of the Replication Factor is therefore an important decision, and it cannot exceed the number of brokers in the cluster. For example, if the Replication Factor is set to five, each partition's data is replicated on five brokers, so even if four of those five brokers go down, no data is lost, provided the surviving replica was in sync.
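The arithmetic behind that example is simple: a partition keeps its data as long as at least one replica survives. The hypothetical helper below encodes it, assuming all replicas are in sync, and also rejects a Replication Factor larger than the broker count, which Kafka refuses when creating a topic.

```shell
#!/bin/sh
# Hypothetical helper: how many broker failures can a partition survive?
# Assumes all replicas are in sync (ISR size == replication factor).
#   $1  replication factor
#   $2  number of brokers in the cluster
tolerated_failures() {
  if [ "$1" -gt "$2" ]; then
    # Not enough brokers to place every replica on a different broker.
    echo "replication factor $1 exceeds broker count $2" >&2
    return 1
  fi
  echo $(( $1 - 1 ))   # all but one replica may be lost
}

tolerated_failures 5 5   # prints 4 (the example above)
tolerated_failures 3 3   # prints 2 (the Demo topic in this tutorial)
```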

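Another important term here is In-Sync Replicas, or ISR: the replicas of a partition that are fully caught up with its leader. When the replica set is fully synchronized (i.e. the number of in-sync replicas equals the Replication Factor), the partition is in a healthy state, and when that holds for every partition of a Topic, the Topic is healthy as well.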
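That health check can be automated against the output of `bin/kafka-topics.sh --describe`. A minimal sketch, assuming partition lines of the form `Topic: Demo  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3` (the layout printed by the 2.x tools); the function name `under_replicated` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `kafka-topics.sh --describe` output on stdin
# and prints every partition whose ISR is smaller than its replica set.
under_replicated() {
  awk '
    /Partition:/ {
      part = ""; replicas = 0; isr = 0
      for (i = 1; i < NF; i++) {
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Replicas:")  replicas = split($(i + 1), r, ",")  # count ids
        if ($i == "Isr:")       isr = split($(i + 1), s, ",")
      }
      if (isr < replicas) print "partition " part ": ISR " isr "/" replicas
    }'
}
```

Piping the describe command from Step 5 into `under_replicated` prints nothing while the topic is healthy; after a broker is killed it should report the affected partitions until the broker rejoins and catches up.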

Let's see how Apache Kafka exhibits these features in this tutorial.

The steps of the video tutorial are listed below:

Step 1 - Go to the Kafka directory
cd /../kafka_2.12-2.3.0

Step 2 - Start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties

Step 3 - Start Multiple Brokers (each command in its own terminal)
bin/kafka-server-start.sh config/server-1.properties
bin/kafka-server-start.sh config/server-2.properties
bin/kafka-server-start.sh config/server-3.properties

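Properties to update in server-*.properties: each broker needs its own broker ID, listener port, and log directory so that all three can run on the same machine.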
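A minimal sketch of those edits, modeled on the multi-broker setup in Kafka's own quickstart: each file starts as a copy of config/server.properties and changes three settings. The ports and directories are assumptions; server-1 is given port 9092 because Steps 4 to 7 bootstrap from localhost:9092.

```properties
# config/server-1.properties (assumed values)
broker.id=1
listeners=PLAINTEXT://:9092
log.dirs=/tmp/kafka-logs-1

# config/server-2.properties (assumed values)
broker.id=2
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs-2

# config/server-3.properties (assumed values)
broker.id=3
listeners=PLAINTEXT://:9094
log.dirs=/tmp/kafka-logs-3
```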

Step 4 - Create Topic
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 1 --topic Demo

Step 5 - Describe Topic
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic Demo

Step 6 - Start the Producer
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic Demo

Step 7 - Start the Consumer
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic Demo

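Step 8 - Kill the Leader
Kill the broker that Step 5 reports as Leader (in the video, the broker started with server-1.properties). Find its process ID with ps and kill it; 756 was the PID in the video, so substitute your own.
ps aux | grep server-1.properties
kill -9 <PID>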
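Because the leader is not always broker 1, it can help to read it from the describe output instead of assuming it. A minimal sketch, assuming the 2.x describe layout (`... Partition: 0  Leader: 1 ...`); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `kafka-topics.sh --describe` output on stdin
# and prints the broker id that currently leads partition 0.
leader_of_partition0() {
  awk '/Partition:/ {
    part = ""
    for (i = 1; i < NF; i++) {
      if ($i == "Partition:") part = $(i + 1)
      if ($i == "Leader:" && part == "0") { print $(i + 1); exit }
    }
  }'
}
```

For example, `leader=$(bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic Demo | leader_of_partition0)` followed by `pkill -9 -f "server-$leader.properties"`, assuming broker.id N is configured in config/server-N.properties. If the killed leader is the broker listening on localhost:9092, point --bootstrap-server at a surviving broker when describing the topic again.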

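Step 9 - Stop Brokers
In Kafka 2.3.0, kafka-server-stop.sh ignores its argument and stops every Kafka broker process on the machine, so the first command below already stops all remaining brokers (the leader killed in Step 8 is already down).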
bin/kafka-server-stop.sh config/server-1.properties
bin/kafka-server-stop.sh config/server-2.properties
bin/kafka-server-stop.sh config/server-3.properties




Opinions expressed by DZone contributors are their own.
