Apache Kafka - Resiliency, Fault Tolerance, and High Availability
Prevent data loss within Apache Kafka with ZooKeeper.
Apache Kafka is a distributed system, and distributed systems are subject to multiple types of faults. Some of the classic cases are:
- A broker stops working, becomes unresponsive, and cannot be accessed.
- Data is stored on disks, the disk fails, and then the data cannot be accessed.
- Suppose there are multiple brokers in a cluster, and each broker is the leader of one or more partitions. If one of those brokers fails or becomes inaccessible, the partitions it led become unavailable, and without replication their data is lost.
In these scenarios, ZooKeeper comes to the rescue. The moment ZooKeeper realizes that one of the brokers is down, it performs the following actions:
- It triggers leader election so that another broker takes over the partitions the failed broker was leading.
- It updates the cluster metadata that producers and consumers rely on for work distribution, so that publishing and consumption continue against the new leaders.
Once ZooKeeper has performed these two steps, publishing and consumption of messages continue as normal. The challenge is that the failed broker still holds data. Unless that data has been replicated somewhere else, it is lost.
Kafka provides a configuration property to handle this scenario: the replication factor. It ensures that each partition's data is stored on more than one broker, so even in the fault cases listed above there is no data loss. Choosing the right replication factor matters: a replication factor of N means the data is replicated on N brokers and tolerates N-1 broker failures. For example, with a replication factor of five, the data is stored on five brokers, so even if four of those five brokers go down, no data is lost.
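In practice, the replication factor is usually combined with broker- and producer-side durability settings so that a write is not acknowledged until enough replicas have it. A minimal sketch of the relevant properties, assuming a replication factor of 3 (all values here are illustrative, not from the tutorial):

```properties
# server.properties (broker side) -- illustrative values
default.replication.factor=3   # replicate each auto-created partition to 3 brokers
min.insync.replicas=2          # reject acks=all writes unless at least 2 replicas are in sync

# producer configuration -- illustrative value
acks=all                       # leader acknowledges only after all in-sync replicas have the write
```

With this combination, losing a single broker neither loses data nor blocks producers, since the remaining two in-sync replicas still satisfy min.insync.replicas.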
Another important term here is In-Sync Replicas, or ISR. When the replica set is fully synchronized (i.e. the ISR count equals the replication factor), we know that the topic and every partition within it are in a healthy state.
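The ISR for a topic can be inspected with the same describe command used later in the tutorial. A sketch, assuming a three-broker cluster and a topic named Demo (the broker IDs in the sample output are assumptions):

```shell
# Show the replica assignment and in-sync replicas for a topic.
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic Demo

# Illustrative output when all replicas are healthy (Isr matches Replicas):
#   Topic: Demo  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
# If broker 1 were killed, a new leader would be elected and Isr would shrink, e.g.:
#   Topic: Demo  Partition: 0  Leader: 2  Replicas: 1,2,3  Isr: 2,3
```

A shrinking Isr column with an unchanged Replicas column is the quickest way to spot an under-replicated partition.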
Let's see how Apache Kafka exhibits these features in this tutorial.
The steps of the video tutorial above are listed below:
Step 1 - Go to the Kafka directory
cd /../kafka_2.12-2.3.0

Step 2 - Start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties

Step 3 - Start multiple brokers
bin/kafka-server-start.sh config/server-1.properties
bin/kafka-server-start.sh config/server-2.properties
bin/kafka-server-start.sh config/server-3.properties

Properties to update in server-*.properties (shown here for server-1):
config/server-1.properties:
broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs-1

Step 4 - Create a topic
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 1 --topic Demo

Step 5 - Describe the topic
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic Demo

Step 6 - Start the producer
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic Demo

Step 7 - Start the consumer
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic Demo

Step 8 - Kill the leader
ps aux | grep server-1.properties
kill -9 756

Step 9 - Stop the brokers
bin/kafka-server-stop.sh config/server-1.properties
bin/kafka-server-stop.sh config/server-2.properties
bin/kafka-server-stop.sh config/server-3.properties
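The tutorial shows the property overrides only for server-1.properties; the other two broker files need the same three properties with distinct values so the brokers do not collide on ID, port, or log directory. A sketch of the remaining two files, assuming ports 9094 and 9095 (the port numbers and paths are assumptions following the server-1 pattern):

```properties
# config/server-2.properties -- assumed values
broker.id=2
listeners=PLAINTEXT://:9094
log.dirs=/tmp/kafka-logs-2

# config/server-3.properties -- assumed values
broker.id=3
listeners=PLAINTEXT://:9095
log.dirs=/tmp/kafka-logs-3
```

Each broker.id must be unique within the cluster, and each broker needs its own listener port and log directory when all three run on the same machine.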