A Beginner's Guide to Apache Kafka
A bare-bones, bare-necessities guide to what Apache Kafka can do and why it is popular.
A conventional messaging queue is not built to handle big-data volumes, which is where a distributed messaging queue comes to the rescue.
Features of a Distributed Messaging System
- It should be scalable, meaning it should easily scale to thousands of nodes.
- It should be fault tolerant, meaning it should keep working even if some nodes in a cluster go down.
- It should support replication.
- There shouldn't be a single point of failure: losing any one node should not bring the system down.
- It should have high throughput, handling millions of messages per second.
This is where Apache Kafka fits in the world of distributed messaging.
Features of Apache Kafka
- It is scalable: a cluster can grow by adding more nodes.
- It is durable. Messages are persisted to the file system and replicated across the nodes of a cluster.
- It is fault tolerant.
- It has no single point of failure.
- It supports replication: each message is copied to multiple nodes in a cluster.
- It has high throughput.
- Its brokers are peers rather than a fixed master and slaves; leadership roles are elected and can move to another broker if one fails.
- It was originally developed at LinkedIn and later open sourced and donated to the Apache Software Foundation.
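To make the replication and fault-tolerance points above concrete, here is a minimal in-memory sketch in Python. It is not Kafka's actual replication protocol and does not use any Kafka API: the `Broker` and `Cluster` classes and the message names are hypothetical, and the sketch simply copies every message to all live nodes, whereas a real cluster is considerably more sophisticated.

```python
class Broker:
    """A toy broker holding one copy of a single log (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.log = []       # this broker's copy of the messages
        self.alive = True


class Cluster:
    """Copies every message to all live brokers so no single node is critical."""

    def __init__(self, broker_names):
        self.brokers = [Broker(n) for n in broker_names]

    def send(self, message):
        live = [b for b in self.brokers if b.alive]
        if not live:
            raise RuntimeError("no live brokers")
        for b in live:
            b.log.append(message)
        return len(live)    # number of copies written

    def read_all(self):
        # Any live broker can serve the data.
        for b in self.brokers:
            if b.alive:
                return list(b.log)
        raise RuntimeError("no live brokers")


cluster = Cluster(["broker-1", "broker-2", "broker-3"])
cluster.send("order-created")
cluster.send("order-paid")
cluster.brokers[0].alive = False    # simulate a node failure
surviving = cluster.read_all()
print(surviving)
```

Even with the first node marked as down, the messages can still be read from another copy, which is the property the "no single point of failure" bullet describes.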
[Figure: architecture diagram of Apache Kafka]
Apache Kafka consists of the following components:
The producer sends a message to the broker through the push mechanism.
The consumer reads data from the broker through the pull mechanism.
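The push and pull mechanisms described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `ToyBroker`, `ToyConsumer`, `append`, `fetch`, and `poll` are hypothetical names, not the Kafka client API, and the offset here is simply a position in a Python list.

```python
class ToyBroker:
    """Holds messages in arrival order (a stand-in for a real broker)."""

    def __init__(self):
        self._log = []

    def append(self, message):
        """Called by producers: push a message and return its offset."""
        self._log.append(message)
        return len(self._log) - 1

    def fetch(self, offset, max_messages=10):
        """Called by consumers: pull up to max_messages starting at offset."""
        return self._log[offset:offset + max_messages]


class ToyConsumer:
    """Remembers how far it has read and asks the broker for more."""

    def __init__(self, broker):
        self.broker = broker
        self.offset = 0     # next position this consumer wants to read

    def poll(self, max_messages=10):
        batch = self.broker.fetch(self.offset, max_messages)
        self.offset += len(batch)
        return batch


broker = ToyBroker()
for i in range(5):
    broker.append(f"event-{i}")     # producer side: push

consumer = ToyConsumer(broker)
first = consumer.poll(max_messages=2)     # consumer side: pull at its own pace
second = consumer.poll(max_messages=10)
third = consumer.poll()                   # nothing new yet
print(first, second, third)
```

Because the consumer decides when to call `poll` and how many messages to request, a slow consumer is never flooded by the broker; it just falls behind and catches up later.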
The broker is a very lightweight component that handles just TCP connections and writes data to an append-only log file.
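As a rough illustration of what an append-only log file means, the sketch below writes length-prefixed records to the end of a file and reads them back by byte position. The record layout, the `AppendOnlyLog` class, and the file name are assumptions made for this example; Kafka's real on-disk format is different.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Length-prefixed records appended to one file; nothing is rewritten."""

    def __init__(self, path):
        self.path = path
        open(self.path, "ab").close()   # create the file if it is missing

    def append(self, payload: bytes) -> int:
        """Append one record and return the byte position where it starts."""
        with open(self.path, "ab") as f:
            position = f.seek(0, os.SEEK_END)
            f.write(struct.pack(">I", len(payload)))   # 4-byte length header
            f.write(payload)
        return position

    def read_from(self, position: int):
        """Yield every record stored at or after the given byte position."""
        with open(self.path, "rb") as f:
            f.seek(position)
            while True:
                header = f.read(4)
                if len(header) < 4:
                    break               # reached the end of the log
                (size,) = struct.unpack(">I", header)
                yield f.read(size)


with tempfile.TemporaryDirectory() as tmp:
    log = AppendOnlyLog(os.path.join(tmp, "topic.log"))
    p0 = log.append(b"first")
    p1 = log.append(b"second")
    records = list(log.read_from(p0))
    tail = list(log.read_from(p1))
print(records, tail)
```

Writes only ever go to the end of the file, and a reader needs nothing more than a starting position, which is part of why the broker itself can stay simple.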
ZooKeeper acts as a coordinator between the brokers and consumers (recent Kafka releases can instead run in KRaft mode, without ZooKeeper).