Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

A Beginner's Guide to Apache Kafka

DZone's Guide to

A Beginner's Guide to Apache Kafka

A bare bones, bare necessities guide to what Apache Kafka can do and why it is popular.

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

A normal messaging queue is not capable of handling big data, which is where a Distributed Messaging Queue comes to the rescue.

Features of a Distributed Messaging System

  • It should be scalable, meaning it should easily scale to thousands of nodes.
  • It should be fault tolerant in such a way that it should work even if some nodes in a cluster go down.
  • It should support replication.
  • There shouldn't be a single point of failure, the  system should work even if some node goes down.
  • It should have higher throughput, it should handle millions of messages per second.

This is where Apache Kafka fits in the world of distributed messaging.

Features of Apache Kafka

  • It can easily scale to thousands of nodes in no time.
  • It is durable. Messages are persisted into file system and even replicated across clusters.
  • It is fault tolerant.
  • It has no single point of failure.
  • It supports replication in such a way that messages are replicated across a cluster.
  • It has higher throughput.
  • It is a peer-to-peer architecture and doesn’t follow master-slave.
  • It is open sourced by LinkedIn to the Apache Community.

Please see this architecture diagram of Apache Kafka below:

Apache Kafka- Architecture

Apache Kafka consists of the following components mentioned below:

  1. The producer sends a message to the broker through the push mechanism.

  2. The consumer reads data from the broker through the pull mechanism.

  3. The broker is a very lightweight component that handles just TCP connections and writes data to a append only log file.

  4. Zookeeper acts a coordinator between the broker and consumer.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:
big data ,apache kafka

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}