Kafka Consumer Overview

A developer gives a technical overview of consumers in the Apache Kafka platform, and how they allow devs to work with concurrent data consumption applications.

Sylvester Daniel

Jul. 02, 19 · Tutorial

Likes (14)

Comment

Save

25.6K Views

This article is a continuation of Part 1 - Kafka Technical Overview, Part 2 - Kafka Producer Overview and Part 3 - Kafka Producer Delivery Semantics articles. Let's look into Kafka consumer group, consumer, and protocol used in detail.

Consumer Role

Like a Kafka Producer that optimizes writes to Kafka, a Consumer is used for optimal consumption of Kafka data. The primary role of a Kafka consumer is to take Kafka connection and consumer properties to read records from the appropriate Kafka broker. Complexities of concurrent application consumption, offset management, delivery semantics, and a lot more are taken care of by Consumer APIs.

Properties

Some of the consumer properties in the bootstrap servers are: fetch.min.bytes, max.partition.fetch.bytes, fetch.max.bytes, enable.auto.commit, and many more. We will discuss some of these properties later in the next part of the article series.

Role of Kafka Consumers

Multi-App Consumption

Multiple applications can consume records from the same Kafka topic, as shown in the diagram below. Each application that consumes data from Kafka gets it’s own copy and can read at its own speed. In other words, offsets consumed by one application could be different from another application. Kafka keeps tracks of the offsets consumed by each application in an internal__consumer_offset topic.

Consumer Group and Consumer

Each application consuming data from Kafka is treated as a consumer group. For example, if two applications are consuming the same topic from Kafka, then, internally, Kafka creates two consumer groups. Each consumer group can have one or more consumers. If a topic has three partitions and an application consumes it, then a consumer group would be created and a consumer in the consumer group will consume all partitions of the topic. The diagram below depicts a consumer group with a single consumer.

Kafka multi-partition single consumer

When an application wants to increase the speed of processing and process partitions in parallel then it can add more consumers to the consumer group. Kafka takes care of keeping track of offsets consumed per consumer in a consumer group, rebalancing consumers in the consumer group when a consumer is added or removed and lot more.

Kafka multi-partition multi-consumer

When there are multiple consumers in a consumer group, each consumer in the group is assigned one or more partitions. Each consumer in the group will process records in parallel from each leader partition of the brokers. A consumer can read from more than one partition.

It’s very important to understand that no single partition will be assigned to two consumers in the same consumer group; in other words, the same partition will not be processed by two consumers as shown in the diagram below.

Kafka same partition multiple consumer

Kafka same partition multiple-consumer

When consumers in a consumer group are more than partitions in a topic then over-allocated consumers in the consumer group will be unused.

Kafka unused consumer

Kafka unused consumer

When you have multiple topics and multiple applications consuming the data, consumer groups and consumers of Kafka will look similar to the diagram shown below.

Multiple application and multiple kafka topic

Multiple application and multiple Kafka topic

Coordinator and Leader Discovery

In order to manage the handshake between Kafka and the application that forms the consumer group and consumer, a coordinator on the Kafka side and a leader (one of the consumers in the consumer group) is elected. The first consumer that initiates the process is automatically elected as leader in the consumer group. As explained in the diagram below, for a consumer to join a consumer group, the following handshake processes take place:

Find coordinator
Join group
Sync group
Heartbeat
Leave group

Kafka consumer and coordinator protocol

Kafka consumer and coordinator protocol

Coordinator

In order to create or join a group, a consumer has to first find the coordinator on the Kafka side that manages the consumer group. The consumer makes a “find coordinator” request to one of the bootstrap servers. If a coordinator already doesn’t exist it’s identified based on a hashing formula and returned as a response to “find coordinator” request.

Join Group

Once the coordinator is identified, the consumer makes a “join group” request to the coordinator. The coordinator returns the consumer group leader and metadata details. If a leader already doesn’t exist then the first consumer of the group is elected as leader. Consuming application can also control the leader elected by the coordinator node.

Kafka consumer join group

Kafka consumer join group

Sync Group

After leader details are received for the join group request, the consumer makes a “Sync group” request to the coordinator. This request triggers the rebalancing process across consumers in the consumer group, as the partitions assigned to the consumers will change after the “sync group” request.

Kafka consumer sync group

Kafka consumer sync group

Rebalance

All consumers in the consumer group will receive updated partition assignments that they need to consume when a consumer is added/removed or “sync group” request is sent. Data consumption by all consumers in the consumer group will be halted until the rebalance process is complete.

Kafka consumer rebalance group

Kafka consumer rebalance group

Heartbeat

Each consumer in the consumer group periodically sends a heartbeat signal to its group coordinator. In the case of heartbeat timeout, the consumer is considered lost and rebalancing is initiated by the coordinator.

Kafka consumer heartbeat

Kafka consumer heartbeat

Leave Group

A consumer can choose to leave the group anytime by sending a “leave group” request. The coordinator will acknowledge the request and initiate a rebalance. In case the leader node leaves the group, a new leader is elected from the group and a rebalance is initiated.

Kafka consumer leave group

Kafka consumer leave group

Summary

As explained in Part 1of this series, “partitions” are units of parallelism. As consumers in a consumer group are limited by the partition in a topic, it’s very important to decide you partitions based on the SLA and scale your consumers accordingly. Consumer offsets are managed and stored by Kafka in an internal __consumer_offset topic. Each consumer in a consumer group follows the find coordinator, join group, sync group, heartbeat, and leave group protocols. In the next article in this series, we'll look into Kafka consumer properties and delivery semantics.

kafka application

Opinions expressed by DZone contributors are their own.

Related

Trending