Apache Kafka Consumer Group Offset Retention

This article provides a fix to a problem in building a Kafka cluster.

By Praveen KG · Feb. 10, 20 · Tutorial

We are using Apache Kafka, and we recently faced an issue. I thought I would share the knowledge so that it can help someone, either in solving the same problem quickly or in taking a precautionary step while building a Kafka cluster.

Kafka is a distributed pub-sub system that is becoming a ubiquitous messaging platform. Kafka’s popularity is mainly due to:

  • Built-in support for streaming use cases — a great advantage, as stream-processing logic becomes part of your application infrastructure and does not require any additional hardware.
  • A large number of connectors, both source and sink, which reduce the development effort needed to move data across systems such as MongoDB, Couchbase, and Elasticsearch.
  • Strong enterprise support from Confluent.

Having said that, let me explain the problem we faced. We use Apache Kafka as a pub-sub system to integrate multiple source channels with different data formats (XML and JSON), and we have many consuming applications that process the events/messages, an ETL use case. Recently, in one of our non-production environments, two of our consumer teams reported that they had lost some messages.


When we started to investigate the issue, we didn’t see any errors in the consumer application logs or in the Kafka server logs. The consumers were using the latest offset-reset strategy.

Just to summarize, Kafka supports three auto.offset.reset values for consumer applications:

  • Earliest — when a consumer application is initialized for the first time, or binds to a topic and wants to consume the historical messages already present in it, the consumer should configure auto.offset.reset to earliest.
  • Latest — the default if nothing is configured. A consumer application starting for the first time with latest receives only the messages that arrive after it subscribes to the topic; when it rejoins the cluster, it resumes from its last committed offset.
  • None — use none if you would rather set the initial offset yourself and are willing to handle out-of-range errors manually.
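As an illustration, here is a minimal sketch of how one of these values might be set, assuming the kafka-python client; the broker address, topic, and group id are placeholders, not part of the original setup described here:

```python
# Minimal consumer-configuration sketch (assumes the kafka-python client).
# Broker address, topic name, and group id below are placeholder values.
consumer_config = {
    "bootstrap_servers": "localhost:9092",  # placeholder broker
    "group_id": "etl-consumer-group",       # placeholder consumer group
    # "earliest": replay the topic from the beginning on first join;
    # "latest": only new messages (the default);
    # "none": raise an error when no committed offset exists.
    "auto_offset_reset": "latest",
    "enable_auto_commit": True,
}

# With kafka-python installed, the config would be used like this:
# from kafka import KafkaConsumer
# consumer = KafkaConsumer("my-topic", **consumer_config)
```

Note that auto.offset.reset only applies when the group has no valid committed offset; as long as committed offsets exist, the consumer resumes from them.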

Let me come back to the problem. Our consumers have been running in production for quite some time, successfully handling thousands of TPS. In non-production environments, however, consumer applications may go down over weekends or for other unforeseen reasons. When the issue was reported, we observed that the consumer applications had been down for more than a day; when they came back, they did not consume any of the messages already available in the topic and instead started consuming only the messages that arrived after their restart.

After the investigation, we figured out that it had to do with the consumer group offset retention property (offsets.retention.minutes) configured on the broker. The default value for this property was 1440 minutes (24 hours) in Kafka versions before 2.0; Kafka 2.0 raised the default to 10080 minutes (7 days).
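On the broker side, the retention window can be raised in server.properties; the seven-day value below is only an example, and the right number should cover your longest expected consumer downtime:

```properties
# server.properties (broker configuration)
# Keep committed consumer group offsets for 7 days (example value)
offsets.retention.minutes=10080
```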

As I mentioned above, you might not experience this issue in a production environment, as no application can afford 24 hours of downtime.

So if you have not configured offsets.retention.minutes on the broker, and a consumer application goes down for more than a day, its committed offsets expire. When it rejoins the cluster with auto.offset.reset set to latest, it is treated like a brand-new consumer group and starts consuming only the messages that arrive after it rejoins the topic, resulting in message loss. It is therefore important to configure an appropriate retention of ‘n’ days based on the application’s nature and the availability agreed with the business.
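The failure mode above can be sketched as a small simulation (pure Python, no broker required). This simplifies the real expiry rules, which also depend on the Kafka version and on whether the group is empty, but it captures the scenario described here; the retention values and timings are illustrative:

```python
from datetime import timedelta

def resume_position(downtime, retention_minutes=1440, auto_offset_reset="latest"):
    """Decide where a consumer resumes after some downtime.

    If the committed offsets sat untouched for longer than the broker's
    offsets.retention.minutes, they are expired and the consumer falls back
    to its auto.offset.reset policy; with "latest", that means skipping
    every message that arrived while it was down.
    """
    retention = timedelta(minutes=retention_minutes)
    if downtime <= retention:
        return "committed offset"           # resumes where it left off
    return f"reset to {auto_offset_reset}"  # offsets expired; policy applies

# A weekend outage (2 days) against the pre-2.0 default of 24 hours:
print(resume_position(timedelta(days=2)))         # -> reset to latest
# The same outage with retention raised to 7 days:
print(resume_position(timedelta(days=2), 10080))  # -> committed offset
```

The second call shows why raising the retention past the longest expected downtime avoids the message loss we observed.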

Kafka is an extremely good messaging system and provides a large number of configurations across all three components: producers, brokers, and consumers. Though the default values work for most use cases, it is important to explore each configuration and set the right value for your own use case.

Further Reading

Don’t Use Apache Kafka Consumer Groups the Wrong Way!

Kafka Consumer Architecture - Consumer Groups and Subscriptions


Opinions expressed by DZone contributors are their own.
