DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • Event-Driven Microservices: How Kafka and RabbitMQ Power Scalable Systems
  • System Coexistence: Bridging Legacy and Modern Architecture
  • Event-Driven Architectures: Designing Scalable and Resilient Cloud Solutions
  • How to Integrate Event-Driven Ansible With Kafka

Trending

  • Ensuring Configuration Consistency Across Global Data Centers
  • Next-Gen IoT Performance Depends on Advanced Power Management ICs
  • Unlocking the Benefits of a Private API in AWS API Gateway
  • Event-Driven Architectures: Designing Scalable and Resilient Cloud Solutions
  1. DZone
  2. Data Engineering
  3. Big Data
  4. Don't Use Apache Kafka Consumer Groups the Wrong Way!

Don't Use Apache Kafka Consumer Groups the Wrong Way!

Apache Kafka is great — but if you're going to use it, you have to be very careful not to break things. Here's how you can avoid the pain!

By 
Paolo Patierno user avatar
Paolo Patierno
·
Jul. 30, 17 · Tutorial
Likes (20)
Comment
Save
Tweet
Share
274.6K Views

Join the DZone community and get the full member experience.

Join For Free

In this blog post, I’d like to focus the attention on how “automatic” and “manual” partition assignments can interfere with each other — and even break things. I’d like to give an advice on using them in the right way avoiding to mix them in the same scenario or being aware of what you are doing.

The Consumer Group Experience

In Apache Kafka, the consumer group concept is a way of achieving two things:

  1. Having consumers as part of the same consumer group means providing the“competing consumers” pattern with whom the messages from topic partitions are spread across the members of the group. Each consumer receives messages from one or more partitions (“automatically” assigned to it) and the same messages won’t be received by the other consumers (assigned to different partitions). In this way, we can scale the number of the consumers up to the number of the partitions (having one consumer reading only one partition); in this case, a new consumer joining the group will be in an idle state without being assigned to any partition.
  2. Having consumers as part of different consumer groups means providing the “publish/subscribe” pattern where the messages from topic partitions are sent to all the consumers across the different groups. It means that inside the same consumer group, we’ll have the rules explained above, but across different groups, the consumers will receive the same messages. It’s useful when the messages inside a topic are of interest for different applications that will process them in different ways. We want all the interested applications to receive all the same messages from the topic.

Another great advantage of consumers grouping is the rebalancing feature. When a consumer joins a group, if there are still enough partitions available (i.e. we haven’t reached the limit of one consumer per partition), a re-balancing starts and the partitions will be reassigned to the current consumers, plus the new one. In the same way, if a consumer leaves a group, the partitions will be reassigned to the remaining consumers.

What I have told so far it’s really true using the subscribe() method provided by the KafkaConsumer API. This method forces you to assign the consumer to a consumer group, setting the group.id property, because it’s needed for re-balancing. In any case, it’s not the consumer's choice to decide the partitions it wants to read for. In general, the first consumer joins the group doing the assignment while other consumers join the group.

How Things Can Be Broken

Other than using the subscribe() method, there is another way for a consumer to read from topic partitions: the assign() method. In this case, the consumer is able to specify the topic partitions it wants to read for.

This type of approach can be useful when you know exactly where some specific messages will be written (the partition) and you want to read directly from there. Of course, you lose the re-balancing feature in this case, which is the first big difference in using the subscribe method.

Another difference is that with “manual” assignment, you can avoid specifying a consumer group (i.e. the group.id property) for the consumer — it will be just empty. In any case, it’s better to specify it.

Most people use the subscribe method, leveraging the “automatic” assignment and re-balancing feature. Using both of these methods can break things, as we're about to see.

Imagine having a single “test” topic with only two partitions (P0 and P1) and a consumer C1 that subscribes to the topic as part of the consumer group G1. This consumer will be assigned to both the partitions receiving messages from them. Now, let’s start a new consumer C2 that is configured to be part of the same consumer group G1 but it uses the assign method to ask partitions P0 and P1 explicitly.

subscribe_assignNow we have broken something! ...but what is it?

Both C1 and C2 will receive messages from the topic from both partitions P0 and P1, but they are part of the same consumer group G1! So we have “broken” what we said in the previous paragraph about “competing consumers” when they are part of the same consumer group. You experience a “publish/subscribe” pattern, but with consumers within the same consumer group.

What About Offsets Commits?

Generally, you should avoid a scenario like the one described above. Starting from version 0.8.2.0, the offsets committed by the consumers aren’t saved in ZooKeeper but on a partitioned and replicated topic named __consumer_offsets, which is hosted on the Kafka brokers in the cluster.

When a consumer commits some offsets (for different partitions), it sends a message to the broker to the __consumer_offsets topic. The message has the following structure :

  • key = [group, topic, partition]
  • value = offset

Coming back to the previous scenario... what does it mean?

Having C1 and C2 as part of the same consumer group but being able to receive from the same partitions (both P0 and P1) would look something like the following:

  • C1 commits offset X for partition P0 writing a message like this:
    • key = [G1, “test”, P0], value = X
  • C2 commits offset Y for partition P0 writing a message like this:
    • key = [G1, “test”, P0], value = Y

C2 has overwritten the committed offset for the same partition P0 of the consumer C1 and maybe X was less than Y. If C1 crashes and restarts, it will lose messages starting to read from Y (remember Y > X).

Something like that can’t happen with consumers which use only the subscribe way for being assigned to partitions because as part of the same consumer group they’ll receive different partitions so the key for the offset commit message will be always different.

Update: As a confirmation that mixing subscribe and assign isn’t a good thing to do, after a discussion with one of my colleagues, Henryk Konsek, it turned out that if you try to call both methods on the same consumer, the client library throws the following exception:

java.lang.IllegalStateException: Subscription to topics, partitions and pattern are mutually exclusive

Conclusion

The consumer groups mechanism in Apache Kafka works really well. Leveraging it for scaling consumers and having “automatic” partitions assignment with rebalancing is a great plus. There are cases in which you would need to assign partitions “manually” but in those cases, pay attention to what could happen if you mix both solutions.

kafka

Published at DZone with permission of Paolo Patierno, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Event-Driven Microservices: How Kafka and RabbitMQ Power Scalable Systems
  • System Coexistence: Bridging Legacy and Modern Architecture
  • Event-Driven Architectures: Designing Scalable and Resilient Cloud Solutions
  • How to Integrate Event-Driven Ansible With Kafka

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!