Understanding the Lag in Your Kafka Cluster

Kafka powers compelling, real-time consumer experiences across the enterprise, but consumer lag remains one of its biggest operational challenges. This post explains what consumer lag is and how to understand and address it.

By Rohit Choudhary · Updated Jun. 14, 21 · Opinion · 16.8K Views


Among the various metrics that Kafka monitoring covers, consumer lag is arguably the most important of them all. In this post, we will explore potential causes of Kafka consumer lag and what you can do when you experience it.

Kafka — Past and Present

Apache Kafka is no longer used just by the Internet hyperscalers; enterprises everywhere now rely on it to handle exploding volumes of streaming data. It powers compelling consumer experiences such as real-time personalization, recommendations, and next-best-action decisions, and it enables low-latency ingestion of large amounts of data into data lakes and data warehouses. With Kafka, businesses gain real-time intelligence into their operations and can react immediately to changing business conditions.

Mission-critical business processes are plagued by consumer lag, and experienced practitioners agree that preventing it is the biggest challenge in running Kafka.

How Does Kafka Work?

Kafka is a distributed, partitioned, replicated commit log service. Kafka is run as a cluster of multiple servers or containers.

The cluster stores streams of records in categories called topics, with each record consisting of a key, a value, and a timestamp. Kafka producers are processes that publish data into Kafka topics, while consumers are processes that read messages off a Kafka topic.

Topics are divided into partitions, which contain messages in an append-only sequence. Each message in a partition is identified by a unique offset. Because a topic can hold multiple partition logs, consumers can read from them in parallel.
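As a toy illustration (not real Kafka client code), a partitioned topic can be modeled as a set of append-only lists, where an offset is simply a record's index within its partition:

```python
# A minimal in-memory model of a topic: partitions are append-only logs,
# and a record's offset is its index within its partition.

class Partition:
    def __init__(self):
        self.log = []  # append-only sequence of (key, value) records

    def append(self, key, value):
        self.log.append((key, value))
        return len(self.log) - 1  # the new record's offset

topic = {p: Partition() for p in range(3)}  # a "topic" with 3 partitions

def produce(topic, key, value):
    # producers typically pick a partition by hashing the key, so records
    # with the same key land in one partition and keep their relative order
    partition = hash(key) % len(topic)
    return partition, topic[partition].append(key, value)

for i in range(6):
    produce(topic, key=f"user-{i}", value=f"event-{i}")
```

Within each partition the offsets grow monotonically, but there is no global ordering across partitions, which is exactly the trade-off that makes parallel reads possible.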

These partitions are replicated across multiple Kafka brokers for resilience. A variety of applications can produce data and send it to the Kafka brokers, while consumers are the applications that read messages from those brokers.

Consumers read messages from a specific offset and are allowed to start from any offset point they choose. A consumer group is a set of consumer processes subscribing to a given topic, and each group is assigned a set of partitions to consume from.

Each consumer receives messages from a different subset of the partitions in the topic, and Kafka guarantees that within a group, each message is read by only a single consumer. (This per-group guarantee is sometimes loosely described as exactly-once delivery, though Kafka's default delivery semantics are at-least-once.)
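A minimal sketch of that idea, using a hypothetical round-robin assignor (real Kafka assignors such as range or cooperative-sticky are more sophisticated):

```python
# Toy partition assignment: spread a topic's partitions across the members
# of a consumer group so that each partition has exactly one owner.

def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = list(range(6))
print(assign(partitions, ["c1", "c2", "c3"]))
# every partition appears under exactly one consumer, so within the group
# no message is delivered to two consumers
```

Note that a second consumer group subscribing to the same topic gets its own independent assignment and its own offsets, so different groups can each read the full stream.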

What Is Kafka Consumer Lag?

Consumer lag is the gap between Kafka producers and consumers: if data is produced far faster than it is consumed, consumer groups exhibit lag. Put succinctly, per partition it is the difference between the latest offset written and the consumer's committed offset.
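In code, the per-partition lag is a simple subtraction; the offsets below are illustrative numbers, not output from a real cluster:

```python
# Consumer lag per partition: log-end offset minus the consumer's
# committed offset. Total group lag is the sum across partitions.

log_end_offsets  = {0: 1050, 1: 980, 2: 1200}  # latest offset the broker wrote
consumer_offsets = {0: 1000, 1: 980, 2: 950}   # last offset the group committed

lag = {p: log_end_offsets[p] - consumer_offsets[p] for p in log_end_offsets}
total_lag = sum(lag.values())
print(lag, total_lag)  # {0: 50, 1: 0, 2: 250} 300
```

A per-partition view matters: here partition 1 is fully caught up while partition 2 accounts for most of the lag, which usually points at one slow or stuck consumer rather than a cluster-wide problem.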

When enterprises talk about "Kafka", they are usually referring to the physical Kafka brokers: servers, physical or containerized, that run Kafka. Brokers are the physical repositories of the logs that store and serve Kafka messages.

Data inside a Kafka broker is stored in topics. Topics are divided into partitions, and brokers write data into specific partitions. As a broker writes data, it keeps track of the last offset written and records it as the log end offset.

Kafka Consumers

Consumers, on the other end, may have complex application logic embedded inside the consumer processes. If too many producers write data to a topic served by a limited number of consumers, the reading side will always fall behind, and the real-time objectives are lost.

Just as multiple producers can write to the same topic, multiple consumers can read from it, each pulling data from one or more of the topic's partitions. Because consumers perform low-latency work, consumer groups commonly have as many consumers as there are partitions. Creating a generous number of partitions is therefore good design and a fundamental way of scaling.
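A back-of-the-envelope model shows why capacity matters: if the produce rate exceeds the group's aggregate consume rate, lag grows without bound until consumers (and partitions) are added. The rates below are made up for illustration:

```python
# Toy throughput model: lag after `seconds` of steady production,
# given a produce rate and a group of identical consumers.

def lag_after(seconds, produce_rate, consume_rate_per_consumer, consumers):
    produced = produce_rate * seconds
    consumed = min(produced, consume_rate_per_consumer * consumers * seconds)
    return produced - consumed

# 1000 msg/s in, two consumers draining 300 msg/s each: lag grows 400 msg/s
print(lag_after(60, 1000, 300, 2))  # 24000
# with four consumers, aggregate capacity exceeds the produce rate: no lag
print(lag_after(60, 1000, 300, 4))  # 0
```

The model ignores batching and rebalance pauses, but it captures the core point: scaling the consumer side only helps if there are enough partitions for the extra consumers to own.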

Just as brokers keep track of their write position in each partition, each consumer keeps track of its "read position" in every partition whose data it is consuming. This is the only way to keep track of what it has read, and it is periodically persisted: historically to ZooKeeper, or to a Kafka topic itself.
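A simplified sketch of that bookkeeping, with the periodic commit reduced to a record counter (real consumers commit through the broker, typically on a timer or explicitly after processing):

```python
# Toy consumer that tracks its read position and persists ("commits") it
# only periodically. On restart, it would resume from `committed`, which
# is why infrequent commits mean more re-reading after a crash.

class Consumer:
    def __init__(self, commit_interval=100):
        self.position = 0   # next offset to read
        self.committed = 0  # last durably stored offset
        self.commit_interval = commit_interval

    def poll(self, log_end_offset):
        read = log_end_offset - self.position
        self.position = log_end_offset
        # commit once enough records have been read since the last commit
        if self.position - self.committed >= self.commit_interval:
            self.committed = self.position
        return read

c = Consumer()
c.poll(250)
print(c.position, c.committed)  # 250 250
```

The gap between `position` and `committed` is invisible to lag dashboards that only see committed offsets, which is one reason reported lag can jump in steps rather than move smoothly.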

Some consumer groups exhibit more lag than others, often because their processing logic is more complex. Lag can also stem from stuck consumers, slow message processing, or simply messages being produced incrementally faster than they are consumed.

Rebalance events can also be unpleasant contributors to consumer lag. Adding new consumers to a group causes partition ownership to change, which is helpful when done deliberately to increase parallelism. However, such changes are undesirable when triggered by a consumer process crashing. While the rebalance is in progress, consumers cannot consume messages, so lag accumulates; and when partitions are moved from one consumer to another, the consumer loses its current state, including caches.
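A toy model of what a crash-triggered rebalance does to partition ownership; the assignor logic here is hypothetical and far simpler than Kafka's:

```python
# Toy rebalance: when a consumer crashes, its partitions are handed to the
# surviving group members. Any per-partition state (caches, in-flight
# aggregates) the new owners held for those partitions must be rebuilt.

def rebalance(assignment, crashed):
    survivors = [c for c in assignment if c != crashed]
    orphaned = assignment.pop(crashed)  # partitions losing their owner
    for i, p in enumerate(orphaned):
        assignment[survivors[i % len(survivors)]].append(p)
    return assignment

assignment = {"c1": [0, 1], "c2": [2, 3], "c3": [4, 5]}
print(rebalance(assignment, crashed="c2"))
# {'c1': [0, 1, 2], 'c3': [4, 5, 3]}
```

During the window between the crash and the reassignment, partitions 2 and 3 have no owner at all, so their lag grows even though the surviving consumers are healthy.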

Monitoring Tools

There are several Kafka monitoring tools, both open source and commercial, that allow enterprises to scale their technology adoption without worrying about operational blindness.
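Whatever tool exposes the lag numbers, the core alerting check is simple. This sketch assumes per-partition lag figures have already been collected; the group names and threshold are invented for illustration:

```python
# Minimal alerting check: flag consumer groups whose total lag across
# partitions exceeds a threshold, given lag figures a monitoring tool exposes.

def groups_over_threshold(group_lags, threshold):
    return sorted(g for g, lags in group_lags.items()
                  if sum(lags.values()) > threshold)

group_lags = {
    "billing":   {0: 10, 1: 5},        # partition -> lag, in messages
    "analytics": {0: 4000, 1: 2500},
}
print(groups_over_threshold(group_lags, threshold=1000))  # ['analytics']
```

In practice a time-based threshold (how far behind in seconds) is often more meaningful than a message count, since message sizes and processing costs vary between topics.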


Published at DZone with permission of Rohit Choudhary. See the original article here.

Opinions expressed by DZone contributors are their own.
