Monitoring Kafka Consumer Lag

Burrow is one of the tools available for monitoring Kafka. One of the key things to monitor is how far Kafka consumers lag behind in their intake of messages.

by David Brinegar · Nov. 22, 16 · Opinion
With any new and fast-moving technology stack such as Kafka, monitoring and operational tools are often a step behind or missing significant functionality. But a couple of robust open source projects are available and can be made to work in specific circumstances. One such tool is Burrow from LinkedIn, written in Go; another is Kafka Manager from Yahoo, written in Scala. Both are active projects and are relied upon in significant and complex Kafka environments. There are others as well, such as KafkaOffsetMonitor, but be careful with projects that haven’t been updated in a long time, as they probably target Kafka v0.8 and are not compatible with newer versions of Kafka. Version support is a big problem for Consumer Group monitoring because what works for a v0.8 application might not work for a v0.10 application.

In our last post, we discussed in detail what Consumer Lag is. In this post, we discuss the limitations of some of the open source alternatives.

Pros and Cons of Using Burrow

Let’s take Burrow as an example. I’m going to put my finger on a couple of sore spots which one might find in any software, but let me also say that Burrow is high-quality software and a solid first-generation monitoring solution for Kafka. Burrow was originally written for the first Kafka v0.8 Consumer Group reference implementation, which used Zookeeper for metadata, and it was naturally built for the behaviors one might see in that first Consumer Group implementation. Offsets are tracked by running a Kafka client that reads each consumer offset partition continuously and counts the number of messages consumed by each client. This surprised me when I first dug into why Burrow was so busy, so I’ll repeat it: Burrow creates one Kafka client for every partition being monitored. If you have a thousand partitions being consumed by various Consumer Groups, Burrow will start up and run a thousand Kafka clients. I’m sure there are good reasons for doing this instead of calling the OffsetFetchRequest API periodically (perhaps this way offers more intermediate data points, and so greater precision and control over granularity), but the relative cost is quite high. As a caution, you’ll have to be careful about Burrow capacity, watching for lag on these internal clients and for their impact on the rest of Kafka.
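
For comparison, the periodic-polling alternative mentioned above comes down to simple arithmetic once two offset snapshots are in hand. The following is a minimal sketch, not Burrow’s code: the function names, the dictionary shapes, and the sample numbers are all hypothetical, and fetching the snapshots from Kafka is deliberately left out.

```python
from typing import Dict, Optional, Tuple

# (topic name, partition number); a hypothetical shape chosen for this sketch
TopicPartition = Tuple[str, int]


def compute_lag(
    committed: Dict[TopicPartition, int],
    log_end: Dict[TopicPartition, int],
) -> Dict[TopicPartition, Optional[int]]:
    """Per-partition lag = log end offset - committed offset.

    Partitions with no committed offset are reported as None (unknown)
    rather than guessed at. Negative values, which can appear when the two
    snapshots are taken at slightly different times, are floored at zero.
    """
    lag: Dict[TopicPartition, Optional[int]] = {}
    for tp, end_offset in log_end.items():
        if tp not in committed:
            lag[tp] = None
        else:
            lag[tp] = max(end_offset - committed[tp], 0)
    return lag


def total_lag(lag: Dict[TopicPartition, Optional[int]]) -> int:
    """Sum of the known per-partition lags for one Consumer Group."""
    return sum(value for value in lag.values() if value is not None)


# Hypothetical snapshots for one Consumer Group reading a topic named "orders".
committed_offsets = {("orders", 0): 950, ("orders", 1): 400}
log_end_offsets = {("orders", 0): 1000, ("orders", 1): 400, ("orders", 2): 70}

group_lag = compute_lag(committed_offsets, log_end_offsets)
print(group_lag)
print(total_lag(group_lag))
```

The trade-off is the one described above: polling like this yields one data point per interval, while a continuously running client sees every commit.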

Kafka Manager Works Similarly and Comes With Its Own Caution:

Kafka managed consumer offset is now consumed by KafkaManagedOffsetCache from the “__consumer_offsets” topic. Note, this has not been tested with a large number of offsets being tracked. There is a single thread per cluster consuming this topic so it may not be able to keep up on large # of offsets being pushed to the topic.

Home Grown or DIY Monitoring

Another challenge with operating your own monitoring stack for Kafka is integration. First, you have to manage the integration with different generations of applications. If some applications store offsets in Zookeeper while others store offsets in a Coordinator, you’ll need two separate configuration instances in Burrow. And as new versions of Kafka come out and new applications are deployed, these moving targets will eventually require you to update your Burrow instance to remain compatible. And of course you must operate Burrow’s built-in notification system and integrate it with your alerting system. There isn’t much that can be done to integrate the UI. Something like Kafka Manager gives you a single view of a cluster but doesn’t give you, say, Zookeeper monitoring or host monitoring. So combining these views is left as an exercise in customized integration.

Burrow Health Monitor and False Negatives

Finally, I will point out that the Burrow health monitor, the thing that says whether lag is getting worse, is extremely conservative. It requires a string of consecutive lag increases, say 10 increases in a row, before Burrow alerts you that the Consumer Group is becoming slow. I say this is conservative because while it certainly covers one case, it does not catch a lagging Consumer Group that reads bursts of messages. Perhaps it was designed this way to avoid false positives in practice; no one could argue that lag getting continuously worse is not a problem. But if you use the newer Simple Consumer Group reference implementation, it can be quite difficult or close to impossible to trigger this required condition, because the Consumer Group has a burst fetch at the heart of its message loop. If you have a slow application, this burst fetch is very likely to produce occasional lag measurements where lag is better than the moment before, even though the overall trend is worse.

Say, for example, the group has the following lag: 100 messages behind, then 200, 190, 300, 400, 390, 500, 600… An operator looking at that graph would say the Consumer Group is falling behind. A line drawn through the data points would show the noisy upward trend. Lag is growing, although there are occasional bursts where lag shrank briefly relative to the previous measurement. But in this example, Burrow will report that this Consumer Group is in good health because in one place lag went from 200 down to 190.

To see Burrow’s perspective, think of the change in lag between measurements as the model. Lag changes by +100, -10, +110, +100, -10, +110, +100… Any measurement where lag does not increase looks like progress to Burrow. And it will continue to say things are fine, even as lag grows a millionfold, so long as some measurement in each window did not increase.
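
To make the difference concrete, here is a small sketch that applies two rules to the example lag series above: the consecutive-increase rule as described in this article, and a simple least-squares trend line of the kind an operator would draw by eye. It is an illustration under stated assumptions, not Burrow’s actual implementation; the function names, the default window of 10 (taken from the "say 10 increases in a row" example), and the slope threshold are all hypothetical.

```python
from typing import Sequence


def consecutive_increase_alert(lags: Sequence[int], window: int = 10) -> bool:
    """Alert only if lag rose in `window` consecutive measurements."""
    run = 0
    for previous, current in zip(lags, lags[1:]):
        # Any measurement that is not higher than the last one resets the run.
        run = run + 1 if current > previous else 0
        if run >= window:
            return True
    return False


def trend_slope(lags: Sequence[int]) -> float:
    """Least-squares slope of lag against sample index (messages per sample)."""
    n = len(lags)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(lags) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(lags))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def trend_alert(lags: Sequence[int], min_slope: float = 1.0) -> bool:
    """Alert when the fitted line climbs faster than `min_slope` (hypothetical threshold)."""
    return trend_slope(lags) > min_slope


# The lag series from the example above.
example_lags = [100, 200, 190, 300, 400, 390, 500, 600]

print(consecutive_increase_alert(example_lags))  # False: each dip resets the run
print(round(trend_slope(example_lags), 1))       # 67.9 messages per sample
print(trend_alert(example_lags))                 # True
```

Because the series never rises more than twice in a row, the consecutive rule stays silent for any window of three or more, no matter how long the pattern continues, while the fitted line shows lag growing by roughly 68 messages per measurement.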

In the next part of this blog, we will discuss how OpsClarity addresses the above limitations and provides a comprehensive solution to monitor Kafka and Kafka Consumer Lag.

Published at DZone with permission of David Brinegar, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
