3 Key Metrics for Kafka Monitoring

Three metrics that I found very useful from a development point of view and that saved us time while triaging a few corner cases.

By Preetdeep Kumar · Sep. 21, 20 · Analysis

“Without data you’re just a person with an opinion.”
— W. Edwards Deming

Hundreds of metrics are documented for Kafka monitoring, and CPU, memory, disk, and network metrics are useful when monitoring any system. In this article, I share three metrics that I found very useful from a development point of view and that saved us time while triaging a few corner cases reported by customers.

Lag per Topic: Alerts you when your consumers are running slower than usual. A high value often indicates one or more of the following situations:

(a) A spike in messages being produced, probably a long one, because after a short spike the consumers usually catch up and bring the lag per topic back down.

(b) Consumer processes that lack system resources or are waiting on blocking I/O or a network call. In one case, one of our EC2 instances (running a Kafka consumer) stopped polling for messages because the JVM crashed with an out-of-memory (OOM) error; however, the health check didn't replace the instance because it was monitoring the instance and not the application inside it.

If you observe data points that are missing from a dashboard and show up only after some time, add this metric to your monitoring dashboard. As a developer, you should either estimate a threshold up front or enable monitoring first and use the historical data to come up with a baseline.
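A minimal sketch of how lag per topic could be derived, assuming you already have the log end offset and the committed consumer offset for each partition. The function names, the default threshold, and the choice to treat a partition with no commit as offset 0 are illustrative assumptions, not something the metric itself prescribes; fetching the offsets depends on your Kafka client and is left out.

```python
from collections import defaultdict


def lag_per_topic(end_offsets, committed_offsets):
    """Sum per-partition lag (end offset minus committed offset) by topic.

    Both arguments map (topic, partition) -> offset.
    """
    lag = defaultdict(int)
    for (topic, partition), end in end_offsets.items():
        # Assumption: a partition with no committed offset counts from 0.
        committed = committed_offsets.get((topic, partition), 0)
        # Clamp at 0 so a stale end-offset reading never yields negative lag.
        lag[topic] += max(end - committed, 0)
    return dict(lag)


def topics_over_threshold(lag, thresholds, default_threshold=1000):
    """Return the sorted topics whose lag exceeds their alert threshold."""
    return sorted(
        topic
        for topic, value in lag.items()
        if value > thresholds.get(topic, default_threshold)
    )


# Hypothetical offsets for two topics.
end = {("orders", 0): 120, ("orders", 1): 80, ("audit", 0): 10}
committed = {("orders", 0): 100, ("orders", 1): 80, ("audit", 0): 10}
current_lag = lag_per_topic(end, committed)
alerts = topics_over_threshold(current_lag, {"orders": 10})
```

The per-topic thresholds are where the baseline from your historical data would go.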

Consumer Offset Delta: (Derived metric) Often, a consumer simply stops receiving messages even though it is running and connected to the topic. This derived metric (the difference between current_consumer_offset and previous_consumer_offset, sampled every 10 seconds, 30 seconds, or 1 minute) is the key to being alerted to the following probable causes:

(a) Messages not being produced or not at the rate expected

(b) Messages not being consumed or not at the rate expected
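The offset delta can be sketched as a small tracker that is fed the consumer offset once per sampling interval. The class name and the "stalled after N consecutive zero deltas" rule are assumptions made for illustration; the sampling schedule and the source of the offset are up to you.

```python
class OffsetDeltaTracker:
    """Tracks current_consumer_offset minus previous_consumer_offset."""

    def __init__(self, stalled_after=3):
        # Assumption: alert after this many consecutive samples with no progress.
        self.stalled_after = stalled_after
        self.previous = None
        self.zero_streak = 0

    def sample(self, current_offset):
        """Record one sample and return the delta (None for the first sample)."""
        if self.previous is None:
            self.previous = current_offset
            return None
        delta = current_offset - self.previous
        self.previous = current_offset
        # Count consecutive samples in which the offset did not move.
        self.zero_streak = self.zero_streak + 1 if delta == 0 else 0
        return delta

    def is_stalled(self):
        """True once the offset has not moved for `stalled_after` samples."""
        return self.zero_streak >= self.stalled_after
```

Calling `sample()` from a scheduled job every 10 seconds, 30 seconds, or 1 minute and alerting when `is_stalled()` returns true covers both causes listed above, since neither a silent producer nor a stuck consumer advances the offset.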

Consumer Timer: (Derived metric) I recommend that developers add a timer that goes off when (a) 0 messages, (b) fewer than x messages, or (c) more than y messages are received in a poll cycle. If the count of these timer events breaches a threshold, raise an alert. This might sound difficult to implement initially, but if you keep track of this metric, the dev team can easily identify patterns that will help them improve the code.
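One way to read this is as a set of counters keyed by the three conditions, updated after every poll. The sketch below assumes that interpretation; the class name, the bucket names, and the `reset()` method (for starting a new observation window) are hypothetical, and `low` and `high` stand in for x and y.

```python
from collections import Counter


class PollCycleMonitor:
    """Counts poll cycles that return 0, fewer than `low`, or more than `high` messages."""

    def __init__(self, low, high, alert_after):
        self.low = low                  # "x" in the text
        self.high = high                # "y" in the text
        self.alert_after = alert_after  # count at which a bucket should alert
        self.counts = Counter()

    def classify(self, message_count):
        if message_count == 0:
            return "empty"
        if message_count < self.low:
            return "low"
        if message_count > self.high:
            return "high"
        return "normal"

    def record(self, message_count):
        """Record one poll cycle; return the bucket name if it should alert, else None."""
        bucket = self.classify(message_count)
        if bucket == "normal":
            return None
        self.counts[bucket] += 1
        return bucket if self.counts[bucket] >= self.alert_after else None

    def reset(self):
        """Clear all counts, for example at the start of a new time window."""
        self.counts.clear()
```

In a consumer loop, you would call `record(len(batch))` after each poll and raise an alert whenever it returns a bucket name. Keeping the per-bucket counts around also gives the dev team the history needed to spot patterns.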

Any insight you can get out of your system helps in deriving a strategy to optimize and cut costs. Monitoring is not just an operational exercise but also part of the development process. You can learn a lot by observing firsthand how the Kafka cluster (or another system) functions in production.
