DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • Using Heap Dumps to Find Memory Leaks
  • Understanding Root Causes of Out of Memory (OOM) Issues in Java Containers
  • Dust: Open-Source Actors for Java
  • Advanced Strategies for Building Modern Data Pipelines

Trending

  • Agile’s Quarter-Century Crisis
  • Building Reliable LLM-Powered Microservices With Kubernetes on AWS
  • Unit Testing Large Codebases: Principles, Practices, and C++ Examples
  • Next Evolution in Integration: Architecting With Intent Using Model Context Protocol
  1. DZone
  2. Coding
  3. Java
  4. How to Monitor Apache Flink With OpenTelemetry

How to Monitor Apache Flink With OpenTelemetry

Here's how to get Apache Flink monitoring support in the open-source OpenTelemetry collector.

By 
Jonathan Wamsley user avatar
Jonathan Wamsley
·
Mar. 01, 23 · Tutorial
Likes (2)
Comment
Save
Tweet
Share
5.5K Views

Join the DZone community and get the full member experience.

Join For Free

Apache Flink monitoring support is now available in the open-source OpenTelemetry collector. You can check out the OpenTelemetry repo here! You can utilize this receiver in conjunction with any OTel collector: including the OpenTelemetry Collector and other distributions of the collector.

Today we'll use observIQ’s OpenTelemetry distribution, and shipping Apache Flink telemetry to a popular backend: Google Cloud Ops. You can find out more on the GitHub page: https://github.com/observIQ/observiq-otel-collector

What Signals Matter?

Apache Flink is an open-source, unified batch processing and stream processing framework. The Apache Flink collector records 29 unique metrics, so there is a lot of data to pay attention to. Some specific metrics that users find valuable are:

  • Uptime and restarts
    • Two different metrics that record the duration a job has continued uninterrupted, and the number of full restarts a job has committed, respectively.
  • Checkpoints
    • A number of metrics monitoring checkpoints can tell you the number of active checkpoints, the number of completed and failed checkpoints, and the duration of ongoing and past checkpoints.
  • Memory Usage
    • Memory-related metrics are often relevant to monitor. The Apache Flink collector ships metrics that can tell you about total memory usage, both present and over time, mins and maxes, and how the memory is divided between different processes.

All of the above categories can be gathered with the Apache Flink receiver — so let’s get started.

Before You Begin

If you don’t already have an OpenTelemetry collector built with the latest Apache Flink receiver installed, you’ll need to do that first. The Collector distro we're using includes the Apache Flink receiver (and many others) and is simple to install with a one-line installer.

Configuring the Apache Flink Receiver

Navigate to your OpenTelemetry configuration file. If you’re following along, you’ll find it in the following location: 

  • /opt/observiq-otel-collector/config.yaml (Linux)

For the Collector, edit the configuration file to include the Apache Flink receiver as shown below:

YAML
 
receivers:
  flinkmetrics:
    endpoint: http://localhost:8081
    collection_interval: 10s

Processors:
  nop:
   # Resourcedetection is used to add a unique (host.name)
  # to the metric resource(s),...  target_key: namespace

exporters:
  nop:
    # Add the exporter for your preferred destination(s)

service:
  pipelines:
    metrics:
      receivers: [flinkmetrics]
      processors: [nop]
      exporters: [nop]

If you’re using the Google Ops Agent instead, you can find the relevant config file here.

Viewing the Metrics Collected

If you followed the steps detailed above, the following Apache Flink metrics will now be delivered to your preferred destination.

Metric Description
flink.jvm.cpu.load
The CPU usage of the JVM for a jobmanager or taskmanager.
flink.jvm.cpu.time
The CPU time used by the JVM for a jobmanager or taskmanager.
flink.jvm.memory.heap.used
The amount of heap memory currently used.
flink.jvm.memory.heap.committed
The amount of heap memory guaranteed to be available to the JVM.
flink.jvm.memory.heap.max
The maximum amount of heap memory that can be used for memory management.
flink.jvm.memory.nonheap.used
The amount of non-heap memory currently used.
flink.jvm.memory.nonheap.committed
The amount of non-heap memory guaranteed to be available to the JVM.
flink.jvm.memory.nonheap.max
The maximum amount of non-heap memory that can be used for memory management.
flink.jvm.memory.metaspace.used
The amount of memory currently used in the Metaspace memory pool.
flink.jvm.memory.metaspace.committed
The amount of memory guaranteed to be available to the JVM in the Metaspace memory pool.
flink.jvm.memory.metaspace.max
The maximum amount of memory that can be used in the Metaspace memory pool.
flink.jvm.memory.direct.used
The amount of memory used by the JVM for the direct buffer pool.
flink.jvm.memory.direct.total_capacity
The total capacity of all buffers in the direct buffer pool.
flink.jvm.memory.mapped.used
The amount of memory used by the JVM for the mapped buffer pool.
flink.jvm.memory.mapped.total_capacity
The number of buffers in the mapped buffer pool.
flink.memory.managed.used
The amount of managed memory currently used.
flink.memory.managed.total
The total amount of managed memory.
flink.jvm.threads.count
The total number of live threads.
flink.jvm.gc.collections.count
The total number of collections that have occurred.
flink.jvm.gc.collections.time
The total time spent performing garbage collection.
flink.jvm.class_loader.classes_loaded
The total number of classes loaded since the start of the JVM.
flink.job.restart.count
The total number of restarts since this job was submitted, including full restarts and fine-grained restarts.
flink.job.last_checkpoint.time
The end to end duration of the last checkpoint.
flink.job.last_checkpoint.size
The total size of the last checkpoint.
flink.job.checkpoint.count
The number of checkpoints completed or failed.
flink.job.checkpoint.in_progress
The number of checkpoints in progress.
flink.task.record.count
The number of records a task has.
flink.operator.record.count
The number of records an operator has.
flink.operator.watermark.output
The last watermark this operator has emitted.

This OpenTelemetry collector can help companies looking to implement OpenTelemetry standards. 

Apache Flink Java virtual machine Open source Memory (storage engine) Batch processing Event monitoring Stream processing

Published at DZone with permission of Jonathan Wamsley. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Using Heap Dumps to Find Memory Leaks
  • Understanding Root Causes of Out of Memory (OOM) Issues in Java Containers
  • Dust: Open-Source Actors for Java
  • Advanced Strategies for Building Modern Data Pipelines

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!