
Profiling the JVM on Linux: A Hybrid Approach

Many Java sampling profilers have been known to blatantly misrepresent reality. In other words, your tools might be lying to you!

By Sasha Goldshtein · Jul. 10, 17 · Tutorial

I hope you’re outraged that your performance tools are lying to you. For quite a while, many Java sampling profilers have been known to blatantly misrepresent reality. In a nutshell, stack sampling using the documented JVMTI GetStackTrace method produces results that are biased towards safepoints and are not representative of the real CPU processing performed by your program.
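To make the bias concrete, here is a minimal sketch of the kind of in-process sampler that is affected. It uses the standard `Thread.getAllStackTraces()` call rather than JVMTI directly; on HotSpot that call is generally understood to collect stacks at a safepoint as well, so it should show the same skew. The class name `NaiveSamplerSketch` and its method are made up for illustration and are not taken from any real profiler.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of safepoint-style stack sampling; not a real profiler.
final class NaiveSamplerSketch {

    // Takes `samples` snapshots of all threads and counts how often each
    // method appears as the top (leaf) frame of a thread's stack.
    static Map<String, Integer> sample(int samples, long intervalMillis) {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            // Every thread is observed only where the JVM can stop it,
            // which is why hot code between such points can be under-reported.
            for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                if (stack.length == 0) {
                    continue; // thread had no Java frames at this moment
                }
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                break;
            }
        }
        return hits;
    }
}
```

A histogram produced this way tells you where threads were when the JVM managed to stop them, which is not necessarily where they spent their CPU time.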

Over the years, alternative profilers popped up, trying to fix this problem by using AsyncGetCallTrace, a less-documented API that doesn’t wait for a safepoint and can produce more accurate results. Simply calling AGCT from a timer signal handler gives you a fairly reliable way to do stack sampling of JVM processes. Unfortunately, even AGCT can sometimes fail, and in any case, it doesn’t help with profiling the non-Java parts of your process: JVM code, GC, JIT, syscalls, kernel work performed on your behalf, and really anything else that’s not pure JVM bytecode.

Another popular alternative is using Linux perf, which doesn’t directly support Java but has great support for profiling native code, and doesn’t have any trouble looking at kernel stacks as well. For JVM support, you need two pieces:

  1. A perf map that maps JIT-compiled addresses to function names (as a corollary, only compiled frames are supported; interpreter frames are invisible).
  2. A JIT switch, -XX:+PreserveFramePointer, that makes sure perf can walk the Java stack, added in OpenJDK 8u60.

When using this method:

  1. You end up losing interpreter frames.
  2. You can’t profile an older JVM that doesn’t have the PreserveFramePointer flag.
  3. You risk having stale entries in your perf map because the JIT can throw away and recompile code.
  4. You risk not having certain functions in your perf map because the JIT threw the code away.
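For readers who have not seen a perf map, it is a plain text file (conventionally `/tmp/perf-<pid>.map`) in which each line holds a hexadecimal start address, a hexadecimal size, and a symbol name. The following is a rough sketch of how an address could be resolved against such a file. The class `PerfMapSketch` and the sample symbol names are invented for illustration; perf's own lookup is more involved.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that resolves addresses against perf-map style lines.
final class PerfMapSketch {

    private static final class Symbol {
        final long start;
        final long size;
        final String name;

        Symbol(long start, long size, String name) {
            this.start = start;
            this.size = size;
            this.name = name;
        }
    }

    // Keyed by start address. User-space addresses fit in a signed long,
    // so the natural Long ordering is sufficient for this sketch.
    private final TreeMap<Long, Symbol> byStart = new TreeMap<>();

    // Each line is "<hex start> <hex size> <symbol name>"; the name may contain spaces.
    static PerfMapSketch parse(List<String> lines) {
        PerfMapSketch map = new PerfMapSketch();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+", 3);
            if (parts.length < 3) {
                continue; // skip blank or malformed lines
            }
            long start = Long.parseUnsignedLong(parts[0], 16);
            long size = Long.parseUnsignedLong(parts[1], 16);
            // A later entry with the same start address replaces the earlier one,
            // a crude stand-in for the JIT recompiling code at that address.
            map.byStart.put(start, new Symbol(start, size, parts[2]));
        }
        return map;
    }

    // Returns the symbol whose range contains the address, or "[unknown]".
    String resolve(long address) {
        Map.Entry<Long, Symbol> entry = byStart.floorEntry(address);
        if (entry == null) {
            return "[unknown]";
        }
        Symbol s = entry.getValue();
        return (address - s.start < s.size) ? s.name : "[unknown]";
    }
}
```

The sketch only handles replacement at an identical start address. Overlapping ranges left behind by discarded code, which is the stale-entry risk listed above, would still resolve to whichever entry happens to start closest below the sampled address.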

At JPoint 2017, Andrei Pangin and Vadim Tsesko from Odnoklassniki introduced a new approach for JVM profiling on Linux, which brings together the best of both worlds: perf for native code and kernel frames, and AGCT for Java frames. Thus, async-profiler was born.

async-profiler’s method of operation is fairly simple. It uses the perf_events API to configure CPU sampling into a memory buffer and asks for a signal to be delivered when a sample occurs. The signal handler then calls AsyncGetCallTrace and merges the two stacks together: the Java stack, captured by AsyncGetCallTrace, and the native plus kernel stack, captured by perf_events. For non-Java threads, only the perf_events stack is retained.
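The merge step can be pictured with a small model. This is only a sketch of the idea, not async-profiler's code, which is a native agent: it assumes both stacks arrive as lists of frame names ordered leaf first, and that the perf_events part has already been cut off at the point where it reaches Java code. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of combining a perf_events stack with an AsyncGetCallTrace stack.
final class StackMergeSketch {

    // javaFrames: Java frames, leaf first (may be empty for non-Java threads).
    // perfFrames: kernel and native frames, leaf first.
    // Returns one stack, leaf first: kernel/native frames sit on top of the Java frames.
    static List<String> merge(List<String> javaFrames, List<String> perfFrames) {
        List<String> merged = new ArrayList<>(perfFrames);
        merged.addAll(javaFrames); // with no Java frames, only the perf stack remains
        return merged;
    }
}
```

For a thread blocked in a read, this would place frames such as the kernel's read handler and the libc wrapper above the Java method that issued the call, which is the shape a merged flame graph tower has.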

Figure: async-profiler’s approach for constructing a merged call stack, from Andrei Pangin’s and Vadim Tsesko’s presentation at JPoint 2017.

This approach has its limitations, but it also offers a lot of appeal. You don’t need a special switch to preserve frame pointers. You get full-fidelity data about interpreter frames. The agent supports older JVMs. The stack aggregation happens in the agent, so there are no expensive perf.data files to store and parse.
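As an illustration of what in-agent aggregation buys you, the sketch below counts identical stacks as they arrive and emits them in the "folded" text format that flame graph tools commonly consume: frames joined by semicolons, root first, followed by a sample count. It is an assumption-laden model written in Java for readability; the class name `FoldedStacksSketch` is hypothetical and the real agent is native code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregator that folds identical stacks into "frames count" lines.
final class FoldedStacksSketch {

    // Sorted by stack string so the output order is deterministic.
    private final Map<String, Long> counts = new TreeMap<>();

    // Records one sample; the stack is given leaf first, as a sampler would capture it.
    void addSample(List<String> leafFirstStack) {
        List<String> rootFirst = new ArrayList<>(leafFirstStack);
        Collections.reverse(rootFirst); // folded format lists the root frame first
        counts.merge(String.join(";", rootFirst), 1L, Long::sum);
    }

    // Returns one line per distinct stack: "root;...;leaf count".
    List<String> dump() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            lines.add(e.getKey() + " " + e.getValue());
        }
        return lines;
    }
}
```

Because only one counter per distinct stack is kept, memory and output size grow with the number of different call paths rather than with the number of samples, which is the contrast with storing every raw sample in a perf.data file.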

Figure: a flame graph generated by using async-profiler.

To try async-profiler, you can build from the source (it’s very simple) and then use the helper profiler.sh script, which I contributed:

./profiler.sh start $(pidof java)
./profiler.sh stop -o flamegraph -f /tmp/java.stacks

Full instructions are in the README. Any feedback, contributions, or suggestions are very welcome. Odnoklassniki is using this in production, but I’m sure they’ll be delighted to know that you found it useful, too!
