Effective Methods to Diagnose and Troubleshoot CPU Spikes in Java Applications

In this article, discover a few practical methods to help diagnose and resolve CPU spikes without making changes in your production environment.

By Ram Lakshmanan · DZone Core · Nov. 05, 24 · Tutorial

CPU spikes are one of the most common performance challenges faced by Java applications. Traditional APM (Application Performance Management) tools provide high-level insights into overall CPU usage, but they often fall short of identifying the root cause of a spike because they usually can't pinpoint the exact code paths responsible. This is where non-intrusive, thread-level analysis proves to be much more effective. In this post, I'll share a few practical methods to help you diagnose and resolve CPU spikes without making changes in your production environment.

Intrusive vs Non-Intrusive Approach: What Is the Difference?

Intrusive Approach

Intrusive approaches involve making changes to the application’s code or configuration, such as enabling detailed profiling, adding extra logging, or attaching performance monitoring agents. These methods can provide in-depth data, but they come with the risk of affecting the application’s performance and may not be suitable for production environments due to the added overhead.

Non-Intrusive Approach

Non-intrusive approaches, on the other hand, require no modifications to the running application. They rely on gathering external data such as thread dumps, CPU usage, and logs without interfering with the application’s normal operation. These methods are safer for production environments because they avoid any potential performance degradation and allow you to troubleshoot live applications without disruption. 

1. top -H + Thread Dump

High CPU consumption is always caused by threads that are continuously executing code. An application tends to have hundreds (sometimes thousands) of threads, so the first step in diagnosis is to identify the CPU-consuming threads among them.

A simple and effective way to do this is by using the top command. The top command is a utility available on all flavors of Unix systems that provides a real-time view of system resource usage, including CPU consumption by each thread in a specific process. You can issue the following top command to identify which threads are consuming the most CPU:

top -H -p <PROCESS_ID>


This command lists individual threads within a Java process and their respective CPU consumption, as shown in Figure 1 below:

Figure 1: top -H -p <PROCESS_ID> command showing threads and their CPU consumption

Once you’ve identified the CPU-consuming threads, the next step is to figure out what lines of code those threads are executing. To do this, you need to capture a thread dump from the application, which will show the code execution path of those threads. However, there are a couple of things to keep in mind:

  1. You need to issue the top -H -p <PROCESS_ID> command and capture the thread dump simultaneously to know the exact lines of code causing the CPU spike. CPU spikes are transient, so capturing both at the same time ensures you can correlate the high CPU usage with the exact code being executed. Any delay between the two can result in missing the root cause.
  2. The top -H -p <PROCESS_ID> command prints thread IDs in decimal format, but in the thread dump, thread IDs (the nid field) are printed in hexadecimal. You'll need to convert the decimal thread IDs to hexadecimal to look them up in the dump, as shown in the sketch below.
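
If you want to script these two steps, here is a minimal sketch. It assumes a Linux host with jstack from the JDK on the PATH, running as the same OS user as the Java process; the process ID and thread ID values are placeholders.

PID=12345      # placeholder: your Java process ID

# Capture per-thread CPU usage (batch mode, single iteration) and a thread
# dump at the same moment so the two can be correlated.
top -H -b -n 1 -p "$PID" > top-threads.txt &
jstack "$PID" > threaddump.txt
wait

# Convert a hot thread ID reported by top (decimal) into the hexadecimal
# nid used in the thread dump, then pull up that thread's stack trace.
TID=14327      # placeholder: a CPU-heavy thread ID from top-threads.txt
grep -A 20 "nid=0x$(printf '%x' "$TID")" threaddump.txt

Running top in batch mode in the background while jstack captures the dump keeps the gap between the two snapshots as small as possible.
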
Figure 2: yCrash reporting CPU consumption by each thread and their code execution path

Disadvantages

This is the most effective and accurate method to troubleshoot CPU spikes. However, in certain environments, especially containerized environments, the top command may not be installed. In such cases, you might want to explore the alternative methods mentioned below.
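
If the container at least exposes a readable /proc filesystem, one workaround is to read the per-thread CPU counters directly from it before falling back to the thread-dump-only methods below. The following is only a sketch, assuming a Linux container; the process ID is a placeholder.

PID=12345      # placeholder: your Java process ID

# Print user + system CPU time (in clock ticks) per thread, highest first.
for task in /proc/"$PID"/task/*; do
  tid=$(basename "$task")
  # Strip everything up to the closing ')' of the comm field so that thread
  # names containing spaces don't shift the remaining field positions.
  rest=$(sed 's/^.*) //' "$task"/stat)
  ticks=$(echo "$rest" | awk '{print $12 + $13}')   # utime + stime
  echo "$ticks $tid"
done | sort -rn | head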

2. RUNNABLE State Threads Across Multiple Dumps

Java threads can be in one of several states: NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, or TERMINATED. (If you are interested, you can read more about the different thread states.) When a thread is actively executing code, it will be in the RUNNABLE state, and CPU spikes are always caused by threads in this state. To effectively diagnose these spikes:

  1. Capture 3-5 thread dumps at intervals of 10 seconds.
  2. Identify threads that remain consistently in the RUNNABLE state across all dumps.
  3. Analyze the stack traces of these threads to determine what part of the code is consuming the CPU.

While this analysis can be done manually, thread dump analysis tools like fastThread automate the process. fastThread generates a "CPU Spike" section that highlights threads that were persistently in the RUNNABLE state across multiple dumps. However, this method won’t indicate the exact percentage of CPU each thread is consuming.
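
If you prefer to script the capture-and-compare workflow yourself, the following rough sketch assumes jstack is on the PATH; the dump count, interval, process ID, and file names are placeholders.

PID=12345      # placeholder: your Java process ID

# Capture 5 thread dumps, 10 seconds apart.
for i in 1 2 3 4 5; do
  jstack "$PID" > "threaddump.$i.txt"
  sleep 10
done

# Rough manual comparison: thread names that show up in the RUNNABLE state
# in every dump (count equal to the number of dumps) are the first suspects.
grep -h -B 2 'java.lang.Thread.State: RUNNABLE' threaddump.*.txt \
  | sed -n 's/^"\(.*\)".*/\1/p' | sort | uniq -c | sort -rn | head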

Figure 3: fastThread tool reporting "CPU Spike" section

Disadvantages

This method shows all threads in the RUNNABLE state, regardless of their actual CPU consumption. For example, a thread consuming 80% of the CPU and a thread consuming only 5% will both appear. Because it doesn't provide the exact CPU consumption of individual threads, you may have to infer the severity of the spike based on thread behavior and execution patterns.

3. Analyzing RUNNABLE State Threads From a Single Dump

Sometimes, you may only have a single snapshot of a thread dump. In such cases, the approach of comparing multiple dumps can't be applied. However, you can still attempt to diagnose CPU spikes by focusing on the threads in the RUNNABLE state. One thing to note is that the JVM classifies all threads running native methods as RUNNABLE, but many native methods (like java.net.SocketInputStream.socketRead0()) don't actually consume CPU; they simply block, waiting for I/O operations.

To avoid being misled by such threads, you’ll need to filter out these false positives and focus on the actual RUNNABLE state threads. This process can be tedious, but fastThread automates it by filtering out these misleading threads in its "CPU Consuming Threads" section, allowing you to focus on the real culprits behind the CPU spike.
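
To attempt the same filtering by hand on a single dump, a rough sketch is shown below; it assumes the dump is saved as threaddump.txt, and the list of excluded native methods is illustrative rather than exhaustive.

# Print the header line of each RUNNABLE thread whose stack does not contain
# one of the well-known I/O-waiting native methods (illustrative exclusions).
awk -v RS= '/java\.lang\.Thread\.State: RUNNABLE/ &&
            !/socketRead0|socketAccept|epollWait|accept0/ {
              split($0, lines, "\n"); print lines[1]
            }' threaddump.txt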

Figure 4: fastThread tool reporting "CPU Consuming Threads" section

Disadvantages

This method has a couple of disadvantages:

  1. A thread might be temporarily in the RUNNABLE state but may quickly move to WAITING or TIMED_WAITING (i.e., non-CPU-consuming states). In such cases, relying on a single snapshot may lead to misleading conclusions about the thread’s impact on CPU consumption.
  2. Similar to method #2, it shows all threads in the RUNNABLE state regardless of their actual CPU consumption, and it won't provide the exact CPU usage of individual threads; you may have to infer the severity of the spike based on thread behavior and execution patterns.

Case Study: Diagnosing CPU Spikes in a Major Trading Application

In one case, a major trading application experienced severe CPU spikes that significantly affected its performance during critical trading hours. By capturing thread dumps and applying method #1 described above, we identified that the root cause was the use of a non-thread-safe data structure. Multiple threads were concurrently accessing and modifying this data structure, leading to excessive CPU consumption. Once the issue was identified, the development team replaced it with a thread-safe alternative, which eliminated the contention and drastically reduced CPU usage. For more details on this case study, read more here.

Conclusion

Diagnosing CPU spikes in Java applications can be challenging, especially when traditional APM tools fall short. By using non-intrusive methods like analyzing thread dumps and focusing on RUNNABLE state threads, you can pinpoint the exact cause of the CPU spike.


Published at DZone with permission of Ram Lakshmanan, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
