HashMap.get High CPU – Case Study

By Pierre - Hugues Charbonneau · Dec. 27, 12

This article describes the complete root cause analysis and resolution of a HashMap high CPU problem (infinite looping) affecting a Weblogic 10.0 environment running on the Java HotSpot VM 1.5.

This case study again demonstrates the importance of mastering Thread Dump analysis and CPU correlation techniques such as Solaris prstat.

Environment specifications

  • Java EE server: Oracle Weblogic Portal 10.0
  • Middleware OS: Solaris 10
  • Java VM: Java HotSpot VM 1.5
  • Platform type: Portal application

Monitoring and troubleshooting tools

  • JVM Thread Dump (HotSpot format)
  • Solaris prstat (CPU contributors analysis)

Problem overview
Problem type: High CPU observed from our Weblogic production environment
A high CPU problem was observed on the Solaris physical servers hosting a Weblogic Portal 10 environment. Users were also reporting a major slowdown of the portal application.

Gathering and validation of facts

As usual, a Java EE problem investigation requires gathering technical and non-technical facts so that we can derive other facts and/or conclude on the root cause. The facts below were verified before applying any corrective measure:

  • What is the client impact? HIGH
  • Recent change of the affected platform? No
  • Any recent traffic increase to the affected platform? Yes
  • How does this high CPU manifest itself? A sudden CPU increase was observed and did not go down, even after the load dropped to a near zero level.
  • Did a server recycle resolve the problem? Yes, but the problem returns after a few hours or a few days (unpredictable pattern)

- Conclusion #1: The high CPU problem appears to be intermittent rather than purely correlated with load
- Conclusion #2: Since the high CPU remains after the load goes down, this typically indicates the presence of either infinite looping or heavy processing Threads

Solaris CPU analysis using prstat

Solaris prstat is a powerful OS command that reports CPU utilization per process and, more importantly, per Thread within a process. As you can see below from our case study, CPU utilization was confirmed to go as high as 100% (saturation level).

## PRSTAT (CPU per Java Thread analysis)
prstat -L -p 8223 1 1
PID   USERNAME  SIZE   RSS    STATE  PRI  NICE  TIME      CPU    PROCESS/LWPID
8223  bea10     2809M  2592M  sleep  59   0     14:52:59  38.6%  java/494
8223  bea10     2809M  2592M  sleep  57   0     12:28:05  22.3%  java/325
8223  bea10     2809M  2592M  sleep  59   0     11:52:02  28.3%  java/412
8223  bea10     2809M  2592M  sleep  59   0     5:50:00   0.3%   java/84
8223  bea10     2809M  2592M  sleep  58   0     2:27:20   0.2%   java/43
8223  bea10     2809M  2592M  sleep  59   0     1:39:42   0.2%   java/41287
8223  bea10     2809M  2592M  sleep  59   0     4:41:44   0.2%   java/30503
8223  bea10     2809M  2592M  sleep  59   0     5:58:32   0.2%   java/36116
……………………………………………………………………………………
As you can see from the above data, 3 Java Threads together were using close to 100% of the CPU.

For our root cause analysis, we focused on Thread #494 (decimal format), which corresponds to 0x1ee (hexadecimal format).
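
The decimal to hexadecimal conversion above is easy to script when several Threads have to be checked. The following is a minimal sketch, assuming prstat -L lines that end in a PROCESS/LWPID column such as java/494; the class and method names (PrstatNid, lwpIdOf, toNid) are hypothetical helpers introduced for illustration, not part of prstat or the JDK.

```java
// Hypothetical helper: maps the LWPID column of a `prstat -L` line to the
// hexadecimal nid used in a HotSpot Thread Dump.
class PrstatNid {

    // Extracts the LWP id from a prstat line ending in "<process>/<lwpid>".
    static int lwpIdOf(String prstatLine) {
        String trimmed = prstatLine.trim();
        int slash = trimmed.lastIndexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("No PROCESS/LWPID column: " + prstatLine);
        }
        return Integer.parseInt(trimmed.substring(slash + 1));
    }

    // Formats a decimal LWP id as the "0x..." nid shown in the Thread Dump.
    static String toNid(int lwpId) {
        return "0x" + Integer.toHexString(lwpId);
    }

    public static void main(String[] args) {
        String line = "8223 bea10 2809M 2592M sleep 59 0 14:52:59 38.6% java/494";
        System.out.println(toNid(lwpIdOf(line))); // prints 0x1ee
    }
}
```

Applied to the other two busy Threads from the prstat output, the same helper gives 0x145 for LWP 325 and 0x19c for LWP 412.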

Thread Dump analysis and PRSTAT correlation

Once the culprit Threads were identified, the next step was to correlate this data with the Thread Dump data (which was captured at exactly the same time as prstat).
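
This correlation can also be scripted rather than done by hand. The sketch below assumes a HotSpot format Thread Dump in which each Thread starts with a quoted header line containing nid=0x... and Threads are separated by blank lines; ThreadDumpLookup and stackFor are hypothetical names, and the second Thread in the sample data is invented purely for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: pulls the stack trace of one Thread out of a HotSpot
// Thread Dump, given the nid computed from the prstat LWPID.
class ThreadDumpLookup {

    // Returns the header line containing "nid=<nid> " plus the stack frames
    // that follow it, or an empty list when no Thread matches.
    static List<String> stackFor(List<String> dumpLines, String nid) {
        List<String> result = new ArrayList<>();
        boolean inThread = false;
        for (String line : dumpLines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("\"")) {
                // A new Thread header starts; keep it only if the nid matches.
                inThread = trimmed.contains("nid=" + nid + " ");
            } else if (trimmed.isEmpty()) {
                // A blank line ends the current Thread block.
                inThread = false;
            }
            if (inThread) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dump = Arrays.asList(
            "\"[STUCK] ExecuteThread: '125' for queue: 'weblogic.kernel.Default (self-tuning)'\" daemon prio=1 tid=0x014c5030 nid=0x1ee runnable [0x536fb000..0x536ffc70]",
            "       at java.util.HashMap.get(HashMap.java:346)",
            "",
            // Invented Thread, present only to show that non-matching nids are skipped.
            "\"Hypothetical Idle Thread\" daemon prio=1 tid=0x00000001 nid=0x2a waiting on condition",
            "       at java.lang.Thread.sleep(Native Method)");
        for (String line : stackFor(dump, "0x1ee")) {
            System.out.println(line);
        }
    }
}
```

The trailing space in the "nid=<nid> " match is deliberate: it prevents 0x1ee from also matching a longer nid such as 0x1ee1.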

A quick search within the generated Thread Dump file revealed the Thread Stack Trace (Weblogic Stuck Thread #125) for 0x1ee, shown below.
 "[STUCK] ExecuteThread: '125' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=1 tid=0x014c5030 nid=0x1ee runnable [0x536fb000..0x536ffc70]
       at java.util.HashMap.get(HashMap.java:346)
       at org.apache.axis.encoding.TypeMappingImpl.getClassForQName(TypeMappingImpl.java:715)
       at org.apache.axis.encoding.TypeMappingDelegate.getClassForQName(TypeMappingDelegate.java:170)
       at org.apache.axis.encoding.TypeMappingDelegate.getClassForQName(TypeMappingDelegate.java:160)
       at org.apache.axis.encoding.TypeMappingImpl.getDeserializer(TypeMappingImpl.java:454)
       at org.apache.axis.encoding.TypeMappingDelegate.getDeserializer(TypeMappingDelegate.java:108)
       at org.apache.axis.encoding.TypeMappingDelegate.getDeserializer(TypeMappingDelegate.java:102)
       at org.apache.axis.encoding.DeserializationContext.getDeserializer(DeserializationContext.java:457)
       at org.apache.axis.encoding.DeserializationContext.getDeserializerForType(DeserializationContext.java:547)
       at org.apache.axis.encoding.ser.BeanDeserializer.getDeserializer(BeanDeserializer.java:514)
       at org.apache.axis.encoding.ser.BeanDeserializer.onStartChild(BeanDeserializer.java:286)
       at org.apache.axis.encoding.DeserializationContext.startElement(DeserializationContext.java:1035)
       at org.apache.axis.message.SAX2EventRecorder.replay(SAX2EventRecorder.java:165)
       at org.apache.axis.message.MessageElement.publishToHandler(MessageElement.java:1141)
       at org.apache.axis.message.RPCElement.deserialize(RPCElement.java:236)
       at org.apache.axis.message.RPCElement.getParams(RPCElement.java:384)
       at org.apache.axis.client.Call.invoke(Call.java:2467)
       at org.apache.axis.client.Call.invoke(Call.java:2366)
       at org.apache.axis.client.Call.invoke(Call.java:1812)

Thread Dump analysis – HashMap.get() infinite loop condition!

As you can see from the above Thread Stack Trace, the Thread is stuck in an infinite loop inside java.util.HashMap.get(), called from the Apache Axis TypeMappingImpl Java class.

This finding was quite revealing. The 2 other Threads using high CPU revealed the same infinite looping condition on the same Apache Axis HashMap Object.

Root cause: non Thread safe HashMap in Apache Axis 1.4

Additional research revealed that this is a known defect affecting Apache Axis 1.4, which is the version our application was using.

As you may already know, using a non Thread safe / non synchronized HashMap under concurrent Threads is very dangerous and can easily lead to internal HashMap index corruption and / or infinite looping. This is also a golden rule for any middleware software such as Oracle Weblogic, IBM WAS, or Red Hat JBoss, which rely heavily on HashMap data structures in various Java EE and caching services.

This best practice also applies to any Open Source third party API such as Apache Axis.
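
To make the failure mode concrete: a HashMap lookup walks a linked chain of entries in one bucket until it finds the key or reaches the end of the chain. In the HashMap implementations of that era, unsynchronized concurrent resizing is known to be able to leave a chain that points back to itself, so a lookup for an absent key never reaches the end. The sketch below is a simplified, single-threaded model of that situation and not the JDK source; Entry, lookup, and the maxSteps guard are hypothetical names introduced for the illustration.

```java
// Simplified single-threaded model of a corrupted hash bucket (not JDK code).
class CyclicBucketDemo {

    // One entry in a bucket's singly linked chain.
    static final class Entry {
        final String key;
        final String value;
        Entry next;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Walks the chain like a hash lookup would, but gives up after maxSteps
    // so that the demo terminates. Returns the value, or null when the key is
    // absent; throws if the step budget is exhausted (the chain has a cycle).
    static String lookup(Entry head, String key, int maxSteps) {
        int steps = 0;
        for (Entry e = head; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
            if (++steps >= maxSteps) {
                throw new IllegalStateException("Chain never ends: cycle detected");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Entry a = new Entry("a", "1");
        Entry b = new Entry("b", "2");
        a.next = b;
        System.out.println(lookup(a, "missing", 1000)); // healthy chain: prints null

        b.next = a; // simulate the corruption: the chain now loops a -> b -> a
        try {
            lookup(a, "missing", 1000);
        } catch (IllegalStateException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```

Without the step limit, the second lookup would spin forever on one CPU, which matches the symptom of a runnable Thread sitting in HashMap.get() long after the load has gone away.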

The most common solution is to use the ConcurrentHashMap data structure, which is designed for this type of concurrent Thread execution context.
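
As an illustration of that approach, here is a minimal sketch in which several Threads write to one shared ConcurrentHashMap without any external locking. The class name ConcurrentRegistryDemo, the fill method, and the key format are assumptions made for this example. It uses Java 8 syntax for brevity; ConcurrentHashMap itself has been available since Java 5, where an anonymous Runnable would replace the lambda.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: concurrent writers sharing one ConcurrentHashMap.
class ConcurrentRegistryDemo {

    // Fills a shared map from `threads` writer Threads, each inserting
    // `keysPerThread` distinct keys, and returns the map once all are done.
    static Map<String, Integer> fill(int threads, int keysPerThread) {
        Map<String, Integer> shared = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int k = 0; k < keysPerThread; k++) {
                    shared.put("t" + id + "-k" + k, k);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for writers", e);
        }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(fill(4, 10_000).size()); // prints 40000
    }
}
```

Every key is unique, so the final size equals threads multiplied by keysPerThread. With a plain HashMap in place of ConcurrentHashMap, the same program would have no such guarantee.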

Solution

Our team applied the proposed patch from Apache (synchronizing the non Thread safe HashMap), which resolved the problem. We are also currently looking at upgrading our application to a newer version of Apache Axis.
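
For readers unfamiliar with this style of fix, the sketch below shows the general idea of guarding a plain HashMap with a single lock. It is not the actual Apache Axis patch: SynchronizedTypeRegistry, register, and classFor are hypothetical names, and the real TypeMappingImpl code is structured differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a type-mapping registry; this is NOT the Axis patch.
class SynchronizedTypeRegistry {

    // Plain HashMap: every access below goes through the same lock.
    private final Map<String, Class<?>> classesByName = new HashMap<>();

    // Writers take the lock on `this` before modifying the map.
    synchronized void register(String qName, Class<?> type) {
        classesByName.put(qName, type);
    }

    // Reads must be synchronized too: an unsynchronized get() could still
    // observe the map while another Thread is resizing it.
    synchronized Class<?> classFor(String qName) {
        return classesByName.get(qName);
    }

    public static void main(String[] args) {
        SynchronizedTypeRegistry registry = new SynchronizedTypeRegistry();
        registry.register("xsd:string", String.class);
        System.out.println(registry.classFor("xsd:string")); // prints class java.lang.String
    }
}
```

The trade-off compared with ConcurrentHashMap is that all readers and writers serialize on one lock, which is simple and safe but can become a contention point under heavy load.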

Conclusion

I hope this case study has helped you understand how to pinpoint the root cause of high CPU Threads and the importance of using proper Thread safe data structures in highly concurrent applications.

Please don’t hesitate to post any comment or question.


Published at DZone with permission of Pierre - Hugues Charbonneau, DZone MVB. See the original article here.
