We have recently passed the “half a billion root causes for poor user experience discovered” milestone. To celebrate, we decided it was time to share the data we have gathered while detecting the root causes behind poorly performing applications.
To understand the dataset, you should have some understanding of what we do. Plumbr keeps an eye on all end user interactions with Java applications. Whenever such an interaction is either too slow or fails altogether, Plumbr exposes the exact root cause in the source code responsible for the problem. Examples of such root causes include slow database queries, synchronization issues and blocking file system accesses.
The dataset we analyzed was extracted from the root causes detected in the 1,020 different environments Plumbr was monitoring from May to August 2016.
The first view of the dataset lists the number of times a particular root cause was the reason an end user faced either performance or availability issues:
From the above we can see, for example, that:
- Web service access over HTTP calls accounted for 26.5% of the root causes analyzed.
- Synchronization issues resulting in long lock waits were the second most common culprit, responsible for ~15% of the root causes.
- Slow JDBC operations ranked third, just barely behind the locking issues.
Because this representation is biased towards larger deployments, where Plumbr was monitoring clustered applications, let’s look at the data another way. The chart below answers the question “In how many unique accounts did this particular root cause impact end user experience at least once?”
From the above, we can see for example that:
- Excessively long GC pauses were impacting end users in more than 65% of the accounts.
- Locking issues in poorly designed synchronization blocks were detected in around 60% of the accounts.
- Streaming operations using the file system were detected as root causes in 11% of the accounts.
- Lucene indexes were either infrequently used or rather well built, being the source for performance issues in under 2% of the accounts.
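To make the difference between the two views concrete, here is a minimal sketch of how each percentage could be computed from raw detections. It is not Plumbr's actual pipeline: the `Detection` record, the class name, the method names and the sample data are all hypothetical, and the per-account view assumes that every account of interest has at least one detection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class RootCauseStats {
    // Hypothetical shape: one detected root cause, tagged with the account it occurred in.
    record Detection(String accountId, String rootCauseType) {}

    // View 1: percentage of all detections attributed to each root cause type.
    static Map<String, Double> shareOfDetections(List<Detection> detections) {
        Map<String, Long> counts = detections.stream()
            .collect(Collectors.groupingBy(Detection::rootCauseType, TreeMap::new, Collectors.counting()));
        Map<String, Double> shares = new TreeMap<>();
        counts.forEach((type, n) -> shares.put(type, 100.0 * n / detections.size()));
        return shares;
    }

    // View 2: percentage of accounts in which each type was detected at least once.
    // Assumption: the set of accounts is exactly those that appear in the detections.
    static Map<String, Double> shareOfAccounts(List<Detection> detections) {
        long totalAccounts = detections.stream().map(Detection::accountId).distinct().count();
        Map<String, Set<String>> accountsByType = detections.stream()
            .collect(Collectors.groupingBy(Detection::rootCauseType, TreeMap::new,
                Collectors.mapping(Detection::accountId, Collectors.toSet())));
        Map<String, Double> shares = new TreeMap<>();
        accountsByType.forEach((type, accounts) ->
            shares.put(type, 100.0 * accounts.size() / totalAccounts));
        return shares;
    }

    public static void main(String[] args) {
        // Made-up sample: one large account with many HTTP detections,
        // three small accounts with a single GC pause detection each.
        List<Detection> detections = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            detections.add(new Detection("large-cluster", "HTTP"));
        }
        detections.add(new Detection("small-1", "GC pause"));
        detections.add(new Detection("small-2", "GC pause"));
        detections.add(new Detection("small-3", "GC pause"));

        System.out.println("By detections: " + shareOfDetections(detections));
        System.out.println("By accounts:   " + shareOfAccounts(detections));
    }
}
```

In this made-up sample, HTTP accounts for 70% of the detections but appears in only 25% of the accounts, while GC pauses make up 30% of the detections yet affect 75% of the accounts. That is the kind of skew a single large deployment can introduce into the first view.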
I hope this gives you a rather interesting view of the different ways Java-based applications fail to meet their performance or availability requirements. If you want to understand the data above in more detail, check out all the root causes Plumbr detects to see what the different columns in the charts are all about.