Fixing a SOLR Memory Leak

Let's get things working again.

By Harish Kumar Murugesan · Dec. 18, 21 · Analysis
This post looks at a memory leak in the SOLR QueryResultCache: how the root cause analysis (RCA) was carried out and what change resolved the issue.

In the application under test, SOLR 7.5 was used as the component that stores, searches, and retrieves content. During performance testing, CPU usage on the SOLR Slave rose steadily from one test to the next:

  • Test 1 – 40% CPU usage
  • Test 2 – 60% CPU usage
  • Test 3 – 80% CPU usage

The tests only retrieved content; nothing was written to the SOLR Slave server. The server was not restarted between tests, because it would not be restarted in production either. The CPU usage drill-down view in Dynatrace showed no specific evidence of where the time was being spent.

The GC graph showed the old generation growing, and the time spent on young GC was somewhat high. The old generation had grown to 9 GB. Minor GCs were frequent, averaging around 850 milliseconds, with a few spikes of up to 1.3 seconds. Because the old generation kept growing, the suspicion was that some set of objects was accumulating in memory.

The SOLR cache view in Dynatrace showed only inserts into QueryResultCache and no evictions. The tests were repeated after restarting SOLR, and similar behavior was observed. This time, QueryResultCache was monitored closely via the SOLR console, and the cache size increased as follows:

  • Test 1 – 190K elements in cache
  • Test 2 – 350K elements in cache
  • Test 3 – 520K elements in cache
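The check behind this observation can be sketched in a few lines of Python. This is only an illustration: the `CacheSnapshot` type and the `unbounded_growth` function are hypothetical names, the sizes are the readings listed above, and the eviction counts are assumed to be zero because no evictions were seen.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheSnapshot:
    label: str       # which test run the reading belongs to
    size: int        # entries currently held in the cache
    evictions: int   # cumulative evictions reported for the cache


def unbounded_growth(snapshots: List[CacheSnapshot], configured_size: int) -> List[str]:
    """Return one finding per reading that exceeds the configured entry limit
    while the eviction counter has not moved since the previous reading."""
    findings = []
    previous_evictions = 0
    for snap in snapshots:
        over_limit = snap.size > configured_size
        no_new_evictions = snap.evictions == previous_evictions
        if over_limit and no_new_evictions:
            findings.append(
                f"{snap.label}: {snap.size} entries, "
                f"{snap.size / configured_size:.0f}x the configured size, no evictions"
            )
        previous_evictions = snap.evictions
    return findings


# Readings from the three test runs; eviction counts are assumed to be zero.
readings = [
    CacheSnapshot("Test 1", 190_000, 0),
    CacheSnapshot("Test 2", 350_000, 0),
    CacheSnapshot("Test 3", 520_000, 0),
]

for finding in unbounded_growth(readings, configured_size=5_000):
    print(finding)
```

A cache that is working as configured would either stay at or below its entry limit or show a rising eviction counter, so every reading here is flagged.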

A heap dump taken after these tests showed that around 8 GB of memory was occupied by FastLRUCache and its contents. The SOLR configuration contained the following settings for the QueryResultCache:

<queryResultCache class="solr.FastLRUCache"   size="5000"  initialSize="512" maxRamMB="1048" autowarmCount="0"/>

In this case, QueryResultCache honored neither the ‘size’ nor the ‘maxRamMB’ parameter. The cache grew to 520K entries against the configured size of 5K, and it occupied more than 8 GB in the heap dump against the configured limit of roughly 1 GB. Instead of using both parameters to limit the cache, it was decided to restrict it with the ‘size’ parameter alone.
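A rough back-of-the-envelope calculation shows how far the observed cache was from either limit. The sketch below assumes that the 8 GB seen in the heap dump is spread evenly across the 520K entries, which is a simplification; the function names are made up for this illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

OBSERVED_BYTES = 8 * GIB      # approximate FastLRUCache footprint in the heap dump
OBSERVED_ENTRIES = 520_000    # cache size after the third test


def bytes_per_entry(total_bytes: float, entries: int) -> float:
    """Average footprint of one cache entry, assuming a uniform entry size."""
    return total_bytes / entries


def projected_mib(entries: int, per_entry: float) -> float:
    """Heap the cache would need, in MiB, for a given number of entries."""
    return entries * per_entry / MIB


def entries_for_budget(budget_mib: float, per_entry: float) -> int:
    """Number of whole entries that fit into a RAM budget given in MiB."""
    return int(budget_mib * MIB // per_entry)


per_entry = bytes_per_entry(OBSERVED_BYTES, OBSERVED_ENTRIES)
print(f"average entry size: {per_entry / 1024:.1f} KiB")
print(f"5,000 entries would need about {projected_mib(5_000, per_entry):.0f} MiB")
print(f"1,048 MiB would hold about {entries_for_budget(1_048, per_entry)} entries")
```

Under that assumption each entry averages roughly 16 KB, so the configured size of 5,000 entries would need well under 100 MB, and the 1,048 MB cap would be reached at a small fraction of the 520K entries actually observed.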

The following settings were applied in SOLR:

<queryResultCache class="solr.FastLRUCache" size="150000" initialSize="512" autowarmCount="0"/>

The three tests were repeated, and SOLR CPU usage stayed constant at around 40% in all three. QueryResultCache grew to 150K entries, after which evictions kept the cache within the defined limit. The old generation grew to 3 GB and stayed at that level across the tests, and the time to collect the young generation came down: minor GCs averaged around 650 milliseconds, without any spikes. The tests completed successfully without any CPU issues in SOLR.

Based on the root cause analysis, there appears to be a leak in the SOLR cache when both ‘size’ and ‘maxRamMB’ are set for QueryResultCache. Setting only the ‘size’ parameter avoided the leak in these tests.
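The behavior seen after the change, growth up to the limit followed by evictions, is what an entry-count-bounded LRU cache is expected to do. The following is a minimal conceptual sketch in Python, not SOLR's FastLRUCache, which is a concurrent implementation with different internals; the class name `BoundedLRUCache` is made up for this illustration.

```python
from collections import OrderedDict


class BoundedLRUCache:
    """Toy LRU cache that is limited only by its number of entries."""

    def __init__(self, max_entries: int):
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self.max_entries = max_entries
        self.evictions = 0
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        """Return the cached value, or None on a miss, and mark the key as recently used."""
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        """Insert or update an entry, evicting the least recently used ones if over the limit."""
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop the least recently used entry
            self.evictions += 1

    def __len__(self):
        return len(self._entries)


# Five distinct queries against a cache limited to three entries.
cache = BoundedLRUCache(max_entries=3)
for query in ["q1", "q2", "q3", "q4", "q5"]:
    cache.put(query, f"results for {query}")

print(len(cache), cache.evictions)
```

Once the limit is reached, every new insert pushes out the least recently used entry, so the entry count stays flat while the eviction counter rises. That matches the pattern reported for the 150K limit.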

