Fixing a SOLR Memory Leak

Let's get things working again.

By Harish Kumar Murugesan · Dec. 18, 21 · Analysis

This post covers a memory leak observed in the SOLR QueryResultCache, how the root cause analysis (RCA) was carried out, and the configuration change that resolved it.

In the application under test, SOLR 7.5 was the component used to store, search, and retrieve content. During performance testing, CPU usage on the SOLR Slave rose with every test run:

  • Test 1 – 40% CPU usage
  • Test 2 – 60% CPU usage
  • Test 3 – 80% CPU usage

The tests only retrieved content; nothing was written to the SOLR Slave server. The server was not restarted between tests, since it would not be restarted in production either. The CPU usage drill-down view in Dynatrace did not show where the time was actually being spent.

The GC graph showed the old generation growing and the time spent on young GC running a little high. The old generation grew to 9 GB. Minor GCs were frequent, averaging around 850 milliseconds, with a few spikes of up to 1.3 seconds. Because the old generation kept growing, the suspicion was that some set of objects was accumulating in memory.

The SOLR cache view in Dynatrace showed only inserts into QueryResultCache and no evictions. The tests were repeated after restarting SOLR, and the behavior was similar. This time, QueryResultCache was monitored closely via the SOLR console, and the cache size increased as follows:

  • Test 1 – 190K elements in cache
  • Test 2 – 350K elements in cache
  • Test 3 – 520K elements in cache
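The pattern described above (size rising on every run while evictions stay flat) can also be checked from recorded snapshots. The sketch below is illustrative only: it assumes each snapshot is a dictionary shaped like a SOLR metrics response, and the registry name `solr.core.example`, the metric key, and the insert counts are assumptions rather than values from the tests. Only the three sizes come from the runs above.

```python
from typing import Dict, List

# Hypothetical registry and metric names; verify them against your SOLR version.
CORE_KEY = "solr.core.example"
CACHE_KEY = "CACHE.searcher.queryResultCache"


def cache_stats(payload: Dict, core_key: str = CORE_KEY,
                cache_key: str = CACHE_KEY) -> Dict[str, int]:
    """Pull size, inserts, and evictions out of one metrics snapshot."""
    stats = payload["metrics"][core_key][cache_key]
    return {name: int(stats[name]) for name in ("size", "inserts", "evictions")}


def grows_without_eviction(snapshots: List[Dict[str, int]]) -> bool:
    """True when size rises in every snapshot and evictions never change."""
    if len(snapshots) < 2:
        return False
    sizes = [s["size"] for s in snapshots]
    rising = all(later > earlier for earlier, later in zip(sizes, sizes[1:]))
    flat = all(s["evictions"] == snapshots[0]["evictions"] for s in snapshots)
    return rising and flat


def make_payload(size: int, inserts: int, evictions: int) -> Dict:
    """Build a snapshot in the assumed shape (illustration only)."""
    cache = {"size": size, "inserts": inserts, "evictions": evictions}
    return {"metrics": {CORE_KEY: {CACHE_KEY: cache}}}


# Sizes are the ones reported above; the insert counts are placeholders.
snapshots = [cache_stats(make_payload(n, n, 0)) for n in (190_000, 350_000, 520_000)]
print(grows_without_eviction(snapshots))
```

A check like this only flags the symptom. It does not say why evictions are missing, which is what the heap dump and configuration review were for.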

A heap dump taken after these tests showed that around 8 GB of memory was occupied by FastLRUCache and its contents. The SOLR configuration had the following settings for QueryResultCache:

<queryResultCache class="solr.FastLRUCache" size="5000" initialSize="512" maxRamMB="1048" autowarmCount="0"/>

In this case, QueryResultCache honored neither the ‘size’ nor the ‘maxRamMB’ parameter; it grew past both limits. The cache reached 520K entries against a configured size of 5K, and it occupied about 8 GB in the heap dump against a configured maximum of roughly 1 GB. Rather than limiting the cache with both parameters, it was decided to restrict it using only the ‘size’ parameter.
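To put the overshoot in numbers, here is a small arithmetic sketch using the figures reported above. Treating "around 8 GB" as 8 × 1024 MB is an approximation made for this example, and the helper name `overshoot` is hypothetical.

```python
def overshoot(observed: float, limit: float) -> float:
    """Return how many times larger the observed value is than the limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return observed / limit


# Entries after Test 3 compared with size="5000".
entries_ratio = overshoot(520_000, 5_000)
# Roughly 8 GB in the heap dump (approximated as 8 * 1024 MB) compared with maxRamMB="1048".
ram_ratio = overshoot(8 * 1024, 1048)

print(f"entries: {entries_ratio:.0f}x, heap: {ram_ratio:.1f}x")
# prints "entries: 104x, heap: 7.8x"
```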

The following settings were applied in SOLR:

<queryResultCache class="solr.FastLRUCache" size="150000" initialSize="512" autowarmCount="0"/>

The three tests were repeated, and SOLR CPU usage stayed constant at around 40% in all three. QueryResultCache grew to 150K entries, after which evictions kept it within the defined limit. The old generation grew to 3 GB and stayed consistent across the tests, and the time spent collecting the young generation also came down: minor GCs averaged around 650 milliseconds, with no spikes. The tests completed without any CPU issues in SOLR.
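The behavior seen after the change (grow to the limit, then evict) is what a size-bounded LRU cache is expected to do. The toy class below illustrates that expectation only; it is not SOLR's FastLRUCache implementation, and the class name `BoundedLRU` and the query keys are made up for the example.

```python
from collections import OrderedDict


class BoundedLRU:
    """Toy size-bounded LRU cache; not SOLR's FastLRUCache implementation."""

    def __init__(self, max_size: int) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self.evictions = 0
        self._data = OrderedDict()

    def put(self, key, value) -> None:
        """Insert or refresh an entry, evicting the oldest ones past the limit."""
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the least recently used entry
            self.evictions += 1

    def get(self, key):
        """Return the cached value (marking it recently used), or None."""
        if key not in self._data:
            return None
        self._data.move_to_end(key)
        return self._data[key]

    def __len__(self) -> int:
        return len(self._data)


cache = BoundedLRU(max_size=3)
for query in ["q1", "q2", "q3", "q4", "q5"]:
    cache.put(query, f"results for {query}")

print(len(cache), cache.evictions)  # prints "3 2"
```

With a limit of 3 and five distinct inserts, the cache settles at 3 entries with 2 evictions. In the first round of tests, the equivalent eviction counter never moved.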

Based on this root cause analysis, it appears that the SOLR cache leaks when both ‘size’ and ‘maxRamMB’ are set for QueryResultCache. Setting only the ‘size’ parameter avoided the leak in these tests.
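One way to catch this combination before a test run is to scan the configuration for cache elements that set both attributes. The following is a minimal sketch under stated assumptions: the function name `caches_with_both_limits` is hypothetical, and the sample is a trimmed fragment in which only the queryResultCache line comes from the configuration above (the filterCache entry is illustrative).

```python
import xml.etree.ElementTree as ET
from typing import List


def caches_with_both_limits(solrconfig_xml: str) -> List[str]:
    """Return tag names of elements that set both 'size' and 'maxRamMB'."""
    root = ET.fromstring(solrconfig_xml)
    return [
        element.tag
        for element in root.iter()
        if "size" in element.attrib and "maxRamMB" in element.attrib
    ]


# Trimmed-down fragment; a real solrconfig.xml has many more elements.
SAMPLE = """
<config>
  <query>
    <queryResultCache class="solr.FastLRUCache" size="5000" initialSize="512" maxRamMB="1048" autowarmCount="0"/>
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  </query>
</config>
"""

print(caches_with_both_limits(SAMPLE))  # prints "['queryResultCache']"
```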
