
Solr Upgrade Surprise and Using Kill to Debug It

by Geoffrey Papilion · Aug. 01, 12 · Interview

At work, we recently upgraded to the latest and greatest stable version of Solr (3.6) and moved from the dismax parser to the edismax parser. Solr's initial performance in our environment was very poor, and while trying to get CPU utilization under control we removed the initial set of search features we had planned to deploy.

Once we finally rolled back a set of features, Solr seemed to be behaving optimally. Below is what we saw when we looked at our search servers' CPU:
[Figure: Solr CPU usage before and after the fix]
Throughout the day we saw periods of large CPU spikes, but they didn't seem to affect the server's throughput or average latency. Nonetheless, we suspected there was still an issue and started looking for a root cause.

kill -3 to the Rescue


If you've never used kill -3, it's perhaps one of the most useful Java debugging utilities around. It sends the process SIGQUIT, which tells the JVM to produce a full thread dump and print it to the process's stdout; the JVM keeps running afterwards. I became familiar with it while hunting down threads in a Tomcat container that were blocking the process from exiting. Issuing kill -3 gave enough information to find the problematic thread and work with development to fix it.
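kill -3 needs shell access to the host running the JVM. When that isn't convenient, roughly the same information can be gathered from inside the process through the standard java.lang.management API. The sketch below is illustrative rather than what was used here: the class name InProcessThreadDump is hypothetical, and it prints only thread names, states and stacks, not the full HotSpot dump format.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: an in-process approximation of what kill -3 prints.
public class InProcessThreadDump {
    // Snapshot every live thread's name, id, state and full stack trace.
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // (true, true) also gathers lock information; this sketch prints only
        // names, states and stack frames.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append('"')
              .append(" id=").append(info.getThreadId())
              .append(' ').append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Unlike ThreadInfo.toString(), which truncates long stacks, walking getStackTrace() keeps every frame, which matters when the interesting call (such as a query rewrite) sits deep in the stack.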

In this case, I was hunting for a hint about what had gone wrong with our search. I issued kill -3 during a spike and got something like this:

2012-07-27_16:52:01.54871 2012-07-27 16:52:01
2012-07-27_16:52:01.54873 Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.5-b03 mixed mode):
2012-07-27_16:52:01.54874
2012-07-27_16:52:01.54874 "JMX server connection timeout 1663" daemon prio=10 tid=0x0000000040dee800 nid=0x192c in Object.wait() [0x00007f1a24327000]
2012-07-27_16:52:01.54874 java.lang.Thread.State: TIMED_WAITING (on object monitor)
2012-07-27_16:52:01.54999 at java.lang.Object.wait(Native Method)
2012-07-27_16:52:01.55000 - waiting on <0x00007f7c189ff118> (a [I)
2012-07-27_16:52:01.55001 at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150)
2012-07-27_16:52:01.55001 - locked <0x00007f7c189ff118> (a [I)
2012-07-27_16:52:01.55002 at java.lang.Thread.run(Thread.java:662)
2012-07-27_16:52:01.55002
...
2012-07-27_16:52:01.55458 "1565623588@qtp-1939768105-762" prio=10 tid=0x00007f7314537800 nid=0x120c runnable [0x00007f1a24c2f000]
2012-07-27_16:52:01.55459 java.lang.Thread.State: RUNNABLE
2012-07-27_16:52:01.55459 at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:239)
2012-07-27_16:52:01.55459 at org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:176)
2012-07-27_16:52:01.55459 at org.apache.lucene.index.DirectoryReader$MultiTermEnum.next(DirectoryReader.java:1129)
2012-07-27_16:52:01.55460 at org.apache.lucene.search.FilteredTermEnum.next(FilteredTermEnum.java:77)
2012-07-27_16:52:01.55460 at org.apache.lucene.search.FilteredTermEnum.setEnum(FilteredTermEnum.java:56)
2012-07-27_16:52:01.55461 at org.apache.lucene.search.FuzzyTermEnum.<init>(FuzzyTermEnum.java:121)
2012-07-27_16:52:01.55461 at org.apache.lucene.search.FuzzyQuery.getEnum(FuzzyQuery.java:135)
2012-07-27_16:52:01.55462 at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:74)
2012-07-27_16:52:01.55462 at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:34)
2012-07-27_16:52:01.55463 at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:58)
2012-07-27_16:52:01.55463 at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:312)
2012-07-27_16:52:01.55463 at org.apache.lucene.search.vectorhighlight.FieldQuery.flatten(FieldQuery.java:114)
2012-07-27_16:52:01.55464 at org.apache.lucene.search.vectorhighlight.FieldQuery.flatten(FieldQuery.java:104)
2012-07-27_16:52:01.55464 at org.apache.lucene.search.vectorhighlight.FieldQuery.flatten(FieldQuery.java:98)
2012-07-27_16:52:01.55465 at org.apache.lucene.search.vectorhighlight.FieldQuery.flatten(FieldQuery.java:98)
2012-07-27_16:52:01.55465 at org.apache.lucene.search.vectorhighlight.FieldQuery.flatten(FieldQuery.java:98)
2012-07-27_16:52:01.55466 at org.apache.lucene.search.vectorhighlight.FieldQuery.<init>(FieldQuery.java:69)
2012-07-27_16:52:01.55466 at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getFieldQuery(FastVectorHighlighter.java:97)
2012-07-27_16:52:01.55466 at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:388)
2012-07-27_16:52:01.55467 at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
2012-07-27_16:52:01.55467 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
2012-07-27_16:52:01.55468 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
2012-07-27_16:52:01.55468 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
....

Looking at the output, I noticed that a lot of threads were calling FuzzyTermEnum. That seemed strange, and it sounded like an expensive search method. I talked with the developer, and we had expected that edismax ignored the tilde character, or at the very least that our library escaped it, since it was in the list of characters to escape. I checked the request logs and found people searching for exact titles that contained ~. Given the size of our index, this turned a 300 ms query into one that timed out. Further inspection of the thread dump revealed that we were also allowing * in query terms, and terms like *s ended up being equally problematic.
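To see why these terms hurt, it helps to compare how much of the term dictionary each kind of lookup has to touch. With the default prefix length of 0, a Lucene 3.x FuzzyQuery enumerates every term in the field and computes an edit-distance score for each one, and a leading wildcard such as *s likewise gives the term enumeration no prefix to seek to. The toy model below is a sketch under those assumptions, not Lucene code; TermScanSketch and its methods are hypothetical names. It counts the terms examined by a prefix match, a leading-wildcard match and a fuzzy match over a sorted term list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy term dictionary (not Lucene code) that counts how many
// terms each kind of lookup has to examine.
public class TermScanSketch {
    private final String[] terms;
    int examined; // terms inspected by the most recent lookup

    TermScanSketch(String... terms) {
        this.terms = terms.clone();
        Arrays.sort(this.terms);
    }

    // Prefix match (e.g. tit*): binary-search to the first candidate, then walk
    // forward until a term no longer shares the prefix. The O(log n)
    // binary-search probes are not included in the count.
    List<String> prefix(String p) {
        examined = 0;
        List<String> out = new ArrayList<>();
        int i = Arrays.binarySearch(terms, p);
        if (i < 0) i = -i - 1; // insertion point when p itself is not a term
        for (; i < terms.length; i++) {
            examined++;
            if (!terms[i].startsWith(p)) break;
            out.add(terms[i]);
        }
        return out;
    }

    // Leading wildcard (e.g. *s): sort order says nothing about how a term
    // ends, so every term has to be checked.
    List<String> suffix(String s) {
        examined = 0;
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            examined++;
            if (t.endsWith(s)) out.add(t);
        }
        return out;
    }

    // Fuzzy match with no required prefix (e.g. titke~): every term is
    // compared against the query with a full edit-distance computation.
    List<String> fuzzy(String q, int maxEdits) {
        examined = 0;
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            examined++;
            if (editDistance(t, q) <= maxEdits) out.add(t);
        }
        return out;
    }

    // Dynamic-programming Levenshtein distance, O(|a| * |b|) per pair.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        TermScanSketch dict = new TermScanSketch("zebra", "title", "apple", "titles", "tide");
        System.out.println(dict.prefix("tit") + " examined=" + dict.examined);
        System.out.println(dict.suffix("s") + " examined=" + dict.examined);
        System.out.println(dict.fuzzy("titke", 1) + " examined=" + dict.examined);
    }
}
```

For the five-term dictionary in main this prints [title, titles] examined=3, [titles] examined=5 and [title] examined=5. The prefix count tracks the number of matches, while the wildcard and fuzzy counts track the size of the whole dictionary, and the fuzzy case also pays for an edit-distance computation per term.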

A Solr Surprise


We hadn't sufficiently tested edismax, and we were surprised that it still processed ~, +, ^, and * when they were escaped. I didn't find any documentation that stated this directly, but I didn't really expect to. We double-checked that our Solr library was properly escaping the special characters in the query, but they were still being processed by Solr. On a hunch, we tried double escaping the characters, which resolved the issue.
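The post doesn't show exactly what double escaping looked like. A minimal sketch, assuming it means running the same backslash-escaping routine twice so that the backslashes added by the first pass are themselves escaped. The class QueryEscaping and its character set are hypothetical, modeled on the Lucene query-parser special characters rather than taken from any particular client library.

```java
// Hypothetical escaper; the character set is an assumption modeled on the
// Lucene query-parser special characters, not copied from a real library.
public class QueryEscaping {
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefix every special character with a backslash.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Assumed meaning of "double escaping": run the escaper twice, so the
    // backslashes from the first pass are escaped by the second.
    static String doubleEscape(String s) {
        return escape(escape(s));
    }

    public static void main(String[] args) {
        String title = "Ocean's 11~";
        System.out.println(escape(title));       // Ocean's 11\~
        System.out.println(doubleEscape(title)); // Ocean's 11\\\~
    }
}
```

Whether the doubled form is what a given client and edismax actually need depends on how many layers of unescaping sit between them, so it's worth confirming what arrives in Solr's request log before and after the change.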

I'm not sure whether this is a well-known problem with edismax, but if you're seeing odd CPU spikes, it's definitely worth checking for. In addition, when you're trying to get to the root of a tough problem, kill -3 can be a great shortcut. It saved me a lot of painful debugging and eliminated almost all of my guesswork.
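When a dump contains dozens of request threads, counting how many stacks contain a suspicious frame is quicker than reading them one by one. A minimal sketch, assuming HotSpot-style thread headers (a quoted thread name followed later by tid=), optionally preceded by a log timestamp as in the dump above; HotFrameCounter and threadsWithFrame are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for scanning a saved thread dump.
public class HotFrameCounter {
    // Assumption: a thread header is any line with a quoted thread name
    // followed later by "tid=", optionally after a log timestamp.
    private static final Pattern HEADER = Pattern.compile("\"([^\"]*)\".*\\btid=");

    // Return the names of threads whose stack has a frame mentioning needle;
    // each thread is listed at most once.
    static List<String> threadsWithFrame(String dump, String needle) {
        List<String> hits = new ArrayList<>();
        String current = null;   // thread whose stack is being read
        boolean matched = false; // already counted this thread?
        for (String line : dump.split("\\R")) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                current = m.group(1);
                matched = false;
            } else if (current != null && !matched
                    && line.contains("at ") && line.contains(needle)) {
                hits.add(current);
                matched = true;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
            "\"qtp-1\" prio=10 tid=0x1 nid=0x1 runnable [0x0]",
            "   java.lang.Thread.State: RUNNABLE",
            "\tat org.apache.lucene.search.FuzzyTermEnum.<init>(FuzzyTermEnum.java:121)",
            "",
            "\"qtp-2\" prio=10 tid=0x2 nid=0x2 runnable [0x0]",
            "\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)");
        System.out.println(threadsWithFrame(dump, "FuzzyTermEnum")); // [qtp-1]
    }
}
```

Comparing the size of the returned list against the total number of request threads in a dump taken during a spike gives a quick read on whether fuzzy-term expansion is what is burning the CPU.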


Published at DZone with permission of Geoffrey Papilion, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
