Optimizing Solr - Boosting Your Search Speed by 7x!

by Christopher Berner · Apr. 03, 2012
Apache Solr powers enterprise search on sites from eBay to Zappos. It also powers Carsabi, but when we reached 1.8M listings per month (passing Autotrader and Cars.com), our basic installation began to run about as fast as an octogenarian in congealing cement. I'd like to share the basics of Solr optimization, as well as some data on real-world gains.

Very briefly, our stack has gone through a few iterations, one of which may be sufficient for your corpus volume; there's no sense in over-engineering. Postgres tables had to be denormalized at 100k vehicles, and at 300k we switched to WebSolr's extremely convenient hosted Solr service: their Heroku add-on will create an installation in minutes for just $20/month. This worked very well until about 1M listings, at which point even their beefiest plan was returning results with >800ms latency.

 

Hardware: Bigger Is Better. A Lot Better

Our previous Solr-as-a-Service installation was hosted on an Amazon EC2 Large instance and returned results in 800ms. Fortunately, we had spare capacity on an EC2 Cluster Compute Eight Extra Large (CC8XL), which we use for our webcrawler, and just moving to this machine dropped our query time to 282ms: a speed increase of 2.84x. Notice that this corresponds to the 2.75x increase in per-core processor speed between a Large and a CC8XL, not the 22x gain in total compute units. Memory appears to be similarly irrelevant, with both the Large and the CC8XL easily keeping our 3GB index in RAM. However, do make sure to give Solr sufficient memory by setting the JVM heap size with the -Xmx option.
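A quick way to sanity-check the claim above is to compute the observed speedup and see which hardware ratio it tracks. This Python sketch uses only the numbers quoted in the paragraph (800ms, 282ms, 2.75x, 22x); the helper name `speedup` is illustrative.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the 'after' latency is than 'before'."""
    return before_ms / after_ms


# Large instance (800ms) vs. CC8XL (282ms), as reported in the text.
observed = speedup(800, 282)

# Hardware ratios quoted in the text for the Large -> CC8XL move.
per_core_ratio = 2.75
total_compute_ratio = 22.0

candidates = [(per_core_ratio, "per-core speed"), (total_compute_ratio, "total compute units")]
# Pick whichever hardware ratio is numerically closest to the observed speedup.
closer = min(candidates, key=lambda c: abs(c[0] - observed))

print(f"observed speedup: {observed:.2f}x")
print(f"closest match: {closer[1]} ({closer[0]}x)")
```

The observed 2.84x lands far closer to the per-core ratio than to the total-capacity ratio, which fits what the next section finds: a single query was running on one core.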

Software: Shard that Sh*t

282ms is pretty good, but I wanted better: Solr was still responsible for over 50% of our user latency. A quick search turned up a surprising recommendation: even if you have just one server, you should still shard your index. This struck me as odd. Surely Solr is multi-threaded, so why would it make a difference? A quick look at top showed that, at least for a single query, it wasn't: Solr's CPU usage never went above 100% (one core), even though our server had 16 physical cores. A couple of hours later, with our index spread over 8 shards, our query time was down to 43ms, about 6.6x faster! top now showed Solr's CPU usage at 483%, so it was clearly using multiple cores.
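At the time of writing, Solr's distributed search did not route documents to shards for you; the indexing client decided where each document went. One common approach, shown here as an assumption (the author's own sharding summary is linked separately and isn't reproduced here), is to hash a stable document ID and take it modulo the shard count. The function `shard_for` and the `listing-…` IDs are hypothetical.

```python
import zlib
from collections import Counter


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index in [0, num_shards).

    Uses CRC32 rather than Python's built-in hash(), because hash() of a str
    is randomized per process and would send the same document to different
    shards on different runs.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


if __name__ == "__main__":
    # Hypothetical listing IDs, spread over 8 shards.
    ids = [f"listing-{i}" for i in range(10_000)]
    counts = Counter(shard_for(doc_id, 8) for doc_id in ids)
    for shard in sorted(counts):
        print(f"shard{shard}: {counts[shard]} docs")
```

A deterministic hash matters more than a clever one: updates and deletes for a given listing must reach the same shard that indexed it.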

I ran some benchmarks, which showed solid gains up to 8 shards. If you're interested in how to shard your own index, I've published a brief summary here. The test was run on a representative query from our workload that sorts on two dimensions and applies a geo bounding box, four numeric range filters, one datetime range filter, and a categorical range filter.

I don't know whether the leveling off after 8 shards is due to our specific dataset or to the fact that our server has dual 8-core processors. If you have any insight, please shoot me a line! (Edit: or join in the discussion on HN.)
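On the query side, Solr's distributed search is driven by the `shards` request parameter: a comma-separated list of shard addresses. The node that receives the request fans the query out to every listed shard and merges the results. The Python sketch below only builds such a request. It assumes eight cores on one host named `shard0` through `shard7`, and it invents field names (`price`, `year`, `mileage`, `doors`, `listed_at`, `body_style`, `location`) that mirror the shape of the benchmark query described above. None of these names come from Carsabi's schema, and the filter values are placeholders.

```python
from urllib.parse import urlencode

# Hypothetical layout: eight Solr cores on one machine, named shard0..shard7.
HOST = "localhost:8983"
SHARDS = [f"{HOST}/solr/shard{i}" for i in range(8)]


def build_query_params(lat: float, lon: float, radius_km: float) -> list[tuple[str, str]]:
    """Return Solr request parameters for a distributed, filtered, sorted query.

    A list of pairs is used because Solr accepts repeated 'fq' parameters.
    All field names are illustrative placeholders, not a real schema.
    """
    return [
        ("q", "*:*"),
        # Fan the query out to every shard; the receiving node merges results.
        ("shards", ",".join(SHARDS)),
        # Sort on two dimensions.
        ("sort", "price asc,year desc"),
        # Geo bounding box via the spatial 'bbox' query parser (d is in km).
        ("fq", f"{{!bbox sfield=location pt={lat},{lon} d={radius_km}}}"),
        # Four numeric range filters.
        ("fq", "price:[5000 TO 20000]"),
        ("fq", "year:[2005 TO *]"),
        ("fq", "mileage:[* TO 100000]"),
        ("fq", "doors:[2 TO 4]"),
        # One datetime range filter using Solr date math.
        ("fq", "listed_at:[NOW-30DAYS TO NOW]"),
        # One categorical filter.
        ("fq", "body_style:(sedan OR coupe)"),
        ("rows", "20"),
        ("wt", "json"),
    ]


if __name__ == "__main__":
    params = build_query_params(37.77, -122.42, 25)
    # Any shard can act as the aggregator; here the request goes to shard0.
    print(f"http://{SHARDS[0]}/select?{urlencode(params)}")
```

Keeping each restriction in its own `fq` lets Solr cache the filters independently, so repeated combinations of the same filters can reuse cached results.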

Published at DZone with permission of Christopher Berner. See the original article here.
