
Inside the New Apache Solr

In 2006, Solr was donated to the Apache Software Foundation and integrated into the Lucene project. Apache Solr is an enterprise search platform that powers the search and navigation features of many of the world's largest internet sites. It builds on the popular Apache Lucene Java search library, which has over 3,000 installations. With the recent release of Solr 1.4, DZone conducted an exclusive interview with Grant Ingersoll, a committer on the Apache Lucene and Apache Solr projects and the current Lucene PMC chair.

"Solr is Lucene best practices plus a whole bunch of production-ready capabilities," said Ingersoll. "Solr takes Lucene and packages it up as an HTTP server." Ingersoll says that Solr runs as a web application inside a servlet container such as Tomcat or Jetty, providing the functionality of Lucene as well as other search capabilities such as faceting. On top of Lucene, Solr adds distributed search along with replication for failover and load balancing. Solr also provides easy configuration through XML.
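Because Solr exposes search over HTTP, a faceted query is just a URL. The following is a minimal sketch in Python, assuming a Solr instance at `http://localhost:8983/solr` and hypothetical field names (`title`, `category`, `author`) that are not taken from the interview. It uses Solr's standard `/select` handler with the `q`, `rows`, `wt`, `facet`, and `facet.field` request parameters.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, facet_fields=(), rows=10):
    """Build a URL for Solr's /select search handler.

    base_url and the field names passed in are assumptions for illustration;
    a real deployment defines its own fields in its schema.
    """
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    if facet_fields:
        # Turn faceting on and request counts for each listed field.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host and fields; fetch the printed URL with any HTTP client.
    print(build_select_url("http://localhost:8983/solr",
                           "title:lucene",
                           ["category", "author"]))
```

The resulting URL can be fetched with a browser, `curl`, or any HTTP library, and the response would carry the matching documents along with a count per facet value.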

DZone asked Ingersoll where Solr is being used. He listed some major websites such as AP interactive, Netflix.com, Comcast.com, and Zappos.com, but admitted that there were too many to name. CNET, which originally donated the software to Apache, also uses Solr. Webshots, product reviews, and other database documents on CNET are indexed in Solr and scored based on importance parameters. Ingersoll says the great thing about Solr is that it works almost anywhere. "You can talk to Solr via any client that supports HTTP," he said.

Version 1.4 is the project's latest release, Ingersoll says. It includes many bug fixes and significant performance improvements for indexing, searching, and faceting. New rich document processing via Apache Tika can extract content from Word, PDF, and HTML files, index it, and make it searchable. Improved support for numeric range queries also makes date range searches faster. Ingersoll says Solr 1.4 also features new dynamic faceting capabilities that group search results into clusters.
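To make the range query feature concrete, here is a small Python sketch that builds a date range clause in Solr's `field:[start TO end]` query syntax, with dates written in UTC in the `YYYY-MM-DDThh:mm:ssZ` form Solr expects. The field name `published` is a hypothetical example, and whether such a query benefits from the faster range handling depends on how the field is defined in the schema.

```python
from datetime import datetime, timezone


def solr_date(dt):
    """Format a timezone-aware datetime as a Solr date string in UTC."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def date_range_query(field, start, end):
    """Return an inclusive range clause, e.g. field:[start TO end]."""
    return f"{field}:[{solr_date(start)} TO {solr_date(end)}]"


if __name__ == "__main__":
    # 'published' is an assumed field name, not one from the article.
    start = datetime(2009, 1, 1, tzinfo=timezone.utc)
    end = datetime(2009, 11, 10, 12, 30, tzinfo=timezone.utc)
    print(date_range_query("published", start, end))
```

The returned string can be passed as the `q` parameter (or as a filter) in an ordinary HTTP search request.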

What makes Solr unique, Ingersoll says, is its open source flexibility.  With a powerful external configuration, Solr can be tailored to almost any type of application without Java coding, and it has a flexible plugin architecture for even more customization.  In contrast to Solr's open API, Ingersoll said commercial APIs "are usually black boxes and you don't have access to the lower level details."  With commercial products, you have to pay for a certain number of documents to be indexed or queries per second.  Solr, like all Apache projects, is free.  "You can also scale Solr to be as big as you want," adds Ingersoll.  

In the next version, Solr 1.5, Ingersoll says we can expect more distributed capabilities and more support for geographic searches.  The beauty of open source, Ingersoll adds, is that "good ideas come out of the blue all the time."