Set Up Solr and Get it Running
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
The main feature of Solr (or at least the most useful) is its REST-like API: instead of juggling language-specific APIs and drivers to talk to Solr, you can simply make HTTP requests and get results in JSON or XML. (It's pronounced "solar", in case you wondered.)
I would not say that this is a perfect REST interface that embodies all the principles of HTTP 1.1, but the point is that data has a simple representation that goes back and forth between client and server, with no encapsulation in SOAP's nightmarish envelopes. Plus, it's human-readable, since XML and JSON can easily be written by hand for exploratory testing.
A year ago, when I heard that CouchDB had a REST API, I thought: what a useless level of abstraction; why can't I just use a PHP extension or driver, as with MySQL, SQLite and similar databases?
Now I see the potential of such a universal interface:
- it is language agnostic thanks to XML and JSON, which nearly everything can interpret nowadays. The usual yardstick is JavaScript: if you can process it in JavaScript in a browser, with all its limitations, you can do it anywhere. Of course, JavaScript natively supports both JSON (via JSON.parse(), or via eval(), which is not secure) and XML (with the DOM).
- it is data type agnostic thanks to HTTP, which transmits everything as text. There is no type safety, but great interoperability. Dynamic languages like PHP owe part of their success to the fact that the underlying protocol has no strict types. If the front end is only going to print a value, why shouldn't a string be enough?
- it is more or less a standard protocol (although the data representation is not): if everyone who invents a database publishes their own binary communication protocol, we'll need a lot more libraries.
Solr is written in Java, but you can access it from your language of choice, simply by making GET requests to search the index and POST requests to add documents.
Of course there is usually a library for every popular programming language that wraps the REST-like interface, but it is dead simple to build such a library yourself, for example in JavaScript, and using a library is not mandatory at all. This leads us to another point: if Solr had a binary interface like MySQL's, how would you wrap that in JavaScript? HTTP is universal, since nearly everything, from computers to ovens to hair dryers, can now make HTTP requests.
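To give an idea of how little a wrapper has to do, here is a minimal sketch of the search half: it only assembles the URL for a GET request to the /select end point. The class and method names are made up for illustration and are not part of Solr or SolrJ; q (the query) and wt (the response format, for example json or xml) are standard Solr request parameters, and the base URL is assumed to be the one of the bundled example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Solr or SolrJ.
final class SolrSelectUrl {

    // baseUrl is assumed to look like "http://localhost:8983/solr".
    static String build(String baseUrl, String query, String format) {
        try {
            return baseUrl + "/select"
                    + "?q=" + URLEncoder.encode(query, "UTF-8")   // URL-encode the query text
                    + "&wt=" + URLEncoder.encode(format, "UTF-8"); // response writer: json, xml, ...
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: every JVM is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://localhost:8983/solr/select?q=*%3A*&wt=json
        System.out.println(build("http://localhost:8983/solr", "*:*", "json"));
    }
}
```

Pasting the printed URL into a browser, or passing it to any HTTP client, is enough to run the query.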
Starting out
Simplicity of use, starting from the protocol, is a key advantage over plain Lucene, and it is indeed very easy to set up Solr and get it running.
Since Solr's primary interface is web-based, it needs a servlet container (reinventing the wheel and implementing a full HTTP stack would not have been a great idea). Solr has several servlets that handle different end points, like /update or /select.
However, it comes with a prepackaged example based on Jetty (a small, lightweight servlet container), so it runs out of the box; you can also deploy it as a Tomcat web app if you prefer.
All you need to do to run Solr is uncompress the release, go into the example/ folder and run 'java -jar start.jar'. Then you can post sample documents and make some queries for exploratory testing.
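A document is added by POSTing an XML message of the form <add><doc>...</doc></add> to /update, followed by a <commit/> message to make it searchable. As a rough sketch, the message body can be assembled like this; the SolrAddMessage class is hypothetical and only builds the string, it does not send anything.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the XML body of a Solr "add" request.
final class SolrAddMessage {

    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    // Produces one <field> element per map entry, in insertion order.
    static String build(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(field.getKey())).append("\">")
               .append(escape(field.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", "id1");
        doc.put("name", "doc1");
        // prints <add><doc><field name="id">id1</field><field name="name">doc1</field></doc></add>
        System.out.println(build(doc));
    }
}
```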
Of course the bundled schema is only an example, so you can quickly modify schema.xml to craft a custom one. Just add some <field> tags:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="PageTitle" type="string" indexed="true" stored="true"/>
<field name="Artist" type="string" indexed="true" stored="true"/>
<field name="Title" type="string" indexed="true" stored="true"/>
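Fields of type string are indexed verbatim, without tokenization, so a query on them has to match the whole value. A small sketch of how such a query string could be built for the fields above; FieldQuery is a hypothetical helper, and the result is meant to be used as the q parameter of a search request.

```java
// Hypothetical helper: builds an exact-match query on a single field.
final class FieldQuery {

    // Wraps the value in double quotes, escaping backslashes and quotes inside it.
    static String exact(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // prints Artist:"Pink Floyd"
        System.out.println(exact("Artist", "Pink Floyd"));
    }
}
```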
Test automation with a Solr index
Since Solr is so simple to use, let's do something scary with it: test automation.
When I started integrating Solr into my multimedia search application, I wrote an integration test with JUnit, plus a SolrWrapper class, in less than two pomodoros; most of that time went into learning the API of SolrJ, the Java library that wraps the HTTP-based interface. I could also have done it with the JDK's own URL and HttpURLConnection classes if no library had been available.
package it.polimi.chansonnier.test;

import java.util.ArrayList;
import java.util.Collection;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

import junit.framework.TestCase;

public class SolrIntegrationTest extends TestCase {
    private SolrWrapper solrWrapper;

    public void testSolrInstanceCanBeStartedQueriedAndStopped() throws Exception {
        solrWrapper = new SolrWrapper();
        solrWrapper.start();

        String url = "http://localhost:8983/solr";
        CommonsHttpSolrServer server = new CommonsHttpSolrServer( url );
        server.setParser(new XMLResponseParser());
        server.deleteByQuery( "*:*" );// delete everything!

        SolrInputDocument doc1 = new SolrInputDocument();
        doc1.addField( "id", "id1", 1.0f );
        doc1.addField( "name", "doc1", 1.0f );
        Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        docs.add( doc1 );
        server.add( docs );
        server.commit();

        SolrQuery query = new SolrQuery();
        query.setQuery( "*:*" );
        //query.addSortField( "name", SolrQuery.ORDER.asc );
        QueryResponse rsp = server.query( query );
        SolrDocumentList docList = rsp.getResults();
        assertEquals("[id1, doc1]", docList.get(0).values().toString());
    }

    public void tearDown() {
        solrWrapper.stop();
    }
}
package it.polimi.chansonnier.test;

import java.io.File;

public class SolrWrapper {
    private Process solr;

    public void start() throws Exception {
        Runtime r = Runtime.getRuntime();
        solr = r.exec("/usr/bin/java -jar start.jar", null, getSolrRoot());
        Thread.sleep(5000);
    }

    public void stop() {
        if (solr != null) {
            solr.destroy();
        }
    }

    private File getSolrRoot() throws Exception {
        String root = System.getProperty("it.polimi.chansonnier.solr.root");
        if (root == null) {
            throw new Exception("Solr path is not specified, please add the property it.polimi.chansonnier.solr.root");
        }
        return new File(root);
    }
}
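The fixed five-second sleep in start() is the simplest thing that works, but it is either too long or too short depending on the machine. One possible alternative, sketched here under my own assumptions and not taken from the project, is to poll Solr over HTTP until it answers or a timeout expires. ReadinessProbe and its methods are hypothetical names; the timeouts are arbitrary.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: waits for a condition instead of sleeping a fixed time.
final class ReadinessProbe {

    // Calls the check repeatedly until it returns true (result: true)
    // or the timeout elapses (result: false).
    static boolean waitUntil(Callable<Boolean> check, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            try {
                if (Boolean.TRUE.equals(check.call())) {
                    return true;
                }
            } catch (Exception e) {
                // Not ready yet, for example "connection refused" while the server boots.
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(intervalMillis);
        }
    }

    // A check that succeeds when a GET on the given URL returns HTTP 200.
    static Callable<Boolean> httpOk(final String url) {
        return new Callable<Boolean>() {
            public Boolean call() throws IOException {
                HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
                connection.setConnectTimeout(1000);
                connection.setReadTimeout(1000);
                try {
                    return connection.getResponseCode() == HttpURLConnection.HTTP_OK;
                } finally {
                    connection.disconnect();
                }
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // Any URL that answers 200 once Solr is up would do; this one is an assumption.
        boolean ready = waitUntil(httpOk("http://localhost:8983/solr/select?q=*%3A*"), 30000, 250);
        System.out.println(ready ? "Solr is up" : "Solr did not start in time");
    }
}
```

In SolrWrapper.start(), the Thread.sleep(5000) call could then be replaced by a waitUntil(...) call that throws an exception when it returns false.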
Remember that an integration test, by my definition (and that of Growing Object-Oriented Software, Guided by Tests), exercises only the behavior of an external entity, to make sure we understand how it works and that the contract our code is programmed against is correct.
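For comparison, here is roughly what the same round trip looks like without SolrJ, using only HttpURLConnection as mentioned earlier. This is a sketch, not the code used in the project: PlainHttpSolrClient and its methods are hypothetical, while /update, /select and the XML messages (<delete>, <add>, <commit/>) are Solr's own. The main method assumes a Solr instance is already listening on localhost:8983.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical minimal client built on the JDK alone.
final class PlainHttpSolrClient {

    // Sends a GET request and returns the response body.
    static String get(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        return readResponse(connection);
    }

    // POSTs an XML message (add, delete, commit) and returns the response body.
    static String postXml(String url, String xml) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        OutputStream out = connection.getOutputStream();
        try {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        } finally {
            out.close();
        }
        return readResponse(connection);
    }

    // Fails on any status other than 200, otherwise reads the whole body.
    private static String readResponse(HttpURLConnection connection) throws IOException {
        try {
            int status = connection.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Solr answered with HTTP status " + status);
            }
            InputStream in = connection.getInputStream();
            try {
                return readAll(in);
            } finally {
                in.close();
            }
        } finally {
            connection.disconnect();
        }
    }

    // Reads a stream to its end and decodes it as UTF-8.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String base = "http://localhost:8983/solr";
        postXml(base + "/update", "<delete><query>*:*</query></delete>");
        postXml(base + "/update",
                "<add><doc><field name=\"id\">id1</field><field name=\"name\">doc1</field></doc></add>");
        postXml(base + "/update", "<commit/>");
        System.out.println(get(base + "/select?q=*%3A*&wt=xml"));
    }
}
```

The price of skipping the library is that the response comes back as raw XML or JSON text, which the test then has to parse or match itself.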
Now my acceptance tests start Solr before each test method, reset its index, and stop it when the test ends:
package it.polimi.chansonnier.test;

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

import junit.framework.TestCase;

public abstract class AcceptanceTest extends TestCase {
    private SolrWrapper solrWrapper;
    protected CommonsHttpSolrServer solrServer;

    public void setUp() throws Exception {
        solrWrapper = new SolrWrapper();
        solrWrapper.start();
        try {
            String url = "http://localhost:8983/solr";
            solrServer = new CommonsHttpSolrServer( url );
            solrServer.setParser(new XMLResponseParser());
            solrServer.deleteByQuery( "*:*" ); // start every test from an empty index
        } catch (Exception e) {
            // Let the failure surface instead of hiding it behind printStackTrace().
            // JUnit 3 does not call tearDown() when setUp() fails, so stop Solr here.
            solrWrapper.stop();
            throw e;
        }
    }

    public void tearDown() {
        solrWrapper.stop();
    }
}
Now what?
Now that I have a running Solr instance for my application, I plan to use AJAX Solr, a JavaScript library that makes requests directly to Solr, to build a rich front end. AJAX Solr is proof of how powerful browsers have become: you can actually watch it running queries in Firebug (imagine doing that with a relational database as the back end).
I've always been wary of rich-client applications because they are hard to test, but the tools have now grown to support that. Of course I'll test-drive my web interface with HttpUnit, as I did with my plain HTML one; it uses Rhino to handle JavaScript-powered pages and assert they're generated correctly.