Solr vs. ElasticSearch
A review of the differences between Apache Solr and ElasticSearch, beyond just functionality.
Solr and ElasticSearch are competing search servers. Both are built on top of Lucene, so many of their core features are identical. Choosing the right search engine depends on a range of application-specific factors.
Apache Solr is a mature project with a large, active developer and user community behind it. Elasticsearch is younger, built on more modern principles and aimed at more modern use cases. That's not to say Elasticsearch is far behind: although quite young, its community is expanding rapidly.
Solr is well documented, with the context and examples needed to understand how its different APIs and components are used. The documentation for Elasticsearch is slightly better organized, but it lacks good working examples and configuration instructions.
In Solr you need the schema.xml file in order to define your index structure, fields, and their types. Of course, you can have all fields defined as dynamic fields and create them on the fly, but you still need at least some degree of index configuration. In most cases though, you’ll create a schema.xml file to match your data structure. After each change, you need to restart the Solr node or reload its configuration. ElasticSearch, by contrast, is schemaless: it can index documents without a predefined schema. Its configuration is written to an elasticsearch.yml file, which is just another configuration file. However, that’s not the only way to store and change ElasticSearch settings, since many of them can also be changed through its API.
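To illustrate changing ElasticSearch settings without touching elasticsearch.yml, here is a minimal Python sketch that builds (but does not send) a request to the index settings endpoint. The host, the index name "articles", and the helper name build_settings_update are assumptions made for this example, not something taken from a particular deployment.

```python
import json
from urllib.request import Request


def build_settings_update(base_url: str, index: str, settings: dict) -> Request:
    """Build (but do not send) a PUT request that updates index-level settings.

    The helper name and its arguments are hypothetical; only the
    /<index>/_settings endpoint and the {"index": {...}} body shape
    come from the ElasticSearch REST API.
    """
    body = json.dumps({"index": settings}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed local node and index name, for illustration only.
request = build_settings_update(
    "http://localhost:9200", "articles", {"number_of_replicas": 2}
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would actually send the request to a running node. The sketch stops short of that so it can be run without a cluster.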
In order to get search results from Solr, you need to query one of the defined request handlers and pass in the parameters that define your query criteria. Depending on which query parser you choose to use, these parameters will be different, but the method is still the same: an HTTP GET request is sent to Solr in order to fetch search results.
ElasticSearch exposes a REST API which can be accessed using the HTTP GET, DELETE, POST, and PUT methods. Its API allows one not only to query or delete documents, but also to create indices, manage them, analyze them, and get all the metrics describing the current state and configuration of ElasticSearch. The only format ElasticSearch can respond in is JSON; there is no XML response, for example. With Solr, all query parameters are passed in as URL parameters, whereas in ElasticSearch queries are structured as JSON. Queries structured as JSON objects give one a lot of control over how ElasticSearch should understand the query and thus what results to return.
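The difference between the two query styles can be sketched in a few lines of Python. The hosts, the collection and index name "articles", the field "title", and both helper names are assumptions for illustration. The Solr side uses the common q, rows, and wt parameters on a /select handler, and the ElasticSearch side uses a match query sent to _search.

```python
import json
from urllib.parse import urlencode


def solr_select_url(base_url: str, collection: str, query: str, rows: int = 10) -> str:
    """Solr style: the whole query travels as URL parameters of a GET request."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


def es_search_request(base_url: str, index: str, field: str, text: str, size: int = 10):
    """ElasticSearch style: the query is a JSON document sent in the request body."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    body = {"query": {"match": {field: text}}, "size": size}
    return url, json.dumps(body)


# Assumed local servers and names, for illustration only.
print(solr_select_url("http://localhost:8983", "articles", "title:lucene"))
url, body = es_search_request("http://localhost:9200", "articles", "title", "lucene")
print(url)
print(body)
```

Nothing here contacts a server. The point is only the shape of the request: a flat parameter list on one side and a nested JSON structure that can grow into compound queries on the other.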
Distributed, Cloud-ready Search
Elasticsearch, unlike Solr, was built with distribution in mind. Being EC2-friendly means running a search index across multiple servers in a fail-safe and efficient way, and that’s quite a challenge.
Elasticsearch allows you to break indices into shards, each with one or more replicas. The shards are hosted on data nodes within the cluster, and the cluster delegates operations to the correct shards, with rebalancing and routing done automatically. This ensures that, even in the case of some catastrophic hardware or software failure, the chances of your search server going completely offline are close to none. Elasticsearch has been designed with the cloud era in mind.
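As a rough illustration of shards, replicas, and routing, the Python sketch below builds an index-creation body, counts the resulting shard copies, and picks a shard for a document by hashing its ID. The helper names are hypothetical, and zlib.crc32 is only a stand-in hash chosen for this example. Elasticsearch uses its own hash function internally, so the shard numbers computed here will not match a real cluster.

```python
import json
import zlib


def index_creation_body(primary_shards: int, replicas: int) -> str:
    """JSON body for creating an index with the given shard layout."""
    return json.dumps(
        {
            "settings": {
                "index": {
                    "number_of_shards": primary_shards,
                    "number_of_replicas": replicas,
                }
            }
        }
    )


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Each primary shard exists once, plus once per configured replica."""
    return primary_shards * (1 + replicas)


def route_to_shard(doc_id: str, primary_shards: int) -> int:
    """Toy routing: hash the document ID and take it modulo the shard count.

    crc32 is an assumption made for this sketch, not the hash Elasticsearch uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % primary_shards


print(index_creation_body(primary_shards=5, replicas=1))
print(total_shard_copies(primary_shards=5, replicas=1))
print(route_to_shard("doc-42", primary_shards=5))
```

With five primaries and one replica each, the cluster has ten shard copies to spread across its data nodes. That spread is what lets the index survive the loss of a single node.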
Even though some steps have been taken to make Solr cloud-ready, its original architecture and design did not account for distribution, so it will take more time to get Solr to where Elasticsearch is out of the box.
Elasticsearch is near real-time and distributed; you just specify your refresh delay via the API. It also supports percolation, an innovative search model similar to webhooks. The idea behind it is that your filters are registered in advance and each new document is matched against them, so your application learns when a new document matches its filters instead of constantly polling the search engine to check for new updates. Elasticsearch has a default refresh interval of one second, so a document becomes searchable within about a second of being indexed.
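The percolation idea, matching each new document against queries registered in advance rather than running queries against stored documents, can be shown with a toy in-memory Python sketch. The ToyPercolator class and its methods are hypothetical names invented for this example. It does not use the Elasticsearch API, and its matching rule (every query term must appear in the document) is a deliberate simplification.

```python
from typing import Dict, List, Set


class ToyPercolator:
    """In-memory illustration of reverse search: queries are stored, documents are matched."""

    def __init__(self) -> None:
        self._queries: Dict[str, Set[str]] = {}

    def register(self, query_id: str, terms: List[str]) -> None:
        """Store a query as a set of lowercase terms that must all be present."""
        self._queries[query_id] = {term.lower() for term in terms}

    def percolate(self, document: str) -> List[str]:
        """Return the IDs of all registered queries that the document matches."""
        tokens = set(document.lower().split())
        return sorted(
            query_id
            for query_id, terms in self._queries.items()
            if terms <= tokens  # every query term appears in the document
        )


percolator = ToyPercolator()
percolator.register("solr-news", ["solr", "release"])
percolator.register("es-news", ["elasticsearch", "release"])

# A new document arrives; the application finds out which saved filters it matches.
print(percolator.percolate("New Solr release announced"))  # ['solr-news']
```

An application would typically call percolate at indexing time and act on the returned IDs, for example by sending an alert, instead of repeatedly re-running each saved query.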
This is the perfect architecture for real-time search.
Solr is your search server for creating standard search applications where no massive indexing and no real-time updates are required. Elasticsearch's architecture is on a whole new level, aimed at building modern real-time search applications. If you want distributed indexing, then you need to choose Elasticsearch. Elasticsearch is the only true option for cloud and distributed environments. Elasticsearch is scalable, lightning fast, and a breeze to integrate with. Its API is more intuitive and accessible than Solr’s.
Opinions expressed by DZone contributors are their own.