ElasticSearch: Advantages, Case Studies, and Stats
ElasticSearch is an open-source, broadly distributable, readily scalable, enterprise-grade search engine. Look more closely into what it is, its advantages, and stats.
Everything changes extremely fast nowadays, and it is very important to follow new trends and understand them. One of these trends is ElasticSearch. We will look at what it is, its main advantages, statistics, and success stories.
What Is ElasticSearch?
ElasticSearch is an open-source, broadly distributable, readily scalable, enterprise-grade search engine based on Lucene and released under the terms of the Apache License. It is Java-based and designed to operate in real time. It can index and search documents in diverse formats. It was designed for distributed environments, where it provides flexibility and scalability. ElasticSearch is now the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.
ElasticSearch achieves fast search responses because it searches an index instead of searching the text directly. This is more or less like looking up a keyword in the index at the back of a book, as opposed to reading every word on every page. ElasticSearch can scale up to thousands of servers and accommodate petabytes of data. Its enormous capacity results directly from its elaborate, distributed architecture.
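The book-index analogy can be made concrete with a toy inverted index. The following is a minimal sketch in plain Python, not how Lucene or ElasticSearch implement it internally; the sample documents and the names `tokenize`, `build_inverted_index`, and `search` are assumptions made up for illustration.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every term of the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, used only for illustration.
DOCS = {
    1: "Dell laptop with 16 GB RAM",
    2: "Troubleshooting guide for laptop drivers",
    3: "Desktop monitor product manual",
}

INDEX = build_inverted_index(DOCS)
```

A lookup such as `search(INDEX, "laptop drivers")` only touches the entries for the two query terms and never rescans the document text, which is the point of the analogy.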
ElasticSearch serves many different use cases: for example, “classic” full-text search, analytics storage, auto-complete, spell checking, alerting, and general-purpose document storage.
Advantages of ElasticSearch include the following:
- Lots of search options. ElasticSearch implements many search features, such as customized splitting of text into words, customized stemming, faceted search, full-text search, autocompletion, and instant search. Fuzzy search tolerates spelling errors, so you can find what you are looking for even if your query contains a spelling mistake. Autocompletion and instant search refer to searching while the user types. This can mean simple suggestions of existing tags, predicting a search based on search history, or running a completely new search for every keyword.
- Document-oriented. ElasticSearch stores real-world complex entities as structured JSON documents and indexes all fields by default, which improves search performance.
- Speed. Speaking of performance, ElasticSearch is able to execute complex queries extremely fast. It also caches almost all of the structured queries commonly used as a filter for the result set and executes them only once. For every other request containing a cached filter, it reads the result from the cache.
- Scalability. Software development teams favor ElasticSearch because it is a distributed system by nature and can easily scale horizontally, providing the ability to extend resources and balance the load between the nodes in a cluster.
- Data record. ElasticSearch records any changes made in transaction logs on multiple nodes in the cluster to minimize the chance of data loss.
- Query fine tuning. ElasticSearch has a powerful JSON-based DSL, which allows development teams to construct complex queries and fine-tune them to receive the most precise results from a search. It also provides a way of ranking and grouping results.
- RESTful API. ElasticSearch is API-driven, so actions can be performed using a simple RESTful API.
- Distributed approach. Indices can be divided into shards, with each shard able to have any number of replicas. Routing and rebalancing operations are done automatically when new documents are added.
- Multi-tenancy. Often, you have multiple customers or users with separate collections of documents, and a user should never be able to search documents that do not belong to them. This often leads to a design where every user has their own index, which in turn leads to too many indices. One larger ElasticSearch index is actually the better option.
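Several of the advantages above (fuzzy full-text search, cached filters, grouping of results, and the JSON-based DSL) can be seen together in a single request body. The following is a minimal sketch that only builds the JSON body and does not contact a cluster. The helper `build_search_body` and the field names `title`, `in_stock`, and `category` are assumptions made up for illustration.

```python
import json


def build_search_body(text, in_stock_only=True):
    """Build a Query DSL body: fuzzy full-text match, optional filter, facet counts."""
    bool_query = {
        "must": [
            # "fuzziness": "AUTO" tolerates small spelling mistakes in the query text.
            {"match": {"title": {"query": text, "fuzziness": "AUTO"}}}
        ]
    }
    if in_stock_only:
        # Filter clauses restrict the result set without affecting relevance scores.
        bool_query["filter"] = [{"term": {"in_stock": True}}]
    return {
        "query": {"bool": bool_query},
        # A terms aggregation groups the matching documents by a field value.
        "aggs": {"by_category": {"terms": {"field": "category"}}},
    }


# The misspelled query is intentional: it is what fuzzy matching is for.
BODY = build_search_body("lapttop")
BODY_JSON = json.dumps(BODY, indent=2)
```

A body like this would typically be sent as JSON to an index's `_search` endpoint through the RESTful API.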
Success Stories, Facts, and Stats
ElasticSearch has been adopted by some major brands. Let’s look at some of them to see their results.
Dell implemented ElasticSearch to support e-commerce searches for 60+ countries in 21+ languages. The search team at Dell has 30 members. They have seen the importance of search grow as consumer shopping expectations became more focused on instant gratification.
Several years ago, the Dell search commerce platform was experiencing aging pains. It was not responsive and did not support multi-tenancy, cloud readiness, etc. As a result, it wasn’t horizontally scalable and there were challenges creating and maintaining indices. It was time to modernize the search platform and meet the needs of modern e-commerce. They evaluated Solr, Google Search Appliance, and other search engines, but ultimately settled on ElasticSearch. Multi-tenancy, ease of scalability, relevancy of results, aggregation queries, and being open source were the key enablers for going with ElasticSearch.
Dell has deployed two ElasticSearch clusters on Windows servers in Dell data centers, with a search platform based on the .NET framework. One is a search cluster that powers the search experience on Dell.com, and the other is an analytics cluster used to track search-related user activity on the site. The analytics cluster provides the ability to deliver crowdsourced and influenced search results, and also provides great insight into the usage of the search platform.
The Dell search cluster contains an extremely comprehensive data set, as it indexes everything on Dell.com, consisting of over 27 million documents which include all the products that can be purchased on the site, all the drivers for these products that can be downloaded, troubleshooting articles, knowledge-base documents, product manuals, videos and video metadata, etc. The product documents include all the information related to that particular product: the product title, its description, the image link, keywords, meta information for the technical specifications of these products (RAM size, processor type, resolution, etc.), stock status so they know how many days it will take to ship the product, pricing information, department category, etc.
The Dell analytics cluster, which currently holds more than one billion documents, indexes every click on Dell.com that comes from a search experience. Dell uses this data to analyze the top-performing queries, the top-performing categories, and various other metrics to perform actionable, dynamic improvements to the site.
Also, in order to deliver accurate search results in all languages, Dell created extensive linguistic pipelines for each language. The pipelines utilize ElasticSearch’s language analyzers, stopword removal, spell check, synonym match, stemming, and other features to make the query more accurate. Dell also added a final step at the end of their linguistic pipelines that they call a catch-all influencer, which is essentially an offline aggregator that helps identify the entities from the query the customer entered. This aggregator runs across multiple systems, such as the content management system and their master lookup tables across various databases, and, depending on what the customer queried for in the search bar, maps the product category to the product category code, the manufacturer name to the manufacturer code, etc. These inputs, enriched with analytics and customer identification data, are then passed to a probability engine and help Dell rewrite the final query. This context significantly helps Dell understand what the user is expecting when they perform a search.
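To give a feel for what such a linguistic pipeline does to a query, here is a minimal sketch in plain Python. It is not Dell's pipeline and not ElasticSearch's analyzer implementation, where these steps are configured as analyzers and token filters. The word lists and the crude suffix stemmer are assumptions made up for illustration.

```python
import re

# Hypothetical word lists, for illustration only.
STOPWORDS = {"a", "an", "the", "for", "with", "and", "of"}
SYNONYMS = {"notebook": "laptop", "notebooks": "laptop"}
SUFFIXES = ("ing", "es", "s")


def stem(token):
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(query):
    """Run a query through tokenization, stopword removal, synonyms, and stemming."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    return [stem(t) for t in tokens]
```

With these toy rules, `analyze("The Notebooks with chargers")` reduces the query to the terms `laptop` and `charger`, so differently worded queries can end up matching the same indexed terms.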
Thanks to the real-time nature of ElasticSearch, as well as its powerful aggregations, Dell introduced a virtual assistant, which gives shoppers an interactive way to refine their search before clicking the search button by giving them a preview of their results. For example, typing the term “laptop” shows refiners for narrowing down the search, one of them being screen size, another being the processor type, and so on.
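Refiners like these are essentially value counts over the matching documents. The sketch below imitates that idea in plain Python; in ElasticSearch the counting would be done by aggregations inside the cluster. The product records, the field names, and the helper `refiners` are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical product documents; field names are illustrative.
PRODUCTS = [
    {"title": "Laptop A", "screen_size": "13 inch", "processor": "i5"},
    {"title": "Laptop B", "screen_size": "15 inch", "processor": "i7"},
    {"title": "Laptop C", "screen_size": "15 inch", "processor": "i5"},
    {"title": "Monitor D", "screen_size": "27 inch"},
]


def refiners(docs, term, fields):
    """Count the values of each field among docs whose title contains the term."""
    matches = [d for d in docs if term.lower() in d["title"].lower()]
    return {
        field: Counter(d[field] for d in matches if field in d)
        for field in fields
    }


FACETS = refiners(PRODUCTS, "laptop", ["screen_size", "processor"])
```

Each entry in `FACETS` holds the options a shopper could be offered for one refiner, together with how many results each option would leave.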
As ElasticSearch supports the creation of multiple indices, it gave Dell the ability to deliver more features. For example, Dell was able to create an experimentation engine on their existing framework, which lets them easily test new features on a specific percentage of users and measure the impact before rolling them out to their entire deployment. This gives Dell a solid working hypothesis of the user’s rate of relevant results.
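The article does not describe how Dell assigns users to an experiment. One common technique, shown here purely as an assumption, is deterministic hash bucketing: each user ID is hashed into one of 100 buckets, and users in the lowest buckets are routed to an experimental index. The index names and the experiment name are hypothetical.

```python
import hashlib


def in_experiment(user_id, experiment, percent):
    """Deterministically place a user in an experiment for `percent` of traffic."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < percent


def index_for(user_id, percent=10):
    """Pick the index to query: the experimental one for the sampled share of users."""
    if in_experiment(user_id, "new-ranking", percent):
        return "products_experiment"  # hypothetical index name
    return "products"  # hypothetical index name
```

Because the bucket depends only on the experiment name and the user ID, a given user sees the same variant on every request, which keeps the measured impact comparable.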
As a result of the switch to ElasticSearch, Dell has seen increases in revenue per visit, click-through rate, average order value, conversion, and positive customer satisfaction score. ElasticSearch also lets Dell ensure that the right people have the right access and permissions to their cluster in a live, customer-facing environment. By migrating to ElasticSearch, Dell reduced the number of servers they needed by 25-30%.
The Guardian wanted to revitalize the newspaper industry with real-time readership data. They faced the challenge of ensuring that web content was properly presented and exposed to five million readers.
The Guardian’s in-house developed analytics system enables users across the company to see in real-time exactly how users are interacting with the content. In the news environment, which changes every minute, real-time visibility is invaluable. The Guardian leverages the data to ensure that content is given exposure at the right time, on the proper social media platforms, with the right headlines. ElasticSearch gave The Guardian the freedom to build a very powerful analytics system in-house, processing 40 million documents per day to deliver real-time visibility of site traffic across the organization. Now, a large portion of The Guardian’s business relies on ElasticSearch to understand how their content is being consumed.
The use cases for ElasticSearch at The Guardian are varied; the visibility afforded by the analytics system is used to see how many hits each content item receives, which headlines and content generate more traffic, from where traffic is being referred, which social media platforms to promote specific content on and when to gain maximum exposure, and which links to provide the reader to click on next. Engineers are even using ElasticSearch to diagnose website performance issues by searching through events.
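Questions such as "which content items received the most hits recently, and from where" map naturally onto aggregations. The following is a minimal sketch of such a request body, not The Guardian's actual query; the helper `top_content_body` and the field names `timestamp`, `content_id`, and `referrer` are assumptions made up for illustration.

```python
def top_content_body(minutes=15, size=10):
    """Build a request body counting recent page-view events per content item."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        # Date math: keep only events from the last `minutes` minutes.
        "query": {"range": {"timestamp": {"gte": f"now-{minutes}m"}}},
        "aggs": {
            "top_content": {
                "terms": {"field": "content_id", "size": size},
                # Nested aggregation: break each content item down by referrer.
                "aggs": {"by_referrer": {"terms": {"field": "referrer"}}},
            }
        },
    }


BODY = top_content_body()
```

Narrowing the time window, for example with `top_content_body(minutes=5)`, is what turns the same query into a near-real-time view of site traffic.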
For The Guardian, responding to change in real time is critical. A significant portion of the site can get a lot of traffic in a very short time. In that situation, the team needs to respond while traffic is at its peak, so it needs the information right away. Waiting until the end of the day to see what is happening would be too late. ElasticSearch provides this real-time visibility.
ElasticSearch helps The Guardian leverage real-time analytics: for example, it can easily query 360 million documents, show traffic for all content as it happens, and reveal how updates impact site traffic. It also gives the entire organization real-time insight into audience engagement, democratizes analytics access for more than 500 users, and encourages a culture of exploration and innovation among employees. Because this insight helps improve content, headlines, and promotion in a variety of ways, The Guardian drives more page views, which contributes to the site’s success.
Just as importantly, it empowers the team to get more involved and take a proactive approach to improving the site and its content. It enhances the user experience by providing readers with more content that meets their demands. And of course, it improves site performance by tracking how changes impact it, diagnosing issues, and keeping the site up and running at peak performance.
Docker faced the challenge of delivering high-performance search across a continuously growing database without overloading operational resources. The IT department decided to use ElasticSearch to make it easy to find the right container for running distributed applications. Now, ElasticSearch helps Docker deliver a scalable, seamless, and highly available search and discovery experience to the growing Docker community.
Having made the decision to move to ElasticSearch, Docker evaluated the available options for hosting ElasticSearch by looking at a variety of different criteria: location, the number of indexes, available resources, high availability options, and price. ElasticCloud was the best option.
Consistent performance and reliability were key concerns for Docker, making ElasticCloud’s dedicated ElasticSearch clusters a good fit for two key reasons. First, ElasticCloud’s hosting model, based on dedicated clusters with reserved memory and CPU, gave them assurance that their application would be consistently performant. Second, ElasticCloud’s high availability options gave Docker added assurances that even in the event of a full data center outage, their search database would remain available.
Moving to ElasticSearch in production delivered the performance gains Docker was looking for. Load dropped, and search latency and throughput massively improved. Additionally, Docker was able to greatly improve search result quality by using ElasticSearch’s field boosting and function score queries to promote more popular and relevant search results.
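Field boosting and function score queries can be combined in one request body. The following is a minimal sketch, not Docker's actual query; the helper `build_image_search` and the field names `name`, `description`, and `pull_count` are assumptions made up for illustration.

```python
def build_image_search(text):
    """Build a function_score query: boosted text match plus a popularity signal."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": text,
                        # "name^3" weights matches in the name field three times as much.
                        "fields": ["name^3", "description"],
                    }
                },
                # Derive a score component from a numeric popularity field.
                "field_value_factor": {
                    "field": "pull_count",
                    "modifier": "log1p",  # dampen very large counts
                    "missing": 0,  # value used for documents without the field
                },
                # Add the popularity component to the text relevance score.
                "boost_mode": "sum",
            }
        }
    }


QUERY = build_image_search("nginx")
```

The `sum` boost mode is chosen here so that a document with no recorded popularity keeps its text relevance score instead of being multiplied down to zero.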
With their new infrastructure, Docker is able to serve better search results faster. For Docker, this is critical; a tool built around providing power and convenience must also have supporting services that possess those characteristics. With ElasticSearch, Docker found a way to easily and cost-effectively scale a search application to meet growing volumes of data, ensure an excellent search and discovery experience, and manage operational complexities.
Published at DZone with permission of Ekaterina Novoseltseva. See the original article here.