How to Convert Experts into Search Appliances
Responding to an RFP typically requires a sales operations specialist to track down answers to industry-related questions. Those answers often demand a high level of detail and can be technical in nature, which means consulting a subject-matter expert. But what if the expert is unavailable?
Once a year at DataXu, the entire company is given the opportunity to innovate in ways unrelated to their day-to-day work. Before last year’s Innovation Day I was asked to solve this common problem by creating a search engine for RFPs, and in short order I had built an easy-to-use and powerful RFP search engine called RFP Gold.
Innovation Day gave me the chance to use Solr in a Rails application built from the ground up, something I had been looking for an opportunity to do. The search engine ingests files containing question/answer pairs gleaned from previous RFPs, each of which must be manually categorized. The categories, questions, and answers are all indexed using Solr, an ultra-fast Lucene-based search server from Apache. Each question and answer is weighted by its rating, which is assigned by users of the search engine. I also used the ratings to build some simple metrics, such as the highest-rated, lowest-rated, and most-rated questions.
Solr acts as an interface to Apache Lucene, and the Sunspot gem (for Ruby) acts as an uncomplicated wrapper for Solr. It lets any Rails developer quickly declare which object attributes (and nested attributes) are indexed, along with search weightings and other search properties, such as each field's type (number, text, etc.) or how results are sorted by default. The Solr configuration allows you to adapt the search to the needs of your business. Text is run through three kinds of Solr components: analyzers, tokenizers, and filters. Analyzers run at index time to create token streams, and again on each query string to create equivalent token streams. Tokenizers break the text into individual tokens. Filters then transform the token stream: they can remove, keep, alter, or add tokens.
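To make the relationship between analyzers, tokenizers, and filters concrete, here is a minimal sketch in plain Ruby. It is a toy model of the idea, not Solr's implementation, and every name in it (`tokenize`, `analyze`, `LOWERCASE`, `MIN_LENGTH`) is hypothetical. An analyzer is modeled as one tokenizer followed by an ordered list of filters, and the same chain is applied to indexed text and to the query so that their tokens are comparable.

```ruby
# Toy analysis chain: one tokenizer followed by an ordered list of filters.
# This illustrates the concept only; it is not how Solr is implemented.

# Tokenizer: split raw text into tokens on runs of non-word characters.
def tokenize(text)
  text.split(/\W+/).reject(&:empty?)
end

# Filters: each takes an array of tokens and returns a new array of tokens.
LOWERCASE  = ->(tokens) { tokens.map(&:downcase) }
MIN_LENGTH = ->(tokens) { tokens.select { |t| t.length >= 2 } }

# Analyzer: run the tokenizer, then apply every filter in order.
def analyze(text, filters)
  filters.reduce(tokenize(text)) { |tokens, filter| filter.call(tokens) }
end

# The index-time and query-time chains may differ; here they are identical.
INDEX_FILTERS = [LOWERCASE, MIN_LENGTH].freeze
QUERY_FILTERS = [LOWERCASE, MIN_LENGTH].freeze

indexed = analyze("Does DataXu support A/B testing?", INDEX_FILTERS)
query   = analyze("TESTING support", QUERY_FILTERS)

# Every query token is present among the indexed tokens, so the text matches.
matches = (query - indexed).empty?
```

Because both sides go through the same lowercase filter, the query "TESTING support" lines up with the indexed tokens even though the original casing differs.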
A useful example of a filter is the synonym rule, which can be one-to-many or one-to-one. For example, in our industry, “display” has a one-to-one mapping with “online”, whereas “social” could have a one-to-many mapping with “Facebook” and “Twitter”. In practice analyzers are composed of both tokenizers and filters, and depending on the context you may apply different analysis to the data being indexed than to the query. In my case I applied the synonyms at index time rather than at query time, so that the mapping already exists in the index when a query is executed. This improves query performance but increases the size of the index, in memory and on disk.
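The trade-off of applying synonyms at index time can be sketched with a toy inverted index in plain Ruby. This is an illustration under stated assumptions, not the RFP Gold code or Solr's synonym handling: the `SYNONYMS` table reuses the examples from the text, and `expand`, `build_index`, and `search` are hypothetical helpers.

```ruby
require "set"

# Hypothetical synonym rules, taken from the examples in the text.
SYNONYMS = {
  "display" => ["online"],              # one-to-one
  "social"  => ["facebook", "twitter"]  # one-to-many
}.freeze

# Expand each token into itself plus its synonyms.
def expand(tokens)
  tokens.flat_map { |t| [t] + SYNONYMS.fetch(t, []) }
end

# Build a toy inverted index (token => set of document ids),
# applying synonym expansion at index time only.
def build_index(docs)
  index = Hash.new { |hash, key| hash[key] = Set.new }
  docs.each do |id, text|
    expand(text.downcase.scan(/\w+/)).each { |token| index[token] << id }
  end
  index
end

# Queries are looked up as-is; no synonym work happens at query time.
def search(index, term)
  index.fetch(term.downcase, Set.new).to_a.sort
end

docs = {
  1 => "Do you support social campaigns?",
  2 => "Which display formats are available?"
}
index = build_index(docs)

twitter_hits = search(index, "Twitter")  # finds document 1 via the synonym
online_hits  = search(index, "online")   # finds document 2 via the synonym
```

The query path is a single hash lookup, which is where the performance gain comes from. The cost is visible in the index itself: it holds 13 terms for two short documents, three of which exist only because of synonym expansion.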
Once a query has been tokenized, the tokens are fed through filters. The configuration can filter against a list of “stop words”, which are removed from any search, or conversely against a list of “keep words”, in which case only the listed words are retained. It can also ingest a list of protected words, which should not be reduced to their respective base words (stemmed).
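The three word lists can be illustrated with a few lines of plain Ruby. This is a simplified sketch, not Solr's filters: the word lists are made up, the stemmer is deliberately naive (it only strips a trailing "s"), and the keep-word filter is assumed to retain only the listed tokens, which is my understanding of how Solr's keep-word filter behaves.

```ruby
require "set"

# Hypothetical word lists; in Solr these normally live in plain-text files.
STOP_WORDS      = Set["the", "a", "of", "is"]
PROTECTED_WORDS = Set["news"] # never stemmed

# Stop filter: drop every token that appears in the stop list.
def remove_stop_words(tokens)
  tokens.reject { |t| STOP_WORDS.include?(t) }
end

# Keep-word filter: retain only tokens that appear in the keep list.
def keep_only(tokens, keep_words)
  tokens.select { |t| keep_words.include?(t) }
end

# Deliberately naive stemmer: strip one trailing "s" unless the token is protected.
def stem(tokens)
  tokens.map do |t|
    PROTECTED_WORDS.include?(t) ? t : t.sub(/s\z/, "")
  end
end

tokens  = %w[the news of the campaigns]
stemmed = stem(remove_stop_words(tokens))
# "campaigns" is reduced to "campaign", while the protected "news" is untouched.

kept = keep_only(%w[display social video], Set["display", "video"])
```

Without the protected list, the naive stemmer would turn "news" into "new", which is exactly the kind of damage protected words are meant to prevent.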
Returning results quickly is great, but the results must also be of high quality. In this case, using the question rating as part of the search ranks higher-rated questions higher in the search results. Incidentally, I piggybacked on the in-memory store (Memcached) that the rating system requires to provide recent searches and a nice tag cloud.
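The article does not say exactly how the rating enters the ranking, so the following plain-Ruby sketch assumes one simple rule: multiply a text relevance score by the average rating normalized to the range 0..1. The `Result` struct, the 1-to-5 rating scale, and the neutral default of 3.0 for unrated questions are all assumptions made for illustration.

```ruby
# Hypothetical result record: a text relevance score plus user ratings (1..5).
Result = Struct.new(:question, :relevance, :ratings) do
  # Average rating, falling back to a neutral 3.0 when nobody has rated yet.
  def average_rating
    ratings.empty? ? 3.0 : ratings.sum.to_f / ratings.size
  end

  # Assumed scoring rule: scale relevance by the rating normalized to 0..1.
  def score
    relevance * (average_rating / 5.0)
  end
end

# Sort results from highest to lowest combined score.
def rank(results)
  results.sort_by { |r| -r.score }
end

results = [
  Result.new("Which ad formats do you support?", 0.9, [2, 2]),
  Result.new("What display formats are available?", 0.8, [5, 4, 5]),
  Result.new("Do you support video formats?", 0.5, [])
]
ranked = rank(results)

# The same records also yield simple metrics such as the most-rated question.
most_rated = results.max_by { |r| r.ratings.size }
```

Here the second question outranks the first despite a lower text relevance, because its ratings are much better. A multiplicative rule is only one option; an additive boost would let poorly rated but highly relevant answers surface more often.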
To use the search effectively, users need to be able to search by category (or multiple categories) and by RFP/RFI (or any combination thereof), which I accomplished through the union and intersection of different result sets. They also need to be able to paginate the results, which takes a single line in the Solr search.
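The set logic can be sketched with Ruby's array operators. This is an illustration only: the ids and facet names are made up, and whether the original combined result sets in Ruby or inside the Solr query is not stated. The sketch assumes a union across the selected categories, a union across the selected document types, and an intersection between the two groups, followed by a simple page slice.

```ruby
# Hypothetical result sets: ids of answers matching each facet value.
by_category = {
  "pricing"   => [1, 2, 3, 4],
  "reporting" => [3, 4, 5, 6]
}
by_doc_type = {
  "RFP" => [1, 3, 5, 7],
  "RFI" => [2, 4, 6]
}

# Union (|) across the chosen categories, union across the chosen document
# types, then intersection (&) between the two groups.
def filter_ids(by_category, by_doc_type, categories, doc_types)
  in_categories = categories.map { |c| by_category.fetch(c, []) }.reduce([], :|)
  in_doc_types  = doc_types.map { |d| by_doc_type.fetch(d, []) }.reduce([], :|)
  in_categories & in_doc_types
end

# Pagination: return the slice of items for a 1-based page number.
def paginate(items, page:, per_page:)
  items[(page - 1) * per_page, per_page] || []
end

matches  = filter_ids(by_category, by_doc_type, %w[pricing reporting], %w[RFP])
page_one = paginate(matches, page: 1, per_page: 2)
page_two = paginate(matches, page: 2, per_page: 2)
```

With these sample sets, answers in either category that also come from an RFP are ids 1, 3, and 5, split across two pages of two.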
When the dust settled over Innovation Day my team had come in third place, and shortly thereafter I opened the prototype up for use by the rest of the organization. Now there are over 700 answers belonging to over 100 categories, enabling thousands of searches. This has saved hundreds of person-hours over the past year, both for the sales operations specialists and for subject-matter experts throughout the company.
Published at DZone with permission of Josh Begleiter. See the original article here.