One piece of feedback that has consistently come with our Quepid search testing tool is the need to understand "why" search results come back in the order they do. In plain English, what factors influence search the most? Why does my search engine think a document about "water bottles" is more relevant than "baby bottles" for a search about "milk bottles"?
Indeed this is the entire art and science of search relevancy. It's not magic gnomes inside a box that understand all about baby bottles. No, it's heavily tuned heuristics that Solr and Elasticsearch use out of the box (in the form of Lucene’s scoring systems) based on decades of information retrieval research that rests on the foundation of dumb string matching.
How do we tune this insanity? Well, luckily you can retrieve the explain information: the debug output that Lucene gives you, telling you exactly how each document was scored the way it was. Armed with that information, you can alter how you query your search engine (reweight this field, boost on that field, etc.). Unfortunately, the explain is full of unhelpful search-nerd trivia (do you know what coord means?), not to mention deeply nested and often redundant information. Good luck if you want to parse this with your eyes. Just give up… you're done. Luckily, there are parsers out there. Copy and paste your explain info, and get something a little more sane to deal with.
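To give a feel for what a parser has to do, here's a minimal sketch that flattens a structured Lucene explain (the shape Solr returns with `debugQuery=true&debug.explain.structured=true`) into indented, human-readable lines. The sample explain below is illustrative, not captured from a real engine:

```python
# Minimal sketch: flatten a structured Lucene explain tree (value,
# description, nested details) into indented, readable lines.
# The sample tree is made up for illustration.

def flatten_explain(node, depth=0):
    """Recursively turn a structured explain node into readable lines."""
    lines = ["%s%.4f  %s" % ("  " * depth, node["value"], node["description"])]
    for child in node.get("details", []):
        lines.extend(flatten_explain(child, depth + 1))
    return lines

sample = {
    "value": 1.2,
    "description": "sum of:",
    "details": [
        {"value": 0.8, "description": "weight(title:bottles), result of:", "details": []},
        {"value": 0.4, "description": "weight(body:milk), result of:", "details": []},
    ],
}

for line in flatten_explain(sample):
    print(line)
```

Even this trivial flattening is easier on the eyes than the raw explain, which is exactly the point of tools like the parsers mentioned above.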
But even with nice parsers, we continue to face two problems:
Collaboration: At OpenSource Connections, we believe that collaboration with non-techies is the secret ingredient of search relevancy. We need to arm business analysts and content experts with a human readable version of the explain information so they can inform the search tuning process.
Usability: I want to paste a Solr URL, full of query parameters and all, and go! Then, once I see more helpful explain information, I want to tweak (and tweak and tweak) until I get the search results I want. Much like some of my favorite regex tools. Get out of the way and let me tune!
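As a rough idea of the "paste a URL and go" workflow, a tool only needs to take the pasted Solr select URL, keep the existing parameters, and add `debugQuery=true` to get explain output back. A small sketch (the URL and field names are made-up examples):

```python
# Hedged sketch: take a pasted Solr select URL and turn on explain
# output by appending debugQuery=true, preserving existing parameters.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def with_debug(solr_url):
    """Return the same Solr URL with debugQuery=true added."""
    parts = urlsplit(solr_url)
    # parse_qsl keeps duplicate params (e.g. multiple fq clauses) intact
    query = parse_qsl(parts.query) + [("debugQuery", "true")]
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical example URL
url = "http://localhost:8983/solr/products/select?q=milk+bottles&defType=edismax&qf=title%5E2+body"
print(with_debug(url))
```

From there, each tweak to a boost or field weight is just another edit to the query string before re-running the search.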
I’m proud to announce that we’ve taken our first big steps in these directions with Splainer. Splainer open-sources the core sandbox behind Quepid, including new features that tell you why a search result is appearing where it is. At your fingertips in Splainer are three levels of explanation data:
Hot Matches immediately tell you which matches most influence your search results. Often many matches occur when searching, but due to the machinations of how these matches factor into other search operations, it can be hard to determine which matches matter. We figure that out for you!
Summarized Explain summarizes the relevancy calculation in more human readable terms. This takes things one level deeper. If this were math homework, “hot matches” would be the answer and the “summarized explain” would be showing your work. If you want to know exactly what’s going on, look at the summarized explain.
Finally, the ugly stuff is still there if you want it: the raw explain pulled straight from Lucene (yuk!), for when you really, absolutely need to see your eyes bleed.
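To make the "hot matches" idea concrete, here's a simplified sketch of the concept: walk a structured explain tree and surface the leaf-level match clauses with the largest score contributions. This illustrates the idea only; it is not Splainer's actual algorithm:

```python
# Simplified "hot matches" concept: collect leaf nodes of an explain
# tree and rank them by score contribution. Illustrative only.

def hot_matches(node):
    """Return leaf explain nodes sorted by descending score."""
    if not node.get("details"):
        return [node]
    leaves = []
    for child in node["details"]:
        leaves.extend(hot_matches(child))
    return sorted(leaves, key=lambda n: n["value"], reverse=True)

# Made-up example tree
tree = {
    "value": 1.2,
    "description": "sum of:",
    "details": [
        {"value": 0.4, "description": "weight(body:milk)", "details": []},
        {"value": 0.8, "description": "weight(title:bottles)", "details": []},
    ],
}

for match in hot_matches(tree):
    print("%.2f  %s" % (match["value"], match["description"]))
```

In a real explain, leaf scores feed into max/sum/product operations further up the tree, which is why figuring out which matches actually matter is harder than this sketch suggests.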
As I said, the entire project contributes components of Quepid to the open source community under an Apache license! And this is just the beginning. We’ve already got Elasticsearch support in the works. And we want to keep working on making the explain information even less search geeky. As a sandbox, Splainer has access to the queries being executed. We should be able to tie the queries and the explain together to definitively say “this happened because you boosted on this field!” We hope you’ll give it a spin and let us know how it can be improved. We welcome your bugs, feedback, and pull requests. And if you want to try the Splainer experience over multiple queries, with diffing, results grading, a development history, and more, give Quepid a try.