
Introducing Splainer: The Open Source Search Sandbox That Tells You Why



One piece of feedback we consistently hear about our Quepid search testing tool is the need to understand “why” search results come back in the order they do. In plain English, what factors influence search the most? Why does my search engine think a document about “water bottles” is more relevant than one about “baby bottles” for a search about “milk bottles”?

Indeed, this is the entire art and science of search relevancy. It's not magic gnomes inside a box that understand all about baby bottles. No, it's heavily tuned heuristics that Solr and Elasticsearch use out of the box (in the form of Lucene’s scoring systems), built on decades of information retrieval research that ultimately rests on the foundation of dumb string matching.
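For reference, here is the classic Lucene “practical scoring function” as documented for Lucene’s TFIDFSimilarity, the model behind DefaultSimilarity (the default in Lucene, and therefore in Solr and Elasticsearch, until Lucene 6 switched its default to BM25). It is also where explain jargon such as coord comes from. Here freq is how often term t occurs in document d, boost(t) is the query-time boost, and norm(t,d) folds in index-time boosts and field-length normalization; queryNorm(q) is the same for every document matching a query, so it does not change the ranking:

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot\sum_{t\in q}\Big(\mathrm{tf}(t\in d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Big)

\mathrm{tf}(t\in d)=\sqrt{\mathrm{freq}},\qquad
\mathrm{idf}(t)=1+\ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}+1},\qquad
\mathrm{coord}(q,d)=\frac{\text{query terms matched in } d}{\text{query terms in } q}
```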

How do we tune this insanity? Well, luckily you can retrieve the explain information: the debug information Lucene gives you that tells you exactly how each document was scored. Armed with that information, you can alter how you query your search engine (reweight this field, boost on that field, etc.). Unfortunately, the explain is full of unhelpful search nerd trivia (do you know what coord means?), not to mention deeply nested and often redundant information. Good luck if you want to parse this with your eyes. Just give up… you’re done. Fortunately, there are parsers out there: copy and paste your explain info, and get something a little more sane to deal with.
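To give a feel for what such parsers do, here is a minimal Python sketch (not Splainer’s code) that turns the plain-text explain into a tree. It assumes Lucene’s usual plain-text layout: one `value = description` line per node, with each nesting level indented by two more spaces. The `ExplainNode` and `parse_explain` names are made up for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List

# One explain line: indentation, a numeric score, " = ", then the description.
LINE_RE = re.compile(r"^(?P<indent> *)(?P<value>\S+) = (?P<desc>.*)$")


@dataclass
class ExplainNode:
    value: float
    description: str
    children: List["ExplainNode"] = field(default_factory=list)


def parse_explain(text: str) -> ExplainNode:
    """Parse an indented plain-text explain into a tree of ExplainNodes.

    Assumes two spaces of indentation per nesting level.
    """
    root = None
    stack = []  # (depth, node) pairs along the path from the root
    for line in text.splitlines():
        if not line.strip():
            continue  # Solr often wraps the explain in blank lines
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognized explain line: {line!r}")
        depth = len(m.group("indent")) // 2
        node = ExplainNode(float(m.group("value")), m.group("desc").strip())
        # Climb back up until the top of the stack is this node's parent.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if stack:
            stack[-1][1].children.append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("explain text has more than one root")
        stack.append((depth, node))
    if root is None:
        raise ValueError("empty explain text")
    return root
```

Once the explain is a tree, summing, sorting, or collapsing nodes becomes ordinary tree code instead of an eye exam.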

But even with nice parsers, we continue to face two problems:

  1. Collaboration: At OpenSource Connections, we believe that collaboration with non-techies is the secret ingredient of search relevancy. We need to arm business analysts and content experts with a human-readable version of the explain information so they can inform the search tuning process.

  2. Usability: I want to paste a Solr URL, query parameters and all, and go! Then, once I see more helpful explain information, I want to tweak (and tweak and tweak) until I get the search results I want, much like some of my favorite regex tools. Get out of the way and let me tune!
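The paste-a-URL workflow boils down to rewriting query parameters. Below is a minimal Python sketch (the `tweak_solr_url` helper and the `localhost` core URL are hypothetical) that takes a pasted Solr URL, swaps in new parameter values, and adds `debugQuery=true` and `wt=json`, the standard Solr parameters for getting explain information back as JSON. Parameters you don’t override, including repeated ones such as several `fq` filters, are kept in order.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tweak_solr_url(url: str, overrides: dict) -> str:
    """Return `url` with the parameters in `overrides` replaced or added."""
    parts = urlsplit(url)
    # Keep every existing parameter that is not being overridden.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in overrides
    ]
    kept.extend(overrides.items())
    return urlunsplit(parts._replace(query=urlencode(kept)))


pasted = "http://localhost:8983/solr/collection1/select?q=milk+bottles&defType=edismax&qf=title+body"
print(tweak_solr_url(pasted, {"qf": "title^10 body", "debugQuery": "true", "wt": "json"}))
# http://localhost:8983/solr/collection1/select?q=milk+bottles&defType=edismax&qf=title%5E10+body&debugQuery=true&wt=json
```

The `^` in the boost is percent-encoded as `%5E`, which Solr decodes like any other URL-encoded character.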

I’m proud to announce that we’ve taken our first big steps in these directions with Splainer. Splainer open-sources the core sandbox behind Quepid, including new features that tell you why a search result is appearing where it is. At your fingertips in Splainer are three levels of explanation data:

  1. Hot Matches immediately tell you which matches most influence your search results. Many matches often occur when searching, but due to the machinations of how these matches factor into other search operations, it can be hard to determine which matches matter. We figure that out for you!

  2. Summarized Explain summarizes the relevancy calculation in more human-readable terms. This takes things one level deeper. If this were math homework, “hot matches” would be the answer and the “summarized explain” would be showing your work. If you want to know exactly what’s going on, look at the summarized explain.

  3. Finally, the ugly stuff is still there if you want it: the raw explain pulled straight from Lucene (yuk), for when you absolutely need to see your eyes bleed.
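One way to approximate “hot matches” is to push each multiplicative factor in the explain tree down to the term matches beneath it. The Python sketch below illustrates the idea under stated assumptions and is not Splainer’s actual algorithm: it expects a dict shaped like Solr’s structured explain (`debug.explain.structured=true`), with `value`, `description`, and `details` keys, and it combines `sum of`, `product of`, `max of`, and `max plus X times others of` nodes the way Lucene does. It can overcount if two children of the same product node both contain matches.

```python
import re
from typing import Dict, List, Tuple

# Dismax with a tie breaker: score = max + tie * (sum of the other clauses).
TIE_RE = re.compile(r"max plus ([\d.eE+-]+) times others of")
# Plain-text explains prefix nodes with "(MATCH) "; strip it if present.
MATCH_PREFIX_RE = re.compile(r"^\((?:NON-)?MATCH\)\s*")


def _collect(node: Dict, scale: float, out: List[Tuple[str, float]]) -> None:
    desc = MATCH_PREFIX_RE.sub("", node.get("description", ""))
    details = node.get("details", [])
    if desc.startswith("weight("):
        # A term or phrase match: its value times every factor applied above it.
        out.append((desc, node["value"] * scale))
    elif desc.startswith("product of"):
        # Each child is multiplied by the product of its siblings (e.g. coord).
        for i, child in enumerate(details):
            others = 1.0
            for j, sibling in enumerate(details):
                if j != i:
                    others *= sibling["value"]
            _collect(child, scale * others, out)
    elif desc.startswith("max of") or TIE_RE.match(desc):
        # Dismax: the best clause counts fully, the rest by the tie breaker.
        m = TIE_RE.match(desc)
        tie = float(m.group(1)) if m else 0.0
        if details:
            best = max(range(len(details)), key=lambda k: details[k]["value"])
            for i, child in enumerate(details):
                factor = 1.0 if i == best else tie
                if factor:
                    _collect(child, scale * factor, out)
    else:
        # "sum of" (and anything unrecognized): children simply add up.
        for child in details:
            _collect(child, scale, out)


def hot_matches(explain: Dict) -> List[Tuple[str, float]]:
    """Return (match description, estimated contribution), largest first."""
    out: List[Tuple[str, float]] = []
    _collect(explain, 1.0, out)
    return sorted(out, key=lambda pair: pair[1], reverse=True)
```

When every scoring leaf is a `weight(...)` match, the contributions add back up to the document’s score, which makes a handy sanity check.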

But wait, there’s more! You can modify how you query Solr right in the application. A key part of our sandbox is the ability to work with your Solr instance directly from the browser. In fact, the entire application is driven 100% through HTML and JavaScript (no backend involved). If your browser can see it, then Splainer can see it! Simply paste your Solr URL from your browser into Splainer, and it will work! Start tuning to your heart’s content!

As I said, the entire project contributes components of Quepid to the open source community under an Apache license! And this is just the beginning. We’ve already got Elasticsearch support in the works. And we want to keep working on making the explain information even less search geeky. As a sandbox, Splainer has access to the queries being executed. We should be able to tie the two together to definitively say “this happened because you boosted on this field!”
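As a rough illustration of that idea, and not an existing Splainer feature, the hypothetical Python sketch below takes `(description, contribution)` pairs for the top matches plus an edismax `qf` string, pulls the field name out of each `weight(field:term ...)` description, and notes when that field carries a query-time boost. It assumes the usual `field^boost` syntax of `qf` and Lucene’s `weight(field:...)` description format; `parse_qf` and `attribute_matches` are made-up names.

```python
import re
from typing import Dict, List, Tuple

# The field name in a Lucene match description such as
# "weight(title:milk in 3) [DefaultSimilarity], result of:".
FIELD_RE = re.compile(r"weight\(([^:\s()]+):")


def parse_qf(qf: str) -> Dict[str, float]:
    """Parse a qf string like "title^10 body" into {field: boost}."""
    boosts = {}
    for entry in qf.split():
        name, _, boost = entry.partition("^")
        boosts[name] = float(boost) if boost else 1.0
    return boosts


def attribute_matches(matches: List[Tuple[str, float]], qf: str) -> List[str]:
    """Turn (description, contribution) pairs into plain-English notes."""
    boosts = parse_qf(qf)
    notes = []
    for desc, contribution in matches:
        m = FIELD_RE.search(desc)
        if m is None:
            continue  # not a field match we know how to attribute
        field_name = m.group(1)
        boost = boosts.get(field_name)
        note = f"{field_name} contributed {contribution:.2f}"
        if boost is not None and boost != 1.0:
            note += f"; you boosted {field_name} by {boost:g}"
        notes.append(note)
    return notes
```

Run on the matches `title:milk` (1.0) and `body:bottles` (0.5) with `qf=title^10 body`, it would report that the title match dominates and that title was boosted by 10.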

We hope you’ll give it a spin and let us know how it can be improved. We welcome your bugs, feedback, and pull requests. And if you want to try the Splainer experience over multiple queries, with diffing, results grading, a development history, and more, give Quepid a try.



Published at DZone with permission of Doug Turnbull, DZone MVB. See the original article here.

