Building Suggest-As-You-Type With Carrot2 Clustering
The first interaction that a customer has with your e-commerce web site is with the search box itself. So it is of utmost importance to make the user experience here as clean and positive as possible. One good way of doing this is by providing useful suggestions to the user as they type. This behavior is called Suggest-As-You-Type.
I recently discovered a clever new approach to Suggest-As-You-Type which makes use of Carrot2 Clustering. The basic idea is to populate a list of suggestions in an independent Suggest-As-You-Type Solr core.
Let’s walk through the basic idea using an example. Pretend that our online store – a grocery store – has several departments, each with its own sub-departments. The goal for our Suggest-As-You-Type is to provide reasonable recommendations about which department the customer should visit based upon their current search string. However, a common problem in e-commerce shops is that products may be sourced from several vendors, and the information that the vendors provide about these products may be incomplete, inconsistent, or just wrong. So how can the Suggest-As-You-Type know that a search for “brocoli” belongs in the “Produce/Vegetable” department? And how can Suggest-As-You-Type know that a search for “India Pale Ale” belongs in the “Adult Beverage/Beer” department? As I found out, Carrot2 Clustering is how!
Here’s roughly how it could work. Presumably, we already have one Solr that is serving up product searches. Let’s restart that Solr and enable the clustering component:
java -Dsolr.clustering.enabled=true -jar start.jar
Now we can make queries with the clustering request handler that will look something like this:
And the response will in turn look like this:
<arr name="clusters">
  <lst>
    <arr name="labels">
      <str>Delicious</str>
    </arr>
    <double name="score">3.1654221261111397</double>
    <arr name="docs">
      <str>474523</str>
      <str>234263</str>
      <!-- snip -->
      <str>553285</str>
    </arr>
  </lst>
  <lst>
    <arr name="labels">
      <str>On Sale</str>
    </arr>
    <!-- snip -->
  </lst>
  <lst>
    <arr name="labels">
      <str>Frozen</str>
    </arr>
    <!-- snip -->
  </lst>
  <lst>
    <arr name="labels">
      <str>Milk</str>
    </arr>
    <!-- snip -->
  </lst>
  <!-- snip -->
</arr>
As you can see, the clustering query returns a set of tagged clusters, but the tags themselves (Milk, Frozen, On Sale, Delicious, etc.) are quite scattered and not very helpful. Let’s tighten the query up a bit by only looking in the Produce/Fruit department:
The cluster tags here will be much more fruit oriented: Apple, Orange, Pear, Red Delicious, Juicy, Citrus, etc. The tags we see here are perfect for building our Suggest-As-You-Type Solr core because when the user types any of these terms, they will likely be thinking about fruit.
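Pulling those tags out of a clustering response is mostly a matter of walking the XML. The following is a minimal sketch, assuming the response has the shape shown earlier; the function name cluster_labels and the surrounding response element in the sample are illustrative assumptions, not something taken from the original setup.

```python
import xml.etree.ElementTree as ET


def cluster_labels(response_xml):
    """Return the label strings of every cluster in a Solr-style clustering response."""
    root = ET.fromstring(response_xml)
    labels = []
    # Clusters live under <arr name="clusters">; each child <lst> is one cluster.
    for clusters in root.iter("arr"):
        if clusters.get("name") != "clusters":
            continue
        for cluster in clusters.findall("lst"):
            for child in cluster.findall("arr"):
                # Only the "labels" array holds tags; "docs" holds document ids.
                if child.get("name") == "labels":
                    labels.extend(s.text for s in child.findall("str"))
    return labels


# Trimmed-down sample in the same shape as the response shown above.
# The <response> wrapper is an assumption for the sake of a complete document.
SAMPLE = """<response>
  <arr name="clusters">
    <lst>
      <arr name="labels"><str>Delicious</str></arr>
      <arr name="docs"><str>474523</str></arr>
    </lst>
    <lst>
      <arr name="labels"><str>On Sale</str></arr>
    </lst>
  </arr>
</response>"""

print(cluster_labels(SAMPLE))
```

Run against a response that was filtered to a single department, the returned list is the raw material for that department's tag words.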
So let’s do just that; let’s build our suggest core. First we need to define the appropriate fields (in schema.xml):
<field name="DisplayText" type="string" stored="true"/>
<field name="TagWords" type="ignored" multiValued="true"/>
<field name="Text" type="text_general_edge_ngrammed" indexed="true" stored="true" multiValued="true"/>
<!-- snip -->
<copyField source="DisplayText" dest="Text"/>
<copyField source="TagWords" dest="Text"/>
Here we store the DisplayText so that it can be displayed later. The TagWords field can be of type ignored because it exists only as a source whose contents are copied into the Text field. The Text field, then, is of type text_general_edge_ngrammed: the same as text_general, but with edge n-grams added for faster partial-word matches. (We can get a lot more clever here, but text analysis is not the focus of this post.)
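To make the edge n-gram idea concrete, here is a small Python illustration of the terms such an analyzer would emit for a single token. It is a simplified stand-in for an index-time filter such as Solr's EdgeNGramFilterFactory, not its implementation, and the min_gram and max_gram defaults are assumptions because the post does not give the analyzer settings.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the leading prefixes of token, from min_gram to max_gram characters long."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


# Each prefix is indexed as its own term, so a partial word typed by the
# user can match with a plain term lookup instead of a wildcard query.
print(edge_ngrams("citrus", min_gram=2))
# ['ci', 'cit', 'citr', 'citru', 'citrus']
```

The trade-off is a larger index in exchange for cheap prefix matching at query time.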
Now that the schema is set up, all we need are documents! We need one document for every possible Department/SubDepartment present in our grocery store. The department name goes into the DisplayText field, and, as you might have guessed, for the TagWords field we run a side job that collects the Carrot2 cluster names for each of these departments. (That’s the key part. Read it again.)
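The side job described here could be sketched roughly as follows. This is only an outline under stated assumptions: labels_for is a hypothetical callable that returns the Carrot2 cluster labels for one department (for example by running a clustering query restricted to that department), and the sample labels are made up for illustration.

```python
def build_suggest_docs(departments, labels_for):
    """Build one suggest-core document per department.

    labels_for is a hypothetical callable: department name -> list of cluster labels.
    """
    docs = []
    for department in departments:
        # Drop duplicate labels while keeping their original order.
        tag_words = list(dict.fromkeys(labels_for(department)))
        docs.append({"DisplayText": department, "TagWords": tag_words})
    return docs


# Stand-in for the real clustering side job; these labels are illustrative only.
FAKE_LABELS = {
    "Produce/Fruit": ["Apple", "Orange", "Pear", "Apple"],
    "Adult Beverage/Beer": ["India Pale Ale"],
}

docs = build_suggest_docs(list(FAKE_LABELS), FAKE_LABELS.get)
for doc in docs:
    print(doc)
```

Each resulting dictionary maps directly onto the DisplayText and TagWords fields defined in the schema, so it can be sent to the suggest core with whatever indexing client is already in use.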
Once the indexing of the Suggest-As-You-Type core is complete, we can then take the partial searches of our customers and help direct them to the department that they are seeking. If they are looking for “crispy …” we will direct them to Produce/Fruit, Produce/Vegetable, and Snacks/Chips. But we do not direct them to Home Supplies/Cleaners! If they are looking for “microwa…” then we direct them to Frozen Foods, but not Adult Beverages.
Now this does raise a question, and this is the question that I asked my e-commerce client: Just because we can understand which department the searcher belongs in, is it really helpful to direct them to those departments? Think about that for a second… maybe not! For instance, if a user knows that they really, really want the “new york chocolate cheese cake with strawberries on top”, then by suggesting that they visit the Bakery/Cakes department you are actually asking the customer to generalize their search. They came to your search box intent upon buying a New York chocolate cheese cake with strawberries on top, and you said “Nah, why don’t you just look at all our cakes. Maybe you’ll find something there.”
So… I think that this usage of clustering is a great example of the power of Carrot2, and I have been impressed with the semantic clarity of the tags that Carrot2 provides for the clusters that it finds. However, building a good user experience for Suggest-As-You-Type is not as easy as it might first seem because there are so many different types of customers. For customers who come to your e-commerce site simply to look around, it’s probably a good idea to suggest departments that they might be interested in. But for more serious customers, you might want to provide suggestions based upon previous customer searches. And for those customers who know exactly what they want, placing products directly in the Suggest-As-You-Type response is a good idea. Ideally, a good Suggest-As-You-Type user experience would include all three of these aspects.
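The matching behavior can be imitated in a few lines of Python. This is a toy, in-memory stand-in for the query against the suggest core, assuming single-word partial input and simple prefix matching; the sample documents and their tag words are invented for illustration and do not come from a real index.

```python
def suggest_departments(partial, docs):
    """Return the DisplayText of every doc in which some indexed word starts with partial."""
    prefix = partial.strip().lower()
    if not prefix:
        return []
    matches = []
    for doc in docs:
        texts = [doc["DisplayText"]] + doc["TagWords"]
        # Split department names and tags into lowercase words, as an analyzer would.
        words = [w for text in texts for w in text.lower().replace("/", " ").split()]
        if any(w.startswith(prefix) for w in words):
            matches.append(doc["DisplayText"])
    return matches


# Illustrative documents only.
DOCS = [
    {"DisplayText": "Produce/Fruit", "TagWords": ["Crispy", "Juicy"]},
    {"DisplayText": "Snacks/Chips", "TagWords": ["Crispy", "Salty"]},
    {"DisplayText": "Home Supplies/Cleaners", "TagWords": ["Bleach"]},
]

print(suggest_departments("crisp", DOCS))
# ['Produce/Fruit', 'Snacks/Chips']
```

A real deployment would let Solr do this work through the edge n-grammed Text field and would also rank the matches, which this sketch does not attempt.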
Published at DZone with permission of John Berryman, DZone MVB. See the original article here.