Autocomplete on Multivalued Fields Using Faceting
In the previous blog post about autocomplete on multivalued fields we discussed how highlighting can help us get the information we are interested in. We also promised to return to the topic and show how to achieve similar functionality with Solr faceting. So, let’s do it.
Before we start
Because this post is more or less a continuation of what we wrote earlier about autocomplete on multivalued fields, we recommend reading “Autocomplete on multivalued field using highlighting” before the rest of this entry. We would also like to note that the method shown here is very similar to the one from the “Solr and autocomplete (part 1)” post, but we wanted to refresh that topic and show an example that uses multivalued fields.
As in the previous post, we will start with the Solr configuration.
The structure of our index is exactly the same as the one shown previously, but let’s recall it. Please remember that we want autocomplete to work on a multivalued field. This field is called features, and the whole field configuration of the index looks like this:
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <field name="features" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="features_autocomplete" type="text_autocomplete" indexed="true" stored="true" multiValued="true"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
</fields>
To get the autocomplete values we will use the features_autocomplete field.
Of course we don’t want to change our indexer, so we want Solr to automatically copy the data from the features field to the features_autocomplete one. To do that we add a copyField definition to the schema.xml file:
<copyField source="features" dest="features_autocomplete"/>
Our text_autocomplete field type
And we’ve come to the first difference – the text_autocomplete field type. This time it looks like this:
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Because we will use faceting, we use the solr.KeywordTokenizerFactory together with the solr.LowerCaseFilterFactory, so that each value in our field is indexed as a single, lowercased token.
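To make the effect of this analysis chain easier to picture, here is a minimal Python sketch that imitates it. It is only an approximation written for illustration: the function names are hypothetical, and Python's str.lower() stands in for Solr's lowercase filter. The second function shows, for contrast, what a whitespace-splitting analyzer would produce.

```python
def keyword_lowercase_analyze(value):
    """Imitate KeywordTokenizerFactory + LowerCaseFilterFactory.

    The whole field value stays one token; only its case changes.
    """
    return [value.lower()]


def whitespace_lowercase_analyze(value):
    """For contrast: split on whitespace, then lowercase each token."""
    return value.lower().split()


# One token per field value, which is what we want to facet on.
print(keyword_lowercase_analyze("Single door"))
# Several tokens per value, so facets would be single words, not phrases.
print(whitespace_lowercase_analyze("Single door"))
```

With the keyword-style analysis each facet value is a complete phrase such as "single door", which is exactly what we want to return as a suggestion.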
Our example data is identical to what we had before, but let’s recall it to keep things clear:
<add>
  <doc>
    <field name="id">1</field>
    <field name="features">Multiple windows</field>
    <field name="features">Single door</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="features">Single window</field>
    <field name="features">Single door</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="features">Multiple windows</field>
    <field name="features">Multiple doors</field>
  </doc>
</add>
Query with faceting
Let’s see what our query looks like when we use faceting instead of highlighting.
The query should look more or less like the following one (only the request parameters are shown):
q=*:*&rows=0&facet=true&facet.field=features_autocomplete&facet.prefix=sing
A few words about the parameters:
- rows=0 – we tell Solr that we don’t want the documents that matched the query in the results,
- facet=true – we inform Solr that we want to use faceting,
- facet.field=features_autocomplete – we say which field will be used to calculate faceting,
- facet.prefix=sing – with the use of this parameter we provide the value of a query for auto complete.
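The parameters described above can be assembled into a request URL programmatically. The following Python sketch shows one way to do it with the standard library. Note that the host, port, and core name in SOLR_SELECT_URL are assumptions made for the example, and build_autocomplete_url is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode

# Assumption: host, port and core name are hypothetical; adjust to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_autocomplete_url(prefix, base_url=SOLR_SELECT_URL):
    """Build a faceting request that returns values starting with `prefix`."""
    params = [
        ("q", "*:*"),                              # match all documents
        ("rows", "0"),                             # no documents in the results
        ("facet", "true"),                         # turn faceting on
        ("facet.field", "features_autocomplete"),  # field to facet on
        ("facet.prefix", prefix),                  # what the user typed so far
    ]
    # urlencode percent-encodes the values, so "*:*" is safe in the URL.
    return base_url + "?" + urlencode(params)


print(build_autocomplete_url("sing"))
```

Encoding the parameters with urlencode rather than concatenating strings by hand matters once the prefix contains spaces, for example when the user has typed "single d".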
Query results returned by Solr for the above query are as follows:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="facet">true</str>
      <str name="q">*:*</str>
      <str name="facet.prefix">sing</str>
      <str name="facet.field">features_autocomplete</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="3" start="0">
  </result>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="features_autocomplete">
        <int name="single door">2</int>
        <int name="single window">1</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>
As you can see, in the facet_fields section we got the phrases we were interested in, along with the number of documents each of them appears in.
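On the client side the suggestions have to be pulled out of that response. Here is a minimal Python sketch of one way to do it, using only the standard library. The extract_suggestions function is a hypothetical helper, and SAMPLE_RESPONSE is a trimmed copy of the response shown above (only the facet section is kept).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Solr response shown above.
SAMPLE_RESPONSE = """
<response>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="features_autocomplete">
        <int name="single door">2</int>
        <int name="single window">1</int>
      </lst>
    </lst>
  </lst>
</response>
"""


def extract_suggestions(response_xml, field="features_autocomplete"):
    """Return (phrase, document count) pairs from the field facet section."""
    root = ET.fromstring(response_xml)
    path = (
        "./lst[@name='facet_counts']"
        "/lst[@name='facet_fields']"
        "/lst[@name='%s']/int" % field
    )
    # Each <int> element carries the phrase in its name attribute
    # and the number of matching documents as its text.
    return [(node.get("name"), int(node.text)) for node in root.findall(path)]


for phrase, count in extract_suggestions(SAMPLE_RESPONSE):
    print(phrase, count)
```

The pairs come back in the order Solr returned them, so the most frequent phrase can be shown first without any additional sorting.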
What to remember
The crucial thing to remember is that the value provided to the facet.prefix parameter is not analyzed. Because the indexed values are lowercased, providing Sing instead of sing would return no results. The application should therefore lowercase the user’s input before sending it to Solr.
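The case sensitivity of facet.prefix can be illustrated with a small simulation over the example data. The Python sketch below is not how Solr works internally; it only imitates the observable behaviour under the assumptions of this post (values indexed as whole, lowercased tokens, and the prefix compared as-is). The facet_prefix function and its ordering by count are choices made for this illustration.

```python
from collections import Counter

# The features values of the three example documents.
DOCS = [
    ["Multiple windows", "Single door"],
    ["Single window", "Single door"],
    ["Multiple windows", "Multiple doors"],
]


def facet_prefix(docs, prefix):
    """Count documents per indexed term that starts with `prefix`."""
    counts = Counter()
    for features in docs:
        # Indexed terms are whole, lowercased values; a set ensures
        # each term is counted once per document.
        for term in {value.lower() for value in features}:
            # The prefix is NOT lowercased here, mirroring the fact
            # that facet.prefix is not analyzed.
            if term.startswith(prefix):
                counts[term] += 1
    # Highest count first, ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(facet_prefix(DOCS, "sing"))          # matches the lowercased terms
print(facet_prefix(DOCS, "Sing"))          # no match: case differs
print(facet_prefix(DOCS, "Sing".lower()))  # normalize input before querying
```

Normalizing the input in the application, as in the last call, is the simplest way to make the suggestions behave as if they were case insensitive.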
A short summary
This entry showed the second method of building autocomplete functionality on multivalued fields. Of course we didn’t say everything about the topic and we will get back to it someday, but for now that is all. We hope that someone will find it useful.
Published at DZone with permission of Rafał Kuć , DZone MVB. See the original article here.