Hierarchical faceting – Pivot facets in trunk
In a large number of implementations I took part in, sooner or later the question arose: what can we do to get faceting as a tree structure? Of course there are some tricks for that, but they required modifying the data and post-processing the results on the application side. That was neither particularly functional nor especially comfortable. A few days ago, however, Solr version 4.0 was enhanced with code tracked as SOLR-792 in JIRA. Let’s see how to use it to get faceting results as a tree.
Important note – at this point this functionality is only available in Solr 4.0, which is the development version. To use it you need to download the code from the trunk of the Lucene/Solr SVN repository.
A few words at the beginning
In many projects I had the opportunity to work on, there was a need for hierarchical faceting. One of the simplest examples is the requirement to show cities grouped by province, together with the number of documents in each province and in each city. Until recently it was impossible to achieve this without changing the structure of the data. Now it is possible.
Indexing
To avoid unnecessarily complicating the described functionality, I decided to use the sample XML documents available in the /exampledocs directory of the example deployment. I also didn’t modify schema.xml or solrconfig.xml, so the configuration is standard. With that in place we can start the indexing process (I ran the command from the $solr_home/exampledocs/ directory):
./post.sh *.xml
After several screens of output, our data is indexed.
The mechanism
It is not difficult to use hierarchical faceting. The Solr developers gave us two additional parameters on top of the ones we already know:
- facet.pivot – a comma-separated list of fields that defines which fields to pivot on and in what order,
- facet.pivot.mincount – the minimum number of documents a value must have to be included in the faceting results. The default value is 1.
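The two parameters above can also be put together programmatically. The following is only a minimal Python sketch: build_pivot_url is a hypothetical helper (not part of Solr), and the base URL is assumed to be the default example endpoint used elsewhere in this article.

```python
from urllib.parse import urlencode


def build_pivot_url(base_url, pivot_fields, mincount=None, query="*:*"):
    """Return a select URL asking Solr for pivot facets (hypothetical helper)."""
    params = [
        ("q", query),
        ("facet", "true"),
        # The order of the fields defines the levels of the hierarchy.
        ("facet.pivot", ",".join(pivot_fields)),
    ]
    if mincount is not None:
        params.append(("facet.pivot.mincount", str(mincount)))
    # Leave '*', ':' and ',' unescaped so the URL stays readable.
    return base_url + "?" + urlencode(params, safe="*:,")


url = build_pivot_url(
    "http://localhost:8983/solr/select/", ["cat", "instock"], mincount=2
)
print(url)
```

Leaving mincount unset simply omits the parameter, so Solr falls back to its default of 1.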
So let’s try it.
Queries
Let’s start with two fields. I query for all the documents in the index and add the parameter facet.pivot=cat,instock to tell Solr that I want hierarchical faceting results, where the first level of the hierarchy is the cat field and the second level is the instock field. The query looks as follows:
http://localhost:8983/solr/select/?q=*:*&facet=true&facet.pivot=cat,instock
To shorten the listing I omitted the header and the search results themselves.
<?xml version="1.0" encoding="utf-8"?>
<response>
  . . .
  <result name="response" numfound="19" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields"/>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
    <lst name="facet_pivot">
      <arr name="cat,instock">
        <lst>
          <str name="field">cat</str>
          <str name="value">electronics</str>
          <int name="count">17</int>
          <arr name="pivot">
            <lst> <str name="field">instock</str> <bool name="value">true</bool> <int name="count">13</int> </lst>
            <lst> <str name="field">instock</str> <bool name="value">false</bool> <int name="count">4</int> </lst>
          </arr>
        </lst>
        <lst>
          <str name="field">cat</str>
          <str name="value">memory</str>
          <int name="count">6</int>
          <arr name="pivot">
            <lst> <str name="field">instock</str> <bool name="value">true</bool> <int name="count">6</int> </lst>
          </arr>
        </lst>
        <lst>
          <str name="field">cat</str>
          <str name="value">connector</str>
          <int name="count">2</int>
          <arr name="pivot">
            <lst> <str name="field">instock</str> <bool name="value">false</bool> <int name="count">2</int> </lst>
          </arr>
        </lst>
        <lst>
          <str name="field">cat</str>
          <str name="value">graphics card</str>
          <int name="count">2</int>
          <arr name="pivot">
            <lst> <str name="field">instock</str> <bool name="value">false</bool> <int name="count">2</int> </lst>
          </arr>
        </lst>
        <lst>
          <str name="field">cat</str>
          <str name="value">hard drive</str>
          <int name="count">2</int>
          <arr name="pivot">
            <lst> <str name="field">instock</str> <bool name="value">true</bool> <int name="count">2</int> </lst>
          </arr>
        </lst>
        <lst>
          <str name="field">cat</str>
          <str name="value">monitor</str>
          <int name="count">2</int>
          <arr name="pivot">
            <lst> <str name="field">instock</str> <bool name="value">true</bool> <int name="count">2</int> </lst>
          </arr>
        </lst>
        <lst>
          <str name="field">cat</str>
          <str name="value">search</str>
          <int name="count">2</int>
          <arr name="pivot">
            <lst> <str name="field">instock</str> <bool name="value">true</bool> <int name="count">2</int> </lst>
          </arr>
        </lst>
        <lst>
          <str name="field">cat</str>
          <str name="value">software</str>
          <int name="count">2</int>
          <arr name="pivot">
            <lst> <str name="field">instock</str> <bool name="value">true</bool> <int name="count">2</int> </lst>
          </arr>
        </lst>
      </arr>
    </lst>
  </lst>
</response>
The presentation of faceting results differs from regular field faceting. Each first-level entry contains elements describing the field (the tag with the attribute name="field"), the value (the tag with the attribute name="value") and the number of documents (the tag with the attribute name="count"). It is followed by the second level of the hierarchy (the tag with the attribute name="pivot"). The second level contains the same elements as the first: field name, value and the number of documents with that value.
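Because every level repeats the same field/value/count/pivot layout, the response is easy to turn into a nested structure on the client side. Below is a minimal Python sketch using the standard xml.etree.ElementTree module. The function names and the dict layout are assumptions made for illustration, and SAMPLE is a shortened copy of the response shown above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the two-level response from this article.
SAMPLE = """<response>
  <lst name="facet_counts">
    <lst name="facet_pivot">
      <arr name="cat,instock">
        <lst>
          <str name="field">cat</str>
          <str name="value">electronics</str>
          <int name="count">17</int>
          <arr name="pivot">
            <lst> <str name="field">instock</str> <bool name="value">true</bool> <int name="count">13</int> </lst>
            <lst> <str name="field">instock</str> <bool name="value">false</bool> <int name="count">4</int> </lst>
          </arr>
        </lst>
        <lst>
          <str name="field">cat</str>
          <str name="value">memory</str>
          <int name="count">6</int>
          <arr name="pivot">
            <lst> <str name="field">instock</str> <bool name="value">true</bool> <int name="count">6</int> </lst>
          </arr>
        </lst>
      </arr>
    </lst>
  </lst>
</response>"""


def parse_pivot(node):
    """Turn one pivot <lst> element into a nested dict (recursively)."""
    entry = {"pivot": []}
    for child in node:
        name = child.get("name")
        if name in ("field", "value"):
            entry[name] = child.text
        elif name == "count":
            entry["count"] = int(child.text)
        elif name == "pivot":
            # The nested level has the same shape, so recurse.
            entry["pivot"] = [parse_pivot(sub) for sub in child]
    return entry


def parse_facet_pivot(xml_text, pivot_name):
    """Return the list of first-level entries for the given facet.pivot value."""
    root = ET.fromstring(xml_text)
    arr = root.find(f".//lst[@name='facet_pivot']/arr[@name='{pivot_name}']")
    if arr is None:
        return []
    return [parse_pivot(lst) for lst in arr]


tree = parse_facet_pivot(SAMPLE, "cat,instock")
for entry in tree:
    children = ", ".join(f"{c['value']}={c['count']}" for c in entry["pivot"])
    print(f"{entry['value']} ({entry['count']}): {children}")
```

Values are kept as strings, so the boolean instock values come back as "true" and "false".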
Let’s see how this mechanism deals with more levels of depth. To check that, I ran the following query:
http://localhost:8983/solr/select/?q=*:*&facet=true&facet.pivot=cat,instock,features
I omitted the response header and the search results, leaving the faceting results only. In addition, due to the length of the faceting results, I only show one first-level entry:
<?xml version="1.0" encoding="utf-8"?>
<response>
  . . .
  <result name="response" numfound="19" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields"/>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
    <lst name="facet_pivot">
      <arr name="cat,instock,features">
        <lst>
          <str name="field">cat</str>
          <str name="value">electronics</str>
          <int name="count">17</int>
          <arr name="pivot">
            <lst>
              <str name="field">instock</str>
              <bool name="value">true</bool>
              <int name="count">13</int>
              <arr name="pivot">
                <lst> <str name="field">features</str> <str name="value">2</str> <int name="count">7</int> </lst>
                <lst> <str name="field">features</str> <str name="value">3</str> <int name="count">7</int> </lst>
                <lst> <str name="field">features</str> <str name="value">lcd</str> <int name="count">5</int> </lst>
                <lst> <str name="field">features</str> <str name="value">x</str> <int name="count">5</int> </lst>
                <lst> <str name="field">features</str> <str name="value">ca</str> <int name="count">4</int> </lst>
                <lst> <str name="field">features</str> <str name="value">latenc</str> <int name="count">4</int> </lst>
                <lst> <str name="field">features</str> <str name="value">tft</str> <int name="count">4</int> </lst>
                <lst> <str name="field">features</str> <str name="value">v</str> <int name="count">4</int> </lst>
                <lst> <str name="field">features</str> <str name="value">0</str> <int name="count">3</int> </lst>
                <lst> <str name="field">features</str> <str name="value">1</str> <int name="count">3</int> </lst>
                <lst> <str name="field">features</str> <str name="value">25</str> <int name="count">3</int> </lst>
                <lst> <str name="field">features</str> <str name="value">30</str> <int name="count">3</int> </lst>
                <lst> <str name="field">features</str> <str name="value">5</str> <int name="count">3</int> </lst>
                <lst> <str name="field">features</str> <str name="value">7</str> <int name="count">3</int> </lst>
                <lst> <str name="field">features</str> <str name="value">8</str> <int name="count">3</int> </lst>
                <lst> <str name="field">features</str> <str name="value">time</str> <int name="count">3</int> </lst>
                <lst> <str name="field">features</str> <str name="value">up</str> <int name="count">3</int> </lst>
                <lst> <str name="field">features</str> <str name="value">000</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">19</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">20</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">2336</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">27</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">275</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">6</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">75</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">activ</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">built</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">cach</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">color</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">flash</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">heat</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">heatspread</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">matrix</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">mb</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">ms</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">photo</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">resolut</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">seek</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">speed</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">spreader</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">unbuff</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">usb</str> <int name="count">2</int> </lst>
              </arr>
            </lst>
            <lst>
              <str name="field">instock</str>
              <bool name="value">false</bool>
              <int name="count">4</int>
              <arr name="pivot">
                <lst> <str name="field">features</str> <str name="value">0</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">1</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">16</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">2</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">20</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">3</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">9</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">90</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">adapt</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">car</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">clock</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">direct</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">directx</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">dual</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">dvi</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">express</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">gddr</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">ghz</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">gl</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">gpu</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">gpuvpu</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">hdtv</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">mb</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">mhz</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">open</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">opengl</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">out</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">pci</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">power</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">vpu</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">white</str> <int name="count">2</int> </lst>
                <lst> <str name="field">features</str> <str name="value">x</str> <int name="count">2</int> </lst>
              </arr>
            </lst>
          </arr>
        </lst>
      </arr>
    </lst>
  </lst>
</response>
As the example shows, Solr had no problems calculating the hierarchy correctly in this case either. In terms of the data, this example is almost the same as the previous one; it only contains one more level of depth.
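Since each additional level only nests one more pivot list, client code does not need to know the depth in advance. The sketch below walks a pivot tree of any depth with a single recursive generator. PIVOT is a hand-copied subset of the three-level response above, stored as plain Python dicts. That dict layout and the function names are assumptions made for this example.

```python
# Hand-copied subset of the three-level response from this article.
# The dict layout (field/value/count/pivot keys) is an assumption of this sketch.
PIVOT = [
    {"field": "cat", "value": "electronics", "count": 17, "pivot": [
        {"field": "instock", "value": "true", "count": 13, "pivot": [
            {"field": "features", "value": "lcd", "count": 5},
            {"field": "features", "value": "tft", "count": 4},
        ]},
        {"field": "instock", "value": "false", "count": 4, "pivot": [
            {"field": "features", "value": "dvi", "count": 2},
        ]},
    ]},
]


def iter_paths(nodes, prefix=()):
    """Yield (path, count) for every node, parents before their children."""
    for node in nodes:
        path = prefix + (node["value"],)
        yield path, node["count"]
        # Leaves have no "pivot" key, which ends the recursion.
        yield from iter_paths(node.get("pivot", []), path)


def render(nodes):
    """Return the tree as indented text, one line per value."""
    lines = []
    for path, count in iter_paths(nodes):
        indent = "  " * (len(path) - 1)
        lines.append(f"{indent}{path[-1]} ({count})")
    return "\n".join(lines)


print(render(PIVOT))
```

The same walk works unchanged for the two-level result, because the depth is taken from the data rather than hard-coded.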
A few words at the end
In my opinion this is one of the more useful features for the “ordinary” user. Unfortunately, so far it is only available in the development version of Solr. I have not found any information about whether there are plans to port this functionality to Solr 1.5, which lives in the branch_3x branch in SVN. What matters, however, is that the functionality has been committed, and sooner or later Solr users will be able to use it.
Published at DZone with permission of Rafał Kuć, DZone MVB. See the original article here.