Use Cases of Faceted Search for Apache Solr

By Peter Karussell · Dec. 09, 10 · Interview

In this post I describe some use cases of facets for Apache Solr. Please submit your own ideas in the comments.
This post is split into the following parts:

  • What are facets?
  • How do you enable and use simple facets?
  • What are other use cases?
    - Category navigation
    - Autocompletion
    - Trending keywords or links
    - RSS feeds
  • Conclusion


What are facets?

In Apache Solr, elements for navigational purposes are called facets. Keep in mind that Solr provides filter queries (specified via the HTTP parameter fq), which filter documents out of the search result. In contrast, facet queries only provide information (a count of documents) and do not change the result documents, i.e. they provide 'filter queries for future queries'. So you define a facet query and then see how many documents you can expect if you apply the related filter query.

But a picture (from this great facet introduction) is worth a thousand words:

What do you see?

  • You see different facets like manufacturer, resolution, …
  • Every facet has some constraints with which the user can easily filter the search results
  • The breadcrumb shows all selected constraints and allows removing them

All these values can be extracted from Solr's search results and can be defined at query time, which may look surprising if you come from FAST ESP. Nevertheless, the fields on which you facet need to be indexed and untokenized, e.g. string or integer. In particular, the type of a field you want to facet on must not be the default 'text' type, which is tokenized.

In Solr you have

  • normal facets used via facet.field
  • facet queries and
  • date facets, similar to the new, more general range queries

The normal facets are useful if your documents have a manufacturer string field, e.g. a document can be in the 'sony' or the 'nikon' bucket. In contrast, you will need facet queries for integers like prices. For example, if you specify a facet query from 0 to 10 EUR, Solr will calculate on the fly all documents which fall into that bucket. But facet queries become relatively unwieldy if you have several identical ranges like 0-10, 10-20, 20-30, … EUR. Then you can use range queries.
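To see why hand-written facet queries get unwieldy, here is a minimal Java sketch that generates one facet query string per price bucket. The class name, the method name and the field name price are made up for illustration. The only Solr fact it relies on is that square brackets in a range are inclusive, so for an integer field each bucket ends one below the next lower bound.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or SolrJ.
final class PriceBucketSketch {

    // Builds one facet query per bucket: [start, start+gap-1], [start+gap, ...], up to end (exclusive).
    static List<String> bucketQueries(String field, int start, int end, int gap) {
        if (gap <= 0) {
            throw new IllegalArgumentException("gap must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int lower = start; lower < end; lower += gap) {
            // Inclusive upper bound, so neighbouring buckets do not overlap.
            int upper = Math.min(lower + gap, end) - 1;
            queries.add(field + ":[" + lower + " TO " + upper + "]");
        }
        return queries;
    }
}
```

For 0 to 30 EUR with a gap of 10 this yields price:[0 TO 9], price:[10 TO 19] and price:[20 TO 29], and each of them would have to be sent as its own facet query.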

Date facets are special range queries. As an example, look at this screenshot from Jetwick:

Here the interval (which is called the gap) for every bucket is one day.

For a nice introduction to facets, have a look at this publication or use the Solr wiki here.

How do you enable and use simple facets?

As stated before, they can be enabled at query time. For the HTTP API you add "&facet=true&facet.field=manu" to your normal query "http://localhost:8983/solr/select?q=*:*". For SolrJ you do:

new SolrQuery("*:*").setFacet(true).addFacetField("manu");
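If you talk to the HTTP API without SolrJ, the request URL can be put together with the JDK alone. This is only a sketch: the class and method names are invented, and it assumes Java 10 or newer for the URLEncoder overload that takes a Charset.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building a faceted select URL by hand.
final class FacetUrlSketch {

    // Appends q, facet=true and one facet.field parameter per given field.
    static String facetUrl(String selectUrl, String query, String... facetFields) {
        StringBuilder url = new StringBuilder(selectUrl)
                .append("?q=").append(encode(query))
                .append("&facet=true");
        for (String field : facetFields) {
            url.append("&facet.field=").append(encode(field));
        }
        return url.toString();
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Note that the colon in *:* gets percent-encoded to %3A, which is equivalent to the unencoded form shown above.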

In the XML returned from the Solr server you will get something like this (again from this post):

<lst name="facet_fields">
<lst name="manu">
<int name="canon usa">17</int>
<int name="olympus">12</int>
<int name="sony">12</int>
<int name="panasonic">9</int>
<int name="nikon">4</int>
</lst>
</lst>

To retrieve this with SolrJ you don't need to touch any XML, of course. Just get the facet objects:

List<FacetField> facetFields = queryResponse.getFacetFields();
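For clients that cannot use SolrJ, the facet_fields section can also be read with the JDK's own DOM parser. The following is a rough sketch under that assumption; the class and method names are hypothetical, and it only looks at lst elements that sit directly inside facet_fields.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: extracts value -> count for one facet field from Solr's XML response.
final class FacetXmlSketch {

    static Map<String, Integer> counts(String xml, String facetField) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Integer> result = new LinkedHashMap<>(); // keeps Solr's ordering
            NodeList lists = doc.getElementsByTagName("lst");
            for (int i = 0; i < lists.getLength(); i++) {
                Element lst = (Element) lists.item(i);
                Node parent = lst.getParentNode();
                boolean insideFacetFields = parent instanceof Element
                        && "facet_fields".equals(((Element) parent).getAttribute("name"));
                if (!insideFacetFields || !facetField.equals(lst.getAttribute("name"))) {
                    continue;
                }
                NodeList children = lst.getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child instanceof Element && "int".equals(child.getNodeName())) {
                        Element entry = (Element) child;
                        result.put(entry.getAttribute("name"),
                                Integer.parseInt(entry.getTextContent().trim()));
                    }
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse facet XML", e);
        }
    }
}
```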

To append facet queries, specify them with addFacetQuery:

solrQuery.addFacetQuery("quality:[* TO 10]").addFacetQuery("quality:[11 TO 100]");

And how would you query for documents which do not have a value for that field? This is easy: q=-field_name:[* TO *]

Now I'll show you how I implemented date facets in Jetwick:

// "dt" is the date field; {!ex=dt} excludes the filter tagged "dt" from this facet's counts
q.setFacet(true).set("facet.date", "{!ex=dt}dt").
    set("facet.date.start", "NOW/DAY-6DAYS").
    set("facet.date.end", "NOW/DAY+1DAY").
    set("facet.date.gap", "+1DAY");

With that query you get 7 day buckets, which are visualized via:

It is important to note that you will have to use local parameters like {!ex=dt} to make sure that if a user applies a facet (uses the facet query as a filter query), the other facet queries won't get a count of 0. In the picture the filter query was fq={!tag=dt}dt:[2010-12-04T00:00:00.000Z+TO+2010-12-05T00:00:00.000Z]. Again: the filter query needs to start with {!tag=dt} to make this work. Take a look at the datefilter source code or this for more information.
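The tagged filter query for one day bucket can be built with java.time. This is a sketch with invented class and method names. It treats the given date as a UTC day, which is an assumption on my side, and it produces the unencoded form with spaces instead of the + signs seen in the URL above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for the tag/exclude pair used with date facets.
final class DayFilterSketch {

    // Solr's date format, e.g. 2010-12-04T00:00:00.000Z
    private static final DateTimeFormatter SOLR_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");

    // Filter query covering one day, tagged so that a facet can exclude it again.
    static String dayFilter(String field, String tag, LocalDate day) {
        String from = day.atStartOfDay().format(SOLR_DATE);
        String to = day.plusDays(1).atStartOfDay().format(SOLR_DATE);
        return "{!tag=" + tag + "}" + field + ":[" + from + " TO " + to + "]";
    }

    // Value for facet.date that ignores the filter carrying the given tag.
    static String excludingFacetField(String field, String tag) {
        return "{!ex=" + tag + "}" + field;
    }
}
```

The tag in the filter query and the ex value in the facet parameter have to match, otherwise the exclusion has no effect.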

Be aware that you will have to tune the filterCache in order to keep performance green. It is also important to use warming queries to avoid timeouts and to pre-fill the caches with old, heavily used data.

What are other use cases?

1. Category navigation

The problem: you have a tree of categories, and your products are assigned to several of those categories.

There are two relatively similar solutions to this problem. I will describe one of them:

  • Create a multivalued string field called 'category'. Use the category ID (or the name if you want to avoid DB queries).
  • You have a category tree. Make sure a document gets not only the leaf category but all categories up to the root node.
  • Now facet over the category field with '-1' as the limit (facet.limit=-1), so that all categories are returned
  • But what if you want to display only the categories of one level, e.g. because you don't want to show other levels at the same time or because there are too many of them?
    Then index the category field à la <level>_category. For that you will need the complete category tree in RAM while indexing. Then use facet.prefix=<level>_ to restrict the category list to that level
  • Clicking on a category entry should result in a filter query à la fq=category:"<level>_categoryid"
  • The slightly tricky part is that your UI or middle tier has to parse the level, e.g. 2, and then append 2+1=3 to the query: facet.prefix=3_
  • If you filter by level, one question remains:
    Q: How can you display the path from the selected category up to the root category?
    A: Either get the category parents via the DB, which is easy if you store the category IDs in Solr rather than the category names.
    Or get the parents from the parameter list, which is a bit more complicated but doable. In this case you'll need to store the category names in Solr.
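The level handling from the steps above can be sketched in a few lines of Java. Everything here is illustrative: the class and method names are invented, and numbering the levels from 1 at the root is my assumption, since the steps only say that the level is a number in front of the underscore.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the <level>_category scheme.
final class CategoryLevelSketch {

    // Index time: one token per ancestor, e.g. [electronics, cameras] -> [1_electronics, 2_cameras].
    static List<String> levelTokens(List<String> pathFromRoot) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < pathFromRoot.size(); i++) {
            tokens.add((i + 1) + "_" + pathFromRoot.get(i));
        }
        return tokens;
    }

    // Query time: the filter query for the category the user clicked on.
    static String filterQuery(String selectedToken) {
        return "category:\"" + selectedToken + "\"";
    }

    // Query time: parse the level of the clicked token and ask for the next level.
    static String nextFacetPrefix(String selectedToken) {
        int separator = selectedToken.indexOf('_');
        int level = Integer.parseInt(selectedToken.substring(0, separator));
        return (level + 1) + "_";
    }
}
```

After a click on 2_cameras, the UI would send fq=category:"2_cameras" together with facet.prefix=3_ to list the categories one level deeper.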

Please let me know if this explanation makes sense to you or if you want to see that in action. I don't want to advertise for our customers here :-)

BTW: the second approach I have in mind is to use dynamic fields à la category_<level>_s instead of facet.prefix.

2. Autocompletion

The problem: you want to show suggestions as the user types.

You'll need a multivalued 'tag' field. For Jetwick I'm using a heavy noise-word filter to get only terms 'with information' from the very noisy tweet text into the tag field. If you use a shingle filter, you can even create phrase suggestions. But here I will describe the "one more word" suggestion, which only suggests the next word (not a completely different phrase).

To do this, create the following query when the user types some characters (see the getquerychoices method of solrtweetsearch):

  • Use the old query with all its filter queries etc. to provide context-dependent autocompletion (i.e. only give suggestions which will lead to results)
  • Split the query into "completed" terms and one "to do" term. E.g. if you enter "michael jack",
    then michael is complete (it ends with a space) and jack should be completed
  • Set the query term of the old query to michael and add facet.prefix=jack
  • Set the facet limit to 10
  • Read the 10 suggestions from the facet field, but exclude already completed terms.
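The splitting and filtering steps can be sketched without any Solr dependency. This is not the Jetwick code; the class and method names are hypothetical, and the facet values are assumed to arrive as a plain list of strings in Solr's count order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for "one more word" suggestions.
final class AutocompleteSketch {

    // Returns {completedTerms, termToComplete}, e.g. "michael jack" -> {"michael", "jack"}.
    static String[] split(String input) {
        int lastSpace = input.lastIndexOf(' ');
        if (lastSpace < 0) {
            return new String[] {"", input};
        }
        return new String[] {input.substring(0, lastSpace).trim(), input.substring(lastSpace + 1)};
    }

    // Turns facet values into full suggestions and skips terms the user has already typed.
    static List<String> suggestions(String completed, List<String> facetValues, int limit) {
        Set<String> done = new HashSet<>(
                Arrays.asList(completed.toLowerCase(Locale.ROOT).split("\\s+")));
        List<String> result = new ArrayList<>();
        for (String value : facetValues) {
            if (result.size() == limit) {
                break;
            }
            if (!done.contains(value.toLowerCase(Locale.ROOT))) {
                result.add(completed.isEmpty() ? value : completed + " " + value);
            }
        }
        return result;
    }
}
```

The first element of split goes into q, the second into facet.prefix, and the values of the returned facet field are then passed to suggestions.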

The implementation for Jetwick, which uses Apache Wicket, is available in the searchbox source file, which uses myautocompletetextfield and the getquerychoices method of solrtweetsearch. But before you implement autocompletion with facets, take a look at this documentation. And if you don't want to use Wicket, there is a jQuery autocomplete library especially for Solr, with no UI layer required.

3. Trending keywords or links

Similar to autocompletion, you will need a tag or link field in your index. Then use the facet counts as an indicator of how important a term is. If you now run a query, e.g. solr, you will get the trending keywords and links depending on the filters. E.g. you can select different days to see the changes:

The keyword panel is implemented in the tagcloudpanel, and the link list is available as urltrendpanel.

Of course it would be nice to get the accumulated score of every link instead of a simple 'count', to prevent spammers from reaching this list. For that, look at this JIRA issue and at the StatsComponent. As I explained in the JIRA issue, this nice feature could be simulated with the result grouping feature.

4. RSS feeds

If you log in at jetwick.com you'll see this idea implemented. Every user can have several saved searches. For example, I have one search for 'apache solr' and one for 'wikileaks'. Every search can contain additional filters, like German language only, or a sort by retweets. Now the task is to transform that query into a facet query:

  • Insert ANDs between the query and all the filter queries
  • Remove all date filters
  • Add one date filter with the date of the last processed search ('last date')
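As a rough sketch, the three steps could look like this in Java. The class and method names, the field name lang in the usage note and the way date filters are recognized (by the field name in front of the colon, after stripping local parameters such as {!tag=dt}) are all assumptions for illustration, not the Jetwick implementation.

```java
import java.util.List;

// Hypothetical helper: turns a saved search into one facet query.
final class SavedSearchSketch {

    static String toFacetQuery(String query, List<String> filterQueries,
                               String dateField, String lastDate) {
        StringBuilder facetQuery = new StringBuilder("(").append(query).append(")");
        for (String fq : filterQueries) {
            String plain = stripLocalParams(fq);
            if (plain.startsWith(dateField + ":")) {
                continue; // step 2: drop existing date filters
            }
            facetQuery.append(" AND (").append(plain).append(")"); // step 1: AND everything
        }
        // step 3: only count documents newer than the last processed search
        facetQuery.append(" AND ").append(dateField)
                .append(":[").append(lastDate).append(" TO *]");
        return facetQuery.toString();
    }

    // Removes a leading {!...} block, e.g. {!tag=dt}dt:[...] -> dt:[...]
    static String stripLocalParams(String fq) {
        if (fq.startsWith("{!")) {
            int end = fq.indexOf('}');
            if (end >= 0) {
                return fq.substring(end + 1);
            }
        }
        return fq;
    }
}
```

For the saved search 'apache solr' with a filter lang:de and a last date of 2010-12-08T10:00:00.000Z, this gives (apache solr) AND (lang:de) AND dt:[2010-12-08T10:00:00.000Z TO *]. Purely negative filters may need extra handling before they are wrapped in parentheses.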

Then you will see how many new tweets are available for every saved search:

Update: no need to click refresh to see the counts. The count update is done in the background via JavaScript.

Conclusion

There are a lot of applications for faceted search, and facets are very convenient to use. Okay, the 'local parameter hack' is a bit daunting, but hey: it works :-)

It is nice that I can specify different facets for every query in Solr. With that feature you can generate personalized facets, as explained under "RSS feeds".

One improvement for the facets implemented in Solr would be a feature which does not calculate the count but instead sums up a fielda for documents with the same value in fieldb, or even returns the score for a facet or a facet query. This would improve the use case "trending keywords or links".

from http://karussell.wordpress.com/2010/12/08/use-cases-of-faceted-search-for-apache-solr/
