Inside the Apache Solr JSON Facet API

Solr 5 includes a re-written faceted search and analytics module with a structured JSON API to control the faceting and analytics commands. Here’s how it works.

Since I joined Cloudera a few years ago to help bring search-powered analytics to Cloudera’s platform, I’ve been working actively upstream alongside the rest of the Solr community to develop new functionality that will drive more interesting applications on Cloudera Search (which is based on an integration of Solr with the Apache Hadoop ecosystem). In the following re-post from my personal blog, written at the time of code check-in, I describe one of these features: improved support for nested facets via JSON. (Note: this feature is targeted for a future release of Cloudera Enterprise, and thus is not yet supported for production use.)

Why JSON?

The structured nature of nested sub-facets is more naturally expressed in a nested structure like JSON than in the flat structure that normal query parameters provide. For that reason, starting in 5.0, Solr includes a JSON Facet API. The Facet API is now part of the JSON Request API, so a complete request may be expressed in JSON.

Goals of the new faceting module include:

  • First-class JSON support
  • Easier programmatic construction of complex, nested facet commands
  • Support for a much more canonical response format that is easier for clients to parse
  • First-class analytics support
  • Ability to sort facet buckets by any calculated metric
  • A cleaner way to do distributed faceting
  • Better integration with other search features

Of course, if you prefer to use Solr’s existing faceting capabilities, that’s fine, too. (You can even use both simultaneously if you want to!)

Next, let’s get into the details. (Note: Some examples here use syntax supported only in later Solr 5 releases, or even Solr 6.)

Ease of Use

Some of the ease-of-use enhancements over traditional Solr faceting come from the inherently nested structure of JSON.

As an example, here is the faceting command for two different range facets using Solr’s flat (query-parameter) API:

&facet=true
&facet.range={!key=age_ranges}age
&f.age.facet.range.start=0
&f.age.facet.range.end=100
&f.age.facet.range.gap=10
&facet.range={!key=price_ranges}price
&f.price.facet.range.start=0
&f.price.facet.range.end=1000
&f.price.facet.range.gap=50

And here is the equivalent faceting command in the new JSON Faceting API:

{
  age_ranges:{
    type:range,
    field:age,
    start:0,
    end:100,
    gap:10
  },
  price_ranges:{
    type:range,
    field:price,
    start:0,
    end:1000,
    gap:50
  }
}

These aren’t even nested facets, but already, one can see how much nicer the JSON API looks. With deeply nested sub-facets and statistics, the clarity of the inherently nested JSON API only grows.
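The JSON form is also just data, which is what makes programmatic construction easier. Below is a minimal Python sketch, not part of Solr, that builds the two range facets above as a dictionary and serializes them into a single json.facet parameter. The helper range_facet is hypothetical. Note that json.dumps emits strict JSON with quoted keys, which is still valid input for Solr’s more permissive parser.

```python
import json

# Flat (query-parameter) form of the two range facets, as (name, value) pairs.
flat_params = [
    ("facet", "true"),
    ("facet.range", "{!key=age_ranges}age"),
    ("f.age.facet.range.start", "0"),
    ("f.age.facet.range.end", "100"),
    ("f.age.facet.range.gap", "10"),
    ("facet.range", "{!key=price_ranges}price"),
    ("f.price.facet.range.start", "0"),
    ("f.price.facet.range.end", "1000"),
    ("f.price.facet.range.gap", "50"),
]

def range_facet(field, start, end, gap):
    """Build one JSON range facet as a plain dict (hypothetical helper)."""
    return {"type": "range", "field": field, "start": start, "end": end, "gap": gap}

# The JSON form is an ordinary nested dict, so it can be assembled in code.
json_facet = {
    "age_ranges": range_facet("age", 0, 100, 10),
    "price_ranges": range_facet("price", 0, 1000, 50),
}

# The whole command travels in a single json.facet parameter.
json_param = ("json.facet", json.dumps(json_facet))
```

Nine flat parameters collapse into one, and adding a sub-facet later is just another nested key.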

JSON Extensions

A number of JSON extensions have been implemented to further increase the clarity and ease of constructing a JSON faceting command by hand. For example:

{
  // this is a single-line comment, which can help add clarity to large JSON commands
  /* traditional C-style comments are also supported */
  x : "avg(price)", // Simple strings (like the key x) can occur unquoted
  y : 'unique(manu)' // Strings can also use single quotes (easier to embed in another String)
}
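These extensions are accepted by Solr’s JSON parser but not by standard JSON libraries. The minimal Python sketch below makes the distinction concrete. The helper is_strict_json is hypothetical. The practical point is that client code generating commands programmatically can simply emit strict JSON, since strict JSON is a subset of what the extended syntax allows.

```python
import json

# The extended syntax from the example above: comments, unquoted keys,
# and single-quoted strings. These are Solr extensions, not standard JSON.
extended = """{
  // single-line comment
  x : "avg(price)",
  y : 'unique(manu)'
}"""

def is_strict_json(text):
    """Return True if text parses with Python's standard (strict) JSON parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# The same command written in strict JSON, which any JSON library can produce.
strict = json.dumps({"x": "avg(price)", "y": "unique(manu)"})
```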

Debugging JSON

Nicely indented JSON is very easy to understand. If you have a large piece of non-indented JSON and are trying to make sense of it, you can paste it into an online validator such as JSON Lint or JSON Formatter.

Both of these validators will indent your JSON, even when it contains extensions unsupported by them (such as comments or bare strings).
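If you would rather not paste data into a website, Python’s standard json module can do the same re-indenting locally. Unlike the online validators mentioned above, this minimal sketch only handles strict JSON, so it will reject input that contains comments or bare strings.

```python
import json

# A compact, non-indented response fragment (strict JSON only).
compact = '{"top_genres":{"buckets":[{"val":"Fantasy","count":122}]}}'

def pretty(text):
    """Re-indent strict JSON text with two-space indentation."""
    return json.dumps(json.loads(text), indent=2)

print(pretty(compact))
```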

Facet Types

There are two types of facets: those that break up the domain into multiple buckets, and aggregations (or facet functions) that provide information about the set of documents belonging to each bucket.

Faceting can be nested. Any bucket produced by faceting can be further broken down into multiple buckets by a sub-facet.

Statistics Are Facets

Statistics are now fully integrated into faceting. Since we start off with a single facet bucket with a domain defined by the main query and filters, we can even ask for statistics for this top-level bucket, before breaking up into further buckets via faceting. Example:

json.facet={
  x:"avg(price)",  // the average of the price field will appear under "x"
  y:"unique(manufacturer)"  // the number of unique manufacturers will appear under "y"
}
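Because the whole command is a single request parameter, a client can build it as a dictionary and URL-encode it along with the rest of the query. Here is a minimal Python sketch, assuming the field names price and manufacturer from the example above. The rows=0 setting is borrowed from the curl example later in this article, since only the facet results are of interest.

```python
import json
from urllib.parse import urlencode, parse_qs

# Statistics on the top-level bucket (all documents matching q and filters).
facet = {
    "x": "avg(price)",            # average of the price field, reported under "x"
    "y": "unique(manufacturer)",  # number of unique manufacturers, reported under "y"
}

# rows=0: return no documents, only the facet results.
query_string = urlencode({"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)})

# Decoding the query string recovers the original parameters.
decoded = parse_qs(query_string)
```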

See the facet functions documentation for a complete list of the available aggregation functions.

JSON Facet Syntax

The general form of a JSON facet command is:

<facet_name>:{<facet_type>:<facet_parameter(s)>}

Example:

top_authors:{terms:{field:authors,limit:5}}

After Solr 5.2, a flatter structure with a “type” field may also be used:

<facet_name>:{"type":<facet_type>,<other_facet_parameter(s)>}

Example:

top_authors:{type:terms,field:authors,limit:5}

The results will appear in the response under the facet name specified. Facet commands are specified using json.facet request parameters.
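The two forms carry the same information, so moving between them is a mechanical rewrite. Below is a minimal Python sketch with hypothetical helper names. It models facet commands as dictionaries and does not handle the short forms (such as {terms:genre_field}), where the facet type maps directly to a single value instead of a parameter object.

```python
def nested_form(name, facet_type, **params):
    """Original syntax: <facet_name>:{<facet_type>:{<params>}} (hypothetical helper)."""
    return {name: {facet_type: dict(params)}}

def type_form(name, facet_type, **params):
    """Later syntax: <facet_name>:{"type":<facet_type>, <params>} (hypothetical helper)."""
    return {name: {"type": facet_type, **params}}

def nested_to_type_form(facet):
    """Rewrite a single nested-form facet into the equivalent "type" form."""
    (name, body), = facet.items()          # exactly one facet name
    (facet_type, params), = body.items()   # exactly one facet type
    return {name: {"type": facet_type, **params}}
```

For example, nested_form("top_authors", "terms", field="authors", limit=5) corresponds to top_authors:{terms:{field:authors,limit:5}}.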

Test Using Curl

To test out different facet requests by hand, it’s easiest to use curl from the command line. Example:

$ curl http://localhost:8983/solr/query -d 'q=*:*&rows=0&
json.facet={
  categories:{
    type:terms,
    field:cat,
    sort:{x:desc},
    facet:{
      x:"avg(price)",
      y:"sum(price)"
    }
  }
}
'
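For scripts, the same request can be sent from Python’s standard library. This minimal sketch assumes the same local URL as the curl command. The helper run() is hypothetical and only works against a running Solr instance, while building the request itself needs no network access. The wt=json parameter is added as an assumption, to ask for a JSON response explicitly.

```python
import json
import urllib.parse
import urllib.request

# Same URL as the curl example; assumes a local Solr on the default port 8983.
SOLR_URL = "http://localhost:8983/solr/query"

facet = {
    "categories": {
        "type": "terms",
        "field": "cat",
        "sort": {"x": "desc"},      # sort buckets by the computed metric x
        "facet": {                  # statistics computed per category bucket
            "x": "avg(price)",
            "y": "sum(price)",
        },
    }
}

# curl -d sends a form-encoded POST body; build the same body here.
body = urllib.parse.urlencode(
    {"q": "*:*", "rows": 0, "wt": "json", "json.facet": json.dumps(facet)}
).encode("utf-8")

request = urllib.request.Request(SOLR_URL, data=body, method="POST")

def run(req):
    """Send the request and decode the JSON response (needs a running Solr)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```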

Terms Facet

The terms facet, or field facet, produces buckets from the unique values of a field. The field needs to be indexed or have docValues.

The simplest form of the terms facet:

top_genres:{terms:genre_field}

An expanded form allows for more parameters:

top_genres:{
  type:terms,
  field:genre_field,
  limit:3,
  mincount:2
}

Example response:

"top_genres":{
  "buckets":[
    {
      "val":"Science Fiction",
      "count":143},
    {
      "val":"Fantasy",
      "count":122},
    {
      "val":"Biography",
      "count":28}
  ]
}

Parameters: [table of terms facet parameters not reproduced]
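Because every bucket in the example response above has the same val/count shape, client code can walk it without special cases. Here is a minimal Python sketch over that example. For simplicity the fragment is treated as a standalone JSON document, and bucket_counts is a hypothetical helper.

```python
import json

# The example terms-facet response from above, as strict JSON text.
response_text = """{
  "top_genres": {
    "buckets": [
      {"val": "Science Fiction", "count": 143},
      {"val": "Fantasy", "count": 122},
      {"val": "Biography", "count": 28}
    ]
  }
}"""

def bucket_counts(facet):
    """Map each bucket's value to its document count, preserving bucket order."""
    return {b["val"]: b["count"] for b in facet["buckets"]}

counts = bucket_counts(json.loads(response_text)["top_genres"])
```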

Query Facet

The query facet produces a single bucket that matches the specified query.

Here’s an example of the simplest form of the query facet:

high_popularity:{query:"popularity:[8 TO 10]"}

An expanded form allows for more parameters (or sub-facets/facet functions):

high_popularity:{
  type:query,
  q:"popularity:[8 TO 10]",
  facet:{average_price:"avg(price)"}
}

Example response:

"high_popularity":{
  "count":147,
  "average_price":74.25
}
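Note the shape difference from the terms facet: the query facet’s response is the bucket itself, with count and metrics directly inside it, rather than a buckets list. A client that handles both shapes might normalize them with a small helper. as_buckets is a hypothetical name, and this sketch is based only on the example responses in this article.

```python
def as_buckets(facet):
    """Return a list of bucket dicts for either facet response shape.

    Terms and range facets return {"buckets": [...]}, while a query facet
    is itself a single bucket holding "count" and any metrics directly.
    """
    if "buckets" in facet:
        return facet["buckets"]
    return [facet]

# Example responses from this article (range facet shortened to two buckets).
high_popularity = {"count": 147, "average_price": 74.25}
prices = {"buckets": [{"val": 0.0, "count": 5}, {"val": 20.0, "count": 3}]}
```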

Range Facet

The range facet produces multiple range buckets over numeric fields or date fields.

Range facet example:

prices:{
  type:range,
  field:price,
  start:0,
  end:100,
  gap:20
}

Example response:

"prices":{
  "buckets":[
    {
      "val":0.0,  // the bucket value represents the start of each range. This bucket covers 0-20
      "count":5},
    {
      "val":20.0,
      "count":3},
    {
      "val":40.0,
      "count":2},
    {
      "val":60.0,
      "count":1},
    {
      "val":80.0,
      "count":1}
  ]
}
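Since each val is only the start of its range, a client that wants readable labels has to add the gap itself. Below is a minimal Python sketch using the example response above. range_bounds and label_buckets are hypothetical helpers, and range_bounds assumes integer parameters where gap divides end - start evenly, as in the example. It does not model what Solr does with a final partial bucket.

```python
def range_bounds(start, end, gap):
    """List (low, high) pairs for range buckets.

    Assumes integer start/end/gap with gap dividing end - start evenly,
    as in the example; a trailing partial bucket is not modeled.
    """
    return [(low, low + gap) for low in range(start, end, gap)]

def label_buckets(buckets, gap):
    """Map a 'low-high' label to each bucket's count, using "val" as the low end."""
    return {f"{b['val']:g}-{b['val'] + gap:g}": b["count"] for b in buckets}

# The example response buckets from above (values written as floats, as there).
buckets = [
    {"val": 0.0, "count": 5},
    {"val": 20.0, "count": 3},
    {"val": 40.0, "count": 2},
    {"val": 60.0, "count": 1},
    {"val": 80.0, "count": 1},
]
labels = label_buckets(buckets, 20)
```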

To ease migration, these parameter names, values, and semantics were taken directly from the old-style (non-JSON) Solr range faceting.
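Because the names and semantics line up, converting existing flat range parameters is mostly mechanical. Here is a minimal Python sketch under stated assumptions. flat_ranges_to_json is a hypothetical helper. It only handles the per-field f.&lt;field&gt;.facet.range.start/end/gap form and the {!key=...} local param used in the earlier example, and it supports one facet.range per field. Values stay as the strings they were in the query string, and converting them to numbers is left to the caller.

```python
import re
from collections import defaultdict

# Local-param syntax used in the flat API, e.g. {!key=age_ranges}age
KEYED_FIELD = re.compile(r"^\{!key=(?P<key>[^}]+)\}(?P<field>.+)$")

def flat_ranges_to_json(params):
    """Convert old-style facet.range parameters into JSON range facets.

    params is a list of (name, value) pairs. Only start/end/gap are carried
    over; other per-field options (hardend, other, include, ...) are ignored.
    """
    per_field = defaultdict(dict)  # field -> {option: value}
    keys = {}                      # field -> output facet name
    for name, value in params:
        if name == "facet.range":
            match = KEYED_FIELD.match(value)
            field = match.group("field") if match else value
            keys[field] = match.group("key") if match else field
        elif name.startswith("f.") and ".facet.range." in name:
            field, _, option = name[2:].partition(".facet.range.")
            per_field[field][option] = value
    facets = {}
    for field, key in keys.items():
        facet = {"type": "range", "field": field}
        for option in ("start", "end", "gap"):
            if option in per_field[field]:
                facet[option] = per_field[field][option]
        facets[key] = facet
    return facets
```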

Parameters: [table of range facet parameters not reproduced]

Common Parameters

Parameters that all faceting methods have in common include: [table not reproduced]

Conclusion

Hopefully, you now have a good understanding of the JSON Facet API introduced in Solr 5. Again, this feature is scheduled to ship and be certified in a future Cloudera release, but it is not yet supported for production use.

Yonik Seeley is a Software Engineer at Cloudera, a committer and PMC member for Apache Lucene, and the creator of Solr. Previously, he was chief open source architect and cofounder at LucidWorks.

Published at DZone with permission of Yonik Seeley. See the original article here.
