
Inside the Apache Solr JSON Facet API

Solr 5 includes a re-written faceted search and analytics module with a structured JSON API to control the faceting and analytics commands. Here’s how it works.

By Yonik Seeley · Oct. 27, 16 · Tutorial

Since I joined Cloudera a few years ago to help bring search-powered analytics to Cloudera’s platform, I’ve been working actively upstream alongside the rest of the Solr community to develop new functionality that will drive more interesting applications on Cloudera Search (which is based on an integration of Solr with the Apache Hadoop ecosystem). In the following re-post from my personal blog, I describe one of these features — improved support for nested facets via JSON — that I wrote at the time of code check-in. (Note: this feature is targeted for a future release of Cloudera Enterprise, and thus is not yet supported for production use.)

Why JSON?

The structured nature of nested sub-facets is more naturally expressed in a nested structure like JSON rather than the flat structure that normal query parameters provide. For that reason, starting in 5.0, Solr includes a JSON Facet API. The Facet API is now part of the JSON Request API, so a complete request may be expressed in JSON.

Goals of the new faceting module include:

  • First-class JSON support
  • Easier programmatic construction of complex, nested facet commands
  • Support for a much more canonical response format that is easier for clients to parse
  • First-class analytics support
  • Ability to sort facet buckets by any calculated metric
  • A cleaner way to do distributed faceting
  • Better integration with other search features

Of course, if you prefer to use Solr’s existing faceting capabilities, that’s fine, too. (You can even use both simultaneously if you want to!)

Next, let’s get into the details. (Note: Some examples here use syntax supported only in later Solr 5 releases, or even Solr 6.)

Ease of Use

Some of the ease-of-use enhancements over traditional Solr faceting come from the inherently nested structure of JSON.

As an example, here is the faceting command for two different range facets using Solr’s Flat API:

&facet=true
&facet.range={!key=age_ranges}age
&f.age.facet.range.start=0
&f.age.facet.range.end=100
&f.age.facet.range.gap=10
&facet.range={!key=price_ranges}price
&f.price.facet.range.start=0
&f.price.facet.range.end=1000
&f.price.facet.range.gap=50

And here is the equivalent faceting command in the new JSON Faceting API:

{
  age_ranges:{
    type:range,
    field:age,
    start:0,
    end:100,
    gap:10
  },
  price_ranges:{
    type:range,
    field:price,
    start:0,
    end:1000,
    gap:50
  }
}

These aren’t even nested facets, but already, one can see how much nicer the JSON API looks. With deeply nested sub-facets and statistics, the clarity of the inherently nested JSON API only grows.
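The "easier programmatic construction" goal can be illustrated with a minimal Python sketch. It builds the same two range facets as plain dictionaries and serializes them with the standard json module. The helper name range_facet is hypothetical, and the sketch assumes that strict JSON (with quoted keys and strings) is accepted wherever the relaxed syntax is.

```python
import json

def range_facet(field, start, end, gap):
    """Build one range facet command as a plain dict (hypothetical helper)."""
    return {"type": "range", "field": field, "start": start, "end": end, "gap": gap}

# The same two facets as in the JSON example above.
facets = {
    "age_ranges": range_facet("age", 0, 100, 10),
    "price_ranges": range_facet("price", 0, 1000, 50),
}

# Serialize to strict JSON; this string is what would be sent as json.facet.
json_facet = json.dumps(facets)
print(json_facet)
```

Because each facet is just a nested dictionary, adding another one is a single extra entry rather than a new group of prefixed query parameters.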

JSON Extensions

A number of JSON extensions have been implemented to further increase the clarity and ease of constructing a JSON faceting command by hand. For example:

{ // this is a single-line comment, which can help add clarity to large JSON commands
  /* traditional C-style comments are also supported */
  x:"avg(price)", // Simple strings can occur unquoted
  y:'unique(manu)' // Strings can also use single quotes (easier to embed in another String)
}

Debugging JSON

Nicely-indented JSON is very easy to understand. If you get a large piece of non-indented JSON somehow and are trying to make sense of it, you can cut and paste into an online validator like JSON Lint or JSON Formatter. 

Both of these validators will indent your JSON, even when it contains extensions unsupported by them (such as comments or bare strings).
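If an online tool is not an option, strict JSON can also be re-indented locally. The following is only a sketch using Python's standard json module; the function name indent_json is hypothetical. Unlike the validators mentioned above, the standard parser rejects the extensions (comments, bare strings, single quotes), so it only helps with strict JSON such as a Solr response.

```python
import json

def indent_json(text):
    """Re-indent strict JSON text.

    Raises ValueError (json.JSONDecodeError) if the text uses
    extensions such as comments or unquoted strings.
    """
    return json.dumps(json.loads(text), indent=2)

print(indent_json('{"top_genres":{"buckets":[{"val":"Fantasy","count":122}]}}'))
```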

Facet Types

There are two types of facets: bucketing facets, which break up the domain into multiple buckets, and aggregations (facet functions), which provide information about the set of documents belonging to each bucket.

Faceting can be nested. Any bucket produced by faceting can further be broken down into multiple buckets by a subfacet.
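As a rough illustration of nesting, the Python sketch below attaches sub-facets to a bucketing facet under its "facet" key, which is the key the examples in this article use for sub-facets. The helper name with_subfacets and the field names are assumptions made for the example.

```python
import json

def with_subfacets(facet, **subfacets):
    """Return a copy of a facet command with sub-facets added under "facet"."""
    nested = dict(facet)
    nested["facet"] = {**facet.get("facet", {}), **subfacets}
    return nested

# Each "cat" bucket gets an average price and is broken down again by price range.
by_category = with_subfacets(
    {"type": "terms", "field": "cat"},
    avg_price="avg(price)",
    price_ranges={"type": "range", "field": "price", "start": 0, "end": 100, "gap": 20},
)
print(json.dumps({"categories": by_category}, indent=2))
```

The same helper can be applied again to price_ranges, since any bucket produced by a facet can itself carry sub-facets.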

Statistics Are Facets

Statistics are now fully integrated into faceting. Since we start off with a single facet bucket with a domain defined by the main query and filters, we can even ask for statistics for this top-level bucket, before breaking up into further buckets via faceting. Example:

json.facet={
  x:"avg(price)", // the average of the price field will appear under "x"
  y:"unique(manufacturer)" // the number of unique manufacturers will appear under "y"
}

See facet functions for a complete list of the available aggregation functions.

JSON Facet Syntax

The general form of a JSON facet command is:

<facet_name>:{<facet_type>:<facet_parameter(s)>}

Example:

top_authors:{terms:{field:authors,limit:5}}

After Solr 5.2, a flatter structure with a “type” field may also be used:

<facet_name>:{"type":<facet_type>,<other_facet_parameter(s)>}

Example:

top_authors:{type:terms,field:authors,limit:5}

The results will appear in the response under the facet name specified. Facet commands are specified using json.facet request parameters.
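The two forms carry the same information, which a small Python sketch can make concrete. The function below converts the nested form into the flatter "type" form. The name to_flat_form is hypothetical, and the sketch only handles commands whose parameters are given as an object.

```python
def to_flat_form(command):
    """Convert {"terms": {...}} into {"type": "terms", ...}.

    Commands that already have a "type" key are returned as a copy.
    """
    if "type" in command:
        return dict(command)
    if len(command) != 1:
        raise ValueError("expected exactly one facet type key")
    facet_type, params = next(iter(command.items()))
    if not isinstance(params, dict):
        raise ValueError("this sketch only handles object parameters")
    return {"type": facet_type, **params}

nested = {"terms": {"field": "authors", "limit": 5}}
print(to_flat_form(nested))
```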

Test Using Curl

To test out different facet requests by hand, it’s easiest to use curl from the command line. Example:

$ curl http://localhost:8983/solr/query -d 'q=*:*&rows=0&
json.facet={
  categories:{
    type:terms,
    field:cat,
    sort:{x:desc},
    facet:{
      x:"avg(price)",
      y:"sum(price)"
    }
  }
}
'
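For scripted tests, the same request can be prepared from Python. This is a sketch using only the standard library: it posts q, rows, and json.facet as form fields to the URL from the curl example. The function name build_request is hypothetical, and the actual send is left commented out because it needs a running Solr instance.

```python
import json
import urllib.parse
import urllib.request

SOLR_QUERY_URL = "http://localhost:8983/solr/query"  # same URL as the curl example

def build_request(facets, q="*:*", rows=0, url=SOLR_QUERY_URL):
    """Build a POST request carrying q, rows and json.facet as form fields."""
    body = urllib.parse.urlencode(
        {"q": q, "rows": rows, "json.facet": json.dumps(facets)}
    ).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")

facets = {
    "categories": {
        "type": "terms",
        "field": "cat",
        "sort": {"x": "desc"},
        "facet": {"x": "avg(price)", "y": "sum(price)"},
    }
}
request = build_request(facets)

# Sending requires a running Solr, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Unlike the raw curl body, urlencode escapes characters such as "&" inside the facet command, so they cannot be mistaken for parameter separators.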

Terms Facet

The terms facet, or field facet, produces buckets from the unique values of a field. The field needs to be indexed or have docValues.

The simplest form of the terms facet:

top_genres:{terms:genre_field}

An expanded form allows for more parameters:

top_genres:{
  type:terms,
  field:genre_field,
  limit:3,
  mincount:2
}

Example response:

"top_genres":{
  "buckets":[
    {
      "val":"Science Fiction",
      "count":143},
    {
      "val":"Fantasy",
      "count":122},
    {
      "val":"Biography",
      "count":28}
  ]
}

Parameters:

[Table: terms facet parameters (solr-json-tab1)]
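On the client side, the bucket list in the example response above is easy to reduce to a lookup table. The Python sketch below assumes the response has already been parsed into dictionaries; the function name bucket_counts is hypothetical.

```python
def bucket_counts(facet_result):
    """Map each bucket's "val" to its "count", keeping the response order."""
    return {bucket["val"]: bucket["count"] for bucket in facet_result["buckets"]}

# The "top_genres" value from the example response.
top_genres = {
    "buckets": [
        {"val": "Science Fiction", "count": 143},
        {"val": "Fantasy", "count": 122},
        {"val": "Biography", "count": 28},
    ]
}
counts = bucket_counts(top_genres)
print(counts)
```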

Query Facet

The query facet produces a single bucket that matches the specified query.

Here’s an example of the simplest form of the query facet:

high_popularity:{query:"popularity:[8 TO 10]"}

An expanded form allows for more parameters (or sub-facets/facet functions):

high_popularity:{
  type:query,
  q:"popularity:[8 TO 10]",
  facet:{average_price:"avg(price)"}
}

Example response:

"high_popularity":{
  "count":147,
  "average_price":74.25
}

Range Facet

The range facet produces multiple range buckets over numeric fields or date fields.

Range facet example:

prices:{
  type:range,
  field:price,
  start:0,
  end:100,
  gap:20
}

Example response:

"prices":{
  "buckets":[
    {
      "val":0.0, // the bucket value represents the start of each range. This bucket covers 0-20
      "count":5},
    {
      "val":20.0,
      "count":3},
    {
      "val":40.0,
      "count":2},
    {
      "val":60.0,
      "count":1},
    {
      "val":80.0,
      "count":1}
  ]
}

To ease migration, these parameter names, values, and semantics were taken directly from the old-style (non-JSON) Solr range faceting.
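Since each bucket reports only the start of its range, a client has to add the gap to recover the end. The Python sketch below does that for the example response. The function name bucket_ranges is hypothetical, it assumes a numeric field, and it does not decide whether the end of a range is inclusive, which depends on settings not shown here.

```python
def bucket_ranges(facet_result, gap):
    """Return (start, end, count) per bucket, where end = start + gap."""
    return [
        (bucket["val"], bucket["val"] + gap, bucket["count"])
        for bucket in facet_result["buckets"]
    ]

# The "prices" value from the example response (gap:20 in the request).
prices = {
    "buckets": [
        {"val": 0.0, "count": 5},
        {"val": 20.0, "count": 3},
        {"val": 40.0, "count": 2},
        {"val": 60.0, "count": 1},
        {"val": 80.0, "count": 1},
    ]
}
for start, end, count in bucket_ranges(prices, gap=20):
    print(f"{start:g}-{end:g}: {count}")
```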

Parameters:

[Table: range facet parameters (solr-json-tab2)]

Common Parameters

Parameters that all faceting methods have in common include:

  • domain: facet domain transformations, to change the incoming domain of the facet command before faceting is executed. This is useful for multi-select faceting and nested document (block join) faceting.

Conclusion

Hopefully, you now have a good understanding of the JSON API introduced in Solr 5. Again, this feature is scheduled to ship/be certified in a future Cloudera release but is not yet supported for production use.

Yonik Seeley is a Software Engineer at Cloudera, a committer and PMC member for Apache Lucene, and the creator of Solr. Previously, he was chief open source architect and cofounder at LucidWorks.

Published at DZone with permission of Yonik Seeley. See the original article here.
