Designing Faceted Search: Getting the basics right (part 2)
In our last post we looked at some of the fundamental issues in designing faceted search such as layout (e.g. where to place the faceted navigation menus) and state (e.g. whether they should be open or closed by default). In this post, we continue the mini-series with a review of the various formats for displaying facets and the key principles for choosing between them.
2. Fundamental Principles
2.1 Facets and Search
Before we get into the detail of display formats, we should first establish some basic terminology. Facets are essentially independent properties or dimensions by which we can classify an object. For example, a book might be classified using an Author facet, a Subject facet, a Date facet, and so on. Faceted search enables users to intuitively explore information spaces by progressively refining their choices in each dimension. So for example, we could explore a collection of books by selecting a specific Author, Subject, or Date range, and so on. Selections are made by applying facet values which update the navigational context, i.e. the user’s current location in the information space. The navigational context then defines the set of records returned for a given query, and the facet values that are applicable to that result set. This leads us to our first principle:
- Principle 1: Display only currently available facet values.
By observing this principle, we provide a search experience that guides users toward meaningful navigational choices and avoids the possibility of zero results. (Note that there are exceptions to this such as the use of Smart Dead Ends, which we’ll discuss in a later post).
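To make Principle 1 concrete, here is a minimal sketch of how the available values for a facet might be derived from the current result set rather than from the whole collection. The record layout, the `BOOKS` data and the `available_values` helper are all assumptions made for illustration; they are not taken from any particular search platform.

```python
from collections import Counter

# Hypothetical book records; facet names follow the article's examples.
BOOKS = [
    {"Author": ["Smith"], "Subject": ["History"], "Date": 2002},
    {"Author": ["Smith", "Jones"], "Subject": ["Politics"], "Date": 2002},
    {"Author": ["Jones"], "Subject": ["History"], "Date": 2005},
]


def available_values(records, facet):
    """Count how many records in the current result set carry each facet value."""
    counts = Counter()
    for record in records:
        values = record[facet]
        if not isinstance(values, list):
            values = [values]  # treat single-valued fields as one-item lists
        counts.update(set(values))
    return dict(counts)


# Result set after the user selects Author = Smith.
smith_books = [book for book in BOOKS if "Smith" in book["Author"]]

# Only values that occur in the result set are offered, so no choice
# can lead to zero results.
date_values = available_values(smith_books, "Date")
subject_values = available_values(smith_books, "Subject")
```

In this sketch the Date 2005 never appears in `date_values`, because no book by Smith carries it, so the user is never offered a refinement that would empty the result set.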
2.2 Facet Semantics
Facets can be either single-select or multi-select. In the former case, the facet values are assumed to be mutually exclusive, i.e. only one may be applied at any given time. For example, a given copy of a book may be assumed to have only one location: if it is in Library X, then by definition it cannot be simultaneously in Libraries Y or Z. This facet is therefore single-select. Conversely, some facets represent values which are *not* mutually exclusive, i.e. more than one may apply at any given time. For example, a given book may have more than one Author: if it is co-edited by Professor X, then it could also be simultaneously co-edited by Professors Y and Z. This facet is therefore multi-select.
Multi-select facets can either be multi-select OR or multi-select AND. In the first case (OR), we assume that the values are combined disjunctively, e.g. a given book may have been published in either 2001, 2002 OR 2003. In the second case (AND), we assume that the values are combined conjunctively, e.g. a given book may have been co-authored by Professors X, Y AND Z. Multi-select AND tends to be somewhat rarer in faceted search, as it implies that selected facet values only make sense when applied in their totality. A typical example is purchasing a car: if the user specifies that they are looking for features A, B and C, the assumption is that they want ALL of these features to be present on EVERY record (e.g. air con AND sat nav AND sunroof), rather than just a subset of them (e.g. air con OR sat nav OR sunroof). Strictly speaking, if we accept the definition of facets as comprising mutually exclusive attributes, then a multi-select AND facet is actually a collection of individual Boolean facets that are grouped together for pragmatic reasons.
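The difference between the three semantics can be sketched as a single matching function. This is an illustration only: the `matches_facet` name and the mode labels `"single"`, `"or"` and `"and"` are assumptions, and values are modelled as plain Python sets.

```python
def matches_facet(record_values, selected, mode):
    """Return True if a record satisfies the values selected within one facet.

    mode is "single", "or" or "and" (labels assumed for this sketch).
    """
    record_values = set(record_values)
    selected = set(selected)
    if not selected:
        return True  # nothing selected: the facet does not constrain the record
    if mode == "single":
        if len(selected) != 1:
            raise ValueError("a single-select facet takes at most one value")
        return selected <= record_values
    if mode == "or":
        # Disjunctive: any one of the selected values is enough.
        return bool(selected & record_values)
    if mode == "and":
        # Conjunctive: every selected value must be present on the record.
        return selected <= record_values
    raise ValueError(f"unknown mode: {mode}")


# The car example: the user asks for three features, this car has two.
car = {"Features": {"air con", "sat nav"}}
wanted = {"air con", "sat nav", "sunroof"}

or_match = matches_facet(car["Features"], wanted, "or")
and_match = matches_facet(car["Features"], wanted, "and")
```

Under OR the car is returned because it has at least one requested feature; under AND it is excluded because the sunroof is missing.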
2.3 Facet States and Behaviours
By convention, values applied across different facets are normally applied conjunctively (e.g. Author A and Subject B and Date C) whereas values applied within a given facet are normally applied disjunctively (Date X or Date Y or Date Z). Facet values can be removed as well as applied. There are various ways to achieve this, but the principle is the same: the user deselects an applied value and the navigational context is updated accordingly.
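As a rough sketch of this convention, the navigational context can be modelled as a mapping from facet name to the set of applied values, with AND across facets and OR within each facet. The helper names (`apply_value`, `remove_value`, `result_set`) and the sample records are hypothetical, and each record is assumed to hold a single value per facet.

```python
def apply_value(context, facet, value):
    """Return a new context with one more value applied."""
    updated = {f: set(v) for f, v in context.items()}
    updated.setdefault(facet, set()).add(value)
    return updated


def remove_value(context, facet, value):
    """Return a new context with an applied value deselected."""
    updated = {f: set(v) for f, v in context.items()}
    updated.get(facet, set()).discard(value)
    # Drop facets that no longer have any applied values.
    return {f: v for f, v in updated.items() if v}


def result_set(records, context):
    """AND across facets, OR within each facet."""
    return [
        record for record in records
        if all(record[facet] in values for facet, values in context.items())
    ]


records = [
    {"Title": "A", "Subject": "History", "Date": 2001},
    {"Title": "B", "Subject": "History", "Date": 2002},
    {"Title": "C", "Subject": "Politics", "Date": 2001},
    {"Title": "D", "Subject": "History", "Date": 2005},
]

context = apply_value({}, "Subject", "History")
context = apply_value(context, "Date", 2001)
context = apply_value(context, "Date", 2002)
titles = [r["Title"] for r in result_set(records, context)]

# Deselecting a value widens the result set again.
context = remove_value(context, "Date", 2002)
titles_after_removal = [r["Title"] for r in result_set(records, context)]
```

Each helper returns a new context rather than mutating the old one, which makes it easy to keep earlier states around (e.g. for a breadbox or a browser back button), though that is a design choice rather than a requirement.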
Facets change their state and available values to reflect the current navigational context. For example, if the user selects a particular Location, then the other values for Location are no longer applicable to the current result set (since this facet is single-select). In some faceted search applications, this facet would be ‘refined away’, i.e. removed from display since there are no further choices to be applied. (Of course, if the user changes the navigational context by removing the applied value, the facet should re-appear.)
Facets can also enter a kind of ‘passive’ state indirectly. For example, the values applied in Facet A may result in the available values in Facet B being reduced to a singleton. In our library example, if we select books authored by Professor Smith, we may discover that the only applicable Date is 2002. Applying that single value would not change the navigational context as it applies equally to all records in the current result set. In some faceted search applications, facets in this state are also removed from display.
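The two behaviours described above can be sketched as a small decision function. The state labels and the `facet_state` signature are assumptions for illustration; a real application would also need to decide how applied multi-select facets are shown, which this sketch leaves as simply "active".

```python
def facet_state(mode, applied, available_counts, result_count):
    """Classify a facet given the current navigational context.

    mode: "single" or "multi" (labels assumed for this sketch)
    applied: set of values the user has applied in this facet
    available_counts: mapping of value -> record count in the result set
    result_count: total number of records in the result set
    """
    if mode == "single" and applied:
        # No further choices remain, so the facet can be 'refined away'.
        return "refined away"
    if len(available_counts) == 1:
        (count,) = available_counts.values()
        if count == result_count:
            # A singleton that covers every record: applying it changes nothing.
            return "passive"
    return "active"


location_state = facet_state("single", {"Library X"}, {"Library X": 12}, 12)
date_state = facet_state("multi", set(), {2002: 3}, 3)
subject_state = facet_state("multi", set(), {"History": 2, "Politics": 1}, 3)
```

Whether "refined away" and "passive" facets are hidden, collapsed or merely greyed out is then a presentation decision that can be made separately from the state itself.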
3. Displaying facet values
We discussed above how facets are essentially independent properties by which we can classify an object. Conceptually, each of these facets is based on an underlying data type, e.g. in our book collection, the values for Author could be stored as text (i.e. character strings), the values for Subject could be one or more terms from a controlled vocabulary, the values for Date could be stored as long integers, and so on. Such decisions shape the underlying architecture of faceted search applications. But how should such facets be displayed to the end user, and what kind of interactivity should be provided? Ideally, the display format should reflect and communicate the nature of the underlying data. This leads us to our second principle:
- Principle 2: Match the display format to the semantics of the facet values.
Facets can be used to express a wide variety of data types, and consequently there is a wide variety of available display formats. In the following section, we examine some of the main options.
3.1 Hyperlinks
Hyperlinks are probably the most common technique for representing facet values. They provide a simple and direct mechanism for representing textual values, and afford interaction through direct selection (e.g. via a mouse click). The example below from Food Network shows a typical faceted navigation menu, with the facets arranged in a vertical stack configuration. Each of the ten facets contains values that are essentially textual in nature, and so are well suited to display as hyperlinks.
One of the reasons for the popularity of hyperlinks is their simple interaction model: the user selects a value, and the system responds by applying that value as a refinement to the current navigational context. So in the example above, if the user selects Content Type=Recipe, the result set is updated to include only recipes (and the breadbox is updated to show this selection). Likewise, if the user selects Course=Main Dish, the results are updated to include only recipes for main dishes.
This simple example also illustrates one of the design behaviours mentioned above: single-select facets are ‘refined away’ after being applied. For example, once a value for Content Type has been applied, the facet disappears. Likewise, once a value for Course has been selected, the facet disappears. The end result is that the faceted menu shrinks as further refinements are applied.
Note that multi-select facets require a different behaviour: in this case, the assumption is that more than one value from each facet may be simultaneously applied, so the individual facets need to remain visible for further selections to be made by the user.
This raises an interesting question: if Food Network did want to make their existing facets multi-selectable (and there is little inherent in their semantics to preclude such a possibility), could they continue to use the hyperlink format and simply continue to display each facet following a selection? There is no technical reason why this cannot be done – in fact, we see such an example at NCSU libraries:
In this example, the user can select multiple Subjects, Genres, Formats, and so on, and these are added to the navigational context each time.
But there are two reasons why hyperlinks are not the ideal display format for multi-select facets. First, there is the issue that they are by convention typically used to display single-select facet values, and offer a strong affordance of single-select behaviour. Secondly, there exists an alternative display mechanism that is specifically designed to support multiple selections from a number of options: the checkbox.
3.2 Checkboxes
Checkboxes are an ideal format for the display of multi-select facets. An example of their use can be found at many sites, including eBay:
Checkboxes support multiple selection and the communication of navigational state through inline breadcrumbs. In this example, the user can select multiple models (e.g. Golf OR Jetta) and multiple model years (2009 OR 2008 OR 2007) and these choices are displayed inline as selected checkboxes.
A discussion of the negative consequences of using hyperlinks and checkboxes inappropriately can be found at UXMatters. However, this article itself conflates the issue of facet semantics (single-select vs. multi-select) with that of display mechanism (hyperlinks vs. checkboxes). Although there are dependencies between these two issues, they are in fact separate: the first reflects the conceptual data model, the second the desired user experience. A more informed design approach would recognize this distinction.
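One way to keep the two concerns apart is to record them as separate properties of a facet, with the display mechanism defaulting to the conventional choice but remaining overridable. The sketch below is an assumption-laden illustration: the `SelectMode`, `Widget` and `Facet` names are invented here and do not come from any specific framework.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SelectMode(Enum):
    """Facet semantics: part of the conceptual data model."""
    SINGLE = "single"
    MULTI_OR = "multi-or"
    MULTI_AND = "multi-and"


class Widget(Enum):
    """Display mechanism: part of the desired user experience."""
    HYPERLINKS = "hyperlinks"
    CHECKBOXES = "checkboxes"


def default_widget(mode):
    """Conventional pairing: links for single-select, checkboxes otherwise."""
    return Widget.HYPERLINKS if mode is SelectMode.SINGLE else Widget.CHECKBOXES


@dataclass
class Facet:
    name: str
    mode: SelectMode
    widget: Optional[Widget] = None  # None means "use the conventional default"

    def __post_init__(self):
        if self.widget is None:
            self.widget = default_widget(self.mode)


course = Facet("Course", SelectMode.SINGLE)
model = Facet("Model", SelectMode.MULTI_OR)
# A deliberate override: multi-select values rendered as hyperlinks.
subject = Facet("Subject", SelectMode.MULTI_OR, Widget.HYPERLINKS)
```

Because the semantics and the widget are stored independently, a design that departs from convention (as in the `subject` example) is an explicit decision rather than an accident of the data model.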
3.3 Range Sliders
In the examples above, the facet values are essentially categorical in nature, i.e. qualitative data organised on a nominal or ordinal scale. But facets often need to display quantitative data, such as price ranges, product sizes, date ranges and so on. In such cases a range slider is often a more suitable display mechanism. An example of their use can be found at Molecular’s Wine Store:
This example shows the use of sliders for quantitative data such as Price, Expert Score and User Rating, and for interval data such as Vintage. Note also that this example uses single-ended sliders for the first three but a double-ended slider for the last. The rationale here is that most users would only be interested in a maximum value for Price, or a minimum value for Expert Score and User Rating. Conversely, they may be interested in both a start date and an end date for a particular range of vintages (using both a maximum and a minimum).
This example also illustrates how the basic slider can be overlaid with supplementary information such as a histogram showing the distribution of record counts across the range. This helps the user understand the overall information space by providing them with a global view of the ‘landscape’ within each facet, guiding them toward more meaningful and productive selections.
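A histogram of this kind can be derived by counting the records that fall into equal-width buckets along the slider's range. The following is a minimal sketch under assumed inputs: the `slider_histogram` function, the bucket count and the sample vintages are all illustrative, not taken from the site discussed above.

```python
def slider_histogram(values, low, high, bucket_count):
    """Count records per equal-width bucket across the slider's range."""
    width = (high - low) / bucket_count
    counts = [0] * bucket_count
    for value in values:
        if low <= value <= high:
            # The top of the range belongs to the last bucket.
            index = min(int((value - low) / width), bucket_count - 1)
            counts[index] += 1
    return counts


# Hypothetical vintages for the records in the current result set.
vintages = [1986, 1990, 1995, 1995, 1998, 2001, 2003, 2004, 2005, 2006]
histogram = slider_histogram(vintages, 1986, 2006, 4)
```

Each count can then be drawn as a bar above the corresponding stretch of the slider, so that the user can see where the records are concentrated before moving the handles.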
3.4 Input Boxes
One of the disadvantages of sliders is that they offer a relatively coarse level of control over the values. In the Molecular example, it may be relatively easy to specify a precise pair of dates, as the entire scale spans just 20 years or so (1986 to 2006). However, quantitative values can of course extend over much greater intervals, in which case sliders start to become cumbersome and awkward to use accurately. Consequently, sliders often appear paired with input boxes, which allow the direct entry of arbitrary values along a quantitative range. An example of this can be seen at Glimpse.com, in which a query for ‘shoes’ returns results across a relatively wide price range:
With a range as wide as $15 to $470, specifying a precise value is much easier using the input boxes than using the slider.
Sometimes it is appropriate to transform the semantics of one data type into another. For example, quantitative data can be transformed into an interval scale by subdividing the range into a sequence of smaller ranges and giving each a label. This is the approach taken by Amazon in their treatment of price ranges:
Note that the intervals need not be of equal size: in Amazon’s case, they divide the overall range into five ‘bins’ of differing size, in what may be a strategy to smooth the distribution of record counts within each range (we’ll discuss the issue of displaying bin counts in a later post). Alternatively, this may simply be a reflection of the price points found to be uppermost in shoppers’ minds when they browse these particular products.
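To illustrate how bins of differing size can even out record counts, here is a sketch of one possible approach: splitting the sorted prices into groups of roughly equal size and using each group's lowest and highest price as the bin boundaries. This is purely an assumption about how such smoothing could be done, with an invented `equal_count_bins` helper and made-up prices; it is not a description of how Amazon actually derives its ranges.

```python
def equal_count_bins(prices, bin_count):
    """Split prices into contiguous ranges holding similar record counts.

    Returns a list of (lowest price, highest price, record count) tuples.
    """
    ordered = sorted(prices)
    total = len(ordered)
    bins = []
    for i in range(bin_count):
        start = i * total // bin_count
        end = (i + 1) * total // bin_count
        chunk = ordered[start:end]
        if chunk:  # skip empty bins when there are fewer prices than bins
            bins.append((chunk[0], chunk[-1], len(chunk)))
    return bins


# Hypothetical prices, skewed toward the low end as retail prices often are.
prices = [5, 8, 9, 12, 15, 18, 24, 30, 45, 60, 120, 470]
bins = equal_count_bins(prices, 4)
```

Here every bin holds three records, but the bins widen sharply toward the top of the range. Note that this simple version can place identical prices in adjacent bins when duplicates straddle a boundary, so a production version would need to tidy the boundaries (and probably round them to friendlier price points).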
3.5 Colour Pickers
In the above examples we’ve discussed how the choice of display format is shaped by the semantics of the underlying data and the user experience we wish to provide. In this respect, colour pickers are perhaps the ultimate custom control: they are designed exclusively to represent the visual dimension of colour. However, there are various ways in which this can be executed. Littlewoods, for example, displays a colour swatch with associated text labels, and shows only values that are specific to the current result set:
Artist Rising, by contrast, uses a generic colour picker, offering a choice across a continuous colour spectrum:
Although the user has great flexibility in selecting a precise colour value, the corollary is that it is easy for the user to select an illegal value (i.e. one which is not available within the current result set). In this respect, this approach violates Principle 1 above.
An alternative to the colour picker is to simply use text labels for each value, and this approach is surprisingly common (being widely used by both Amazon and eBay). The challenge is, of course, to select textual labels that are going to be meaningful to the end user and also faithfully represent the appearance of the item itself. Inevitably, once the range extends beyond basic primary and secondary colours, such mappings start to become increasingly tenuous.
3.6 Tag Clouds
A decade or so ago, there was no such thing as a tag cloud – at least, not outside of a few research labs and data visualization projects. Then along came Flickr, Delicious and a host of other online community sites with vast repositories of user-generated and user-tagged content. Tag clouds, with their ability to represent measures such as tag frequency and popularity in a visually appealing manner, rapidly became the standard technique for displaying and exploring such content. Soon, their use was extended to include unstructured content, displaying clouds of terms extracted from text documents.
A decade on, we seem to have come full circle. Perhaps victims of their own popularity, tag clouds are becoming an increasingly rare part of the faceted search experience. One of the few remaining examples can be found at Artist Rising, which displays a tag cloud as a dialog overlay within a horizontal faceted menu:
As tags are selected, they are added to the breadbox alongside other refinements. (Unusually, the tag cloud in this implementation appears to allow only a single value to be applied, which rather weakens its essential value.)
A somewhat different treatment can be found at PC Authority, which uses tag clouds to present terms extracted from unstructured content (i.e. text documents). These are displayed in a separate container which is disconnected (conceptually and physically) from the left hand faceted navigation menu. As tags are selected, they are added to the breadbox alongside other refinements:
Unlike Artist Rising, the tag cloud in this implementation supports multiple refinements, updating its own contents on each iteration.
3.7 Data Visualizations
In each of the above examples the records have been considered as unique entities, i.e. the goal of the search experience is locating and then viewing one or more individual records. In this context, the role of the facet values is primarily to facilitate that process; to smooth the journey from initial query to product record. But an increasing number of applications are concerned with understanding patterns inherent in the collection at a much higher level, where the focus is not on locating individual records but on understanding patterns of distribution and occurrence at an aggregate level. In these applications, the facets play a much more central role in the discovery experience, with the focus shifting from findability to broader tasks such as analysis, sensemaking and discovery-oriented problem solving.
Applications such as this are designed to aggregate, organize, and summarize data from a variety of quantitative and qualitative sources, using data visualizations to communicate key metrics, patterns, and overall status. An example of such a visualization could be found at Newssift (illustrated below, but since closed), in which pie charts were used to communicate the distribution of Article Sources and associated Sentiment:
In common with hyperlinks, these visualizations provide a simple interaction model: the user selects a value (e.g. by mouse click), and the system responds by applying that value as a refinement to the current navigational context. But more importantly, the facets provide an instant overview of the aggregate distribution for each set of values: we can see at a glance, for example, that the majority of sentiment is positive, and that the majority of articles are sourced from Online News. Of course, this insight could also be facilitated by other display formats, but a well-chosen visualization will do this much more effectively than the more traditional display formats discussed above. (As an aside, the choice of pie charts here is somewhat questionable, but the principle of using visualizations to communicate key metrics is the central point.)
Another common use of data visualisation in faceted search is to communicate patterns in geospatial data. An example of this can be found at WITS (the Worldwide Incidents Tracking System), which uses various forms of visualization to display terrorist incidents overlaid on a map of the world:
Visualizations such as this allow users to perceive spatial patterns in record distribution and explore relationships between particular facets (rendered as hyperlinks in the upper panel) and aggregate distributions across the map. However, the interaction model in this implementation is somewhat different to Newssift, in that selecting an item on the map (e.g. a cluster in the rendering above) appears to centre and zoom the map on that region, rather than applying the selected item as a refinement. In this respect, the visualization is not behaving as a facet in the strict sense, but it nonetheless allows the end user to productively explore patterns at the aggregate record level.
Over the last few weeks we’ve looked at some of the fundamental design issues in faceted search, such as layout (e.g. where to place the faceted navigation menus) and state (e.g. whether they should be open or closed by default). In this article, we’ve complemented that with a review of the many formats for displaying facets and the key principles for choosing between them. In our next post, we’ll round out this mini-series with a look at some of the remaining design fundamentals including display scalability (i.e. techniques for managing long lists of facet values).