Solr filters: KeepWordFilter

This time I decided to look at one of the less common filters available in the standard distribution of Solr. The first one I picked is a filter called KeepWordFilter.

Let’s start

First, a few words about what this filter does. As the name might indicate, the main purpose of this filter is to keep words: only tokens that appear on a given list pass through, and everything else is discarded. In other words, the filter does the opposite of StopFilter, which removes the listed words. So how does this filter work? I’ll talk about this in a moment – let’s start with the definition of the field type in the schema.xml file:

<fieldtype name="keepwords" class="solr.TextField">
   <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.KeepWordFilterFactory" words="words.txt" ignoreCase="true"/>
   </analyzer>
</fieldtype>

As the definition above shows, in addition to the standard class attribute, the filter takes two more attributes:

  • words – the name of the file with the list of words to keep
  • ignoreCase – a true | false value indicating whether case is ignored when tokens are compared against the list.

File contents

Let’s assume that the words.txt file contains the following words:

ala
ma
kota

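If you index the phrase “Ala ma kota, a kot ma Alę”, only the tokens that match an entry in the words.txt file are written into the index: “Ala”, “ma” and “ma”. “Ala” is kept because ignoreCase="true" makes the comparison case-insensitive, although the token itself is not lowercased. “kota,” is dropped because the whitespace tokenizer leaves the comma attached, so it does not match “kota”. The remaining tokens (“a”, “kot”, “Alę”) are not in the file. You can check this on the analysis page of the Solr administration panel.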
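
Which tokens end up in the index depends on the rest of the analysis chain, because the keep-word filter only compares tokens and never rewrites them. As a minimal sketch (not part of the original configuration), the type below normalizes tokens before the keep-word check, so the phrase above should yield “ala”, “ma”, “kota”, “ma”. The type name keepwords_normalized is hypothetical; solr.StandardTokenizerFactory and solr.LowerCaseFilterFactory are stock Solr factories.

```xml
<!-- Hypothetical variant of the keepwords type; the name is illustrative -->
<fieldtype name="keepwords_normalized" class="solr.TextField">
   <analyzer>
      <!-- splits on word boundaries, so the comma after "kota" is dropped -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- lowercases every token before the keep-word check -->
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- keeps only tokens listed in words.txt -->
      <filter class="solr.KeepWordFilterFactory" words="words.txt" ignoreCase="true"/>
   </analyzer>
</fieldtype>
```

With the tokens already lowercased, ignoreCase matters only if words.txt itself contains uppercase entries.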

A few words at the end

Although I have never used this filter, it seems to me a good choice when you need to store the values of enumerated types, or whenever you are interested in a finite (ideally small and known in advance) list of values, such as categories, and filtering the data at the application level is impossible or very difficult.
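
As a rough sketch of the enumerated-values case, a field type could treat the whole field value as a single token and index it only if it is on the list. The names category_keep, category and categories.txt are assumptions made for illustration (categories.txt would hold one allowed category per line, in lowercase); solr.KeywordTokenizerFactory is the stock tokenizer that emits the entire input as one token.

```xml
<!-- Hypothetical type for a closed list of category values -->
<fieldtype name="category_keep" class="solr.TextField">
   <analyzer>
      <!-- treats the entire field value as one token -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <!-- normalizes case so one list entry covers all spellings -->
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- drops any value that is not listed in categories.txt -->
      <filter class="solr.KeepWordFilterFactory" words="categories.txt" ignoreCase="true"/>
   </analyzer>
</fieldtype>

<!-- Hypothetical field that uses the type above -->
<field name="category" type="category_keep" indexed="true" stored="true"/>
```

Keep in mind that the filter affects only the indexed terms. The stored value of the field is returned exactly as it was sent, so a document with an unknown category is still stored, but it cannot be found through this field.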



Published at DZone with permission of Rafał Kuć, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
