Solr filters: KeepWordFilter

Check out this 8-step guide to see how you can increase your productivity by skipping slow application redeploys and by implementing application profiling, as you code! Brought to you in partnership with ZeroTurnaround.

This time I decided to look at some of the less common filters available in the standard Solr distribution. The first one is KeepWordFilter.

Let’s start

First, a few words about what this filter does. As the name suggests, its purpose is to keep words: it does the opposite of StopFilter. Instead of removing the words listed in a file, it removes every token that is not on the list. Let’s start with the definition of the field type in the schema.xml file:

<fieldtype name="keepwords" class="solr.TextField">
   <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.KeepWordFilterFactory" words="words.txt" ignoreCase="true"/>
   </analyzer>
</fieldtype>

As the definition above shows, in addition to the standard class attribute, the filter takes two attributes:

  • words – the file containing the words to keep, one word per line
  • ignoreCase – true or false; when set to true, words are matched regardless of case (the default is false)
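The two attributes can be sketched in plain Java. This is a hypothetical illustration, not the KeepWordFilterFactory source: the class name KeepWordList is invented, and the file format it assumes (one word per line, with blank lines and lines starting with # skipped) is an assumption about how Solr reads word lists. The point it shows is that ignoreCase only changes the comparison; a kept token comes back with its original case.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the "words" and "ignoreCase" attributes; not the
// KeepWordFilterFactory source.
final class KeepWordList {

    private final Set<String> words = new HashSet<>();
    private final boolean ignoreCase;

    // Assumed file format: one word per line; blank lines and lines starting
    // with '#' are skipped.
    KeepWordList(String fileContents, boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
        for (String line : fileContents.split("\\R")) {
            String word = line.trim();
            if (word.isEmpty() || word.startsWith("#")) {
                continue;
            }
            words.add(normalize(word));
        }
    }

    // With ignoreCase=true, both list entries and tokens are lowercased
    // before comparison.
    private String normalize(String s) {
        return ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
    }

    boolean accepts(String token) {
        return words.contains(normalize(token));
    }

    // Returns the accepted tokens unchanged: ignoreCase affects matching only.
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (accepts(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String file = "# words.txt\nala\nma\nkota\n";
        List<String> tokens = List.of("Ala", "ma", "Kota", "kot");
        System.out.println(new KeepWordList(file, true).filter(tokens));  // [Ala, ma, Kota]
        System.out.println(new KeepWordList(file, false).filter(tokens)); // [ma]
    }
}
```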

File contents

Let’s assume that the words.txt file contains the following words:

ala
ma
kota
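With this word list, the relationship to StopFilter is easy to see. The following plain-Java sketch (hypothetical class and method names, not the Lucene filters themselves) applies the same set in both directions to tokens that are assumed to be already lowercased and free of punctuation: stop mode drops the listed tokens, keep mode drops everything else.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not Lucene's StopFilter or KeepWordFilter:
// both reduce to the same membership test, with the outcome inverted.
final class KeepVsStop {

    // Shared loop: keep a token when (token is in the set) == keepListed.
    private static List<String> filter(List<String> tokens, Set<String> words, boolean keepListed) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (words.contains(token) == keepListed) {
                out.add(token);
            }
        }
        return out;
    }

    // StopFilter-style: drop the tokens that are in the set.
    static List<String> stop(List<String> tokens, Set<String> words) {
        return filter(tokens, words, false);
    }

    // KeepWordFilter-style: drop the tokens that are NOT in the set.
    static List<String> keep(List<String> tokens, Set<String> words) {
        return filter(tokens, words, true);
    }

    public static void main(String[] args) {
        // Tokens assumed to be lowercased and stripped of punctuation already.
        List<String> tokens = List.of("ala", "ma", "kota", "a", "kot", "ma");
        Set<String> words = Set.of("ala", "ma", "kota"); // contents of words.txt
        System.out.println("keep: " + keep(tokens, words)); // keep: [ala, ma, kota, ma]
        System.out.println("stop: " + stop(tokens, words)); // stop: [a, kot]
    }
}
```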

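If you index the phrase “Ala ma kota, a kot ma Alę” with this field type, only the tokens that match an entry in words.txt are kept. Two details of the configuration matter here. First, ignoreCase only affects matching, so a kept token retains its original case (“Ala”, not “ala”). Second, WhitespaceTokenizer splits on whitespace only, so the comma stays attached to “kota,”, which therefore does not match “kota”. As a result, the tokens written into the index are “Ala”, “ma” and “ma”. A tokenizer that strips punctuation, such as solr.StandardTokenizerFactory, would also keep “kota”. You can verify this on the analysis page of the Solr administration panel.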
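The two steps of the keepwords field type can be traced in plain Java. This is a simulation, not Lucene code: whitespaceTokens stands in for solr.WhitespaceTokenizerFactory, letterTokens is a hypothetical tokenizer that splits on any non-letter character, and matching is case-insensitive as with ignoreCase="true". It shows that splitting on whitespace alone leaves the comma attached to “kota,”, so that token is dropped, while a tokenizer that strips punctuation lets “kota” through; in both cases the kept tokens retain their original case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java simulation of the "keepwords" field type: a tokenizer step
// followed by a keep-word step with ignoreCase="true". Not Lucene code.
final class KeepWordsChain {

    // Stand-in for solr.WhitespaceTokenizerFactory: split on whitespace only,
    // so punctuation stays attached to the neighbouring word.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Hypothetical alternative tokenizer: split on anything that is not a
    // letter, which drops the comma.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Keep-word step with case-insensitive matching; kept tokens are not lowercased.
    static List<String> keep(List<String> tokens, Set<String> lowercaseWords) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (lowercaseWords.contains(t.toLowerCase(Locale.ROOT))) kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        String phrase = "Ala ma kota, a kot ma Alę";
        Set<String> words = Set.of("ala", "ma", "kota"); // contents of words.txt
        System.out.println(keep(whitespaceTokens(phrase), words)); // [Ala, ma, ma]
        System.out.println(keep(letterTokens(phrase), words));     // [Ala, ma, kota, ma]
    }
}
```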

A few words at the end

Although I have never used this filter in practice, it seems a good fit when you need to store values of enumerated types, or when you are interested only in a finite (ideally small and known in advance) set of values, such as categories, and filtering them at the application level is impossible or very difficult.




Published at DZone with permission of Rafał Kuć, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
