
Solr filters: KeepWordFilter


This time I decided to look at one of the less common filters available in the standard Solr distribution. The first one I picked is a filter called KeepWordFilter.

Let’s start

First, a few words about what this filter does. As the name suggests, its main purpose is to keep words, which makes it the opposite of the filter called StopFilter: instead of removing the words listed in a file, it removes every token that is not on the list. Let’s start with the definition of the field type in the schema.xml file:

<fieldtype name="keepwords" class="solr.TextField">
   <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.KeepWordFilterFactory" words="words.txt" ignoreCase="true"/>
   </analyzer>
</fieldtype>

As shown in the definition above, in addition to the standard class attribute the filter takes two additional attributes:

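  • words – the name of the file containing the list of words to keep, one word per line
  • ignoreCase – true | false value; when true, tokens are compared with the word list case-insensitively (the default is false)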
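To make the meaning of the two attributes concrete, here is a minimal Python sketch that models the filter's behaviour; it is not Solr's code. The helper names load_keep_words and keep_word_filter are hypothetical, and the file format (one word per line, blank lines and lines starting with # skipped) is an assumption based on Solr's usual word-list files. The sketch shows that ignoreCase only changes how tokens are looked up; a kept token keeps its original text.

```python
from typing import Iterable, List, Set


def load_keep_words(lines: Iterable[str], ignore_case: bool) -> Set[str]:
    """Build the keep-word set from the lines of a words file (hypothetical helper)."""
    words: Set[str] = set()
    for line in lines:
        word = line.strip()
        # Assumed file format: one word per line; blank lines and '#' comments are skipped.
        if not word or word.startswith("#"):
            continue
        # With ignoreCase the entries are stored lowercased so lookups can be lowercased too.
        words.add(word.lower() if ignore_case else word)
    return words


def keep_word_filter(tokens: Iterable[str], words: Set[str], ignore_case: bool) -> List[str]:
    """Keep only the tokens found in `words`; every other token is dropped."""
    kept = []
    for token in tokens:
        key = token.lower() if ignore_case else token
        if key in words:
            # ignoreCase affects only the lookup: the token text itself is left unchanged.
            kept.append(token)
    return kept


if __name__ == "__main__":
    words = load_keep_words(["ala", "# a comment", "", "MA"], ignore_case=True)
    print(keep_word_filter(["Ala", "ma", "Kot"], words, ignore_case=True))   # ['Ala', 'ma']
    exact = load_keep_words(["ala", "MA"], ignore_case=False)
    print(keep_word_filter(["Ala", "ma", "MA"], exact, ignore_case=False))  # ['MA']
```

With ignoreCase set to false, only tokens whose text matches an entry exactly survive, which is why "Ala" and "ma" are dropped in the second call.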

File contents

Let’s assume that the words.txt file contains the following words:

ala
ma
kota

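If we index the phrase “Ala ma kota, a kot ma Alę”, only the tokens that match an entry in words.txt are kept, and all other tokens are discarded. Two details of the configuration above matter here. First, ignoreCase="true" only makes the comparison case-insensitive; it does not lowercase the token itself. Second, solr.WhitespaceTokenizerFactory splits on whitespace only, so the comma stays attached to “kota,”, which then does not match “kota”. With exactly this field type the index receives “Ala”, “ma” and “ma”. The tokens “ala”, “ma”, “kota”, “ma” are produced when punctuation is stripped and the text is lowercased before KeepWordFilterFactory, for example with solr.StandardTokenizerFactory followed by solr.LowerCaseFilterFactory. The result of each analysis step can be checked on the Analysis page of the Solr administration panel.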
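The effect of the tokenizer choice on this phrase can be modelled in a few lines of Python. This is a simplified sketch, not Solr's implementation: whitespace_tokenize mimics WhitespaceTokenizerFactory (split on whitespace only, so “kota,” keeps its comma), while word_tokenize is an assumed regex stand-in for a tokenizer that drops punctuation and does not reproduce StandardTokenizerFactory's exact rules. The keep step matches case-insensitively without changing the token text, so the schema's chain keeps “Ala”, “ma”, “ma”, and only the lowercasing, punctuation-stripping chain yields “ala”, “ma”, “kota”, “ma”.

```python
import re
from typing import List

# Contents of words.txt; stored lowercased because the field type uses ignoreCase="true".
KEEP_WORDS = {"ala", "ma", "kota"}


def whitespace_tokenize(text: str) -> List[str]:
    # Models solr.WhitespaceTokenizerFactory: split on whitespace only,
    # so punctuation such as the comma in "kota," stays inside the token.
    return text.split()


def word_tokenize(text: str) -> List[str]:
    # Rough stand-in for a tokenizer that drops punctuation; an assumption,
    # not the exact segmentation rules of solr.StandardTokenizerFactory.
    return re.findall(r"\w+", text)


def lowercase(tokens: List[str]) -> List[str]:
    # Models solr.LowerCaseFilterFactory.
    return [t.lower() for t in tokens]


def keep_words(tokens: List[str]) -> List[str]:
    # KeepWordFilter with ignoreCase="true": case-insensitive lookup, token text unchanged.
    return [t for t in tokens if t.lower() in KEEP_WORDS]


if __name__ == "__main__":
    phrase = "Ala ma kota, a kot ma Alę"
    # Chain from schema.xml: WhitespaceTokenizer -> KeepWordFilter
    print(keep_words(whitespace_tokenize(phrase)))        # ['Ala', 'ma', 'ma']
    # Punctuation-stripping tokenizer -> lowercase -> KeepWordFilter
    print(keep_words(lowercase(word_tokenize(phrase))))   # ['ala', 'ma', 'kota', 'ma']
```

The order of the steps matters: lowercasing before the keep step changes the stored tokens, whereas ignoreCase alone changes only the comparison.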

A few words at the end

Although I have never used this filter myself, it seems a good fit for storing the values of enumerated types, or for situations where we are interested in a finite (ideally small and known in advance) list of values, such as categories, when we cannot filter the information at the application level or doing so would be very difficult.




Published at DZone with permission of

