
Developing Your Own Solr Filter

by Rafał Kuć · May. 18, 12

Sometimes the out-of-the-box functionality of Lucene and Solr is not enough. When that happens, we need to extend what Lucene and Solr give us and create our own plugin. In today's post I’ll try to show you how to develop a custom filter and use it in Solr.

Assumptions

Let's assume that we need a filter that reverses every word in a given field. So, if the input is “solr.pl”, the output would be “lp.rlos”. It’s not the hardest example, but it will be enough for the purpose of this entry. One more thing: I decided to omit describing how to set up your IDE, how to compile your code, how to build the jar, and so on. We will focus only on the code.
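To make the expected behavior concrete before we touch any Lucene classes, here is a minimal plain-Java sketch of what the filter should produce for whitespace-separated input. The class and method names (ReverseWordsSketch, reverseWords) are made up for this illustration and are not part of Lucene or Solr; the splitting on whitespace only stands in for a tokenizer.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene or Solr: it only illustrates
// the output we expect from the filter developed in this post.
public class ReverseWordsSketch {

    // Reverses each whitespace-separated word on its own,
    // keeping the words in their original order.
    static String reverseWords(String text) {
        StringJoiner out = new StringJoiner(" ");
        for (String word : text.trim().split("\\s+")) {
            out.add(new StringBuilder(word).reverse().toString());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseWords("solr.pl"));       // prints "lp.rlos"
        System.out.println(reverseWords("apache lucene")); // prints "ehcapa enecul"
    }
}
```

The real filter never sees the whole field value like this; it receives one token at a time from the tokenizer, which is why the rest of the post works on a single term buffer.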

Additional Information

The code presented in this post was written against the Solr 3.6 libraries, although you shouldn’t have much trouble compiling it with the Solr 4 binaries. Keep in mind, though, that some slight modifications may be needed (in case something changes before the Solr 4.0 release).

What We Need

In order for Solr to be able to use our filter, we need two classes. The first class is the filter implementation, which contains the actual logic. The second class is the filter factory, which is responsible for creating instances of the filter. Let's get it done then.

Filter

In order to implement our filter we will extend the TokenFilter class from the org.apache.lucene.analysis package and override the incrementToken method. This method returns a boolean value: true if a token is available for processing in the token stream, and false when the stream has no more tokens to analyze. The implementation should look like the one below:

package pl.solr.analysis;

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class ReverseFilter extends TokenFilter {
  private CharTermAttribute charTermAttr;

  protected ReverseFilter(TokenStream ts) {
    super(ts);
    this.charTermAttr = addAttribute(CharTermAttribute.class);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }

    int length = charTermAttr.length();
    char[] buffer = charTermAttr.buffer();
    char[] newBuffer = new char[length];
    for (int i = 0; i < length; i++) {
      newBuffer[i] = buffer[length - 1 - i];
    }
    charTermAttr.setEmpty();
    charTermAttr.copyBuffer(newBuffer, 0, newBuffer.length);
    return true;
  }
}

Description of the Above Implementation

A few words about some of the lines of code in the above implementation:

  • Line 9 – a class that extends TokenFilter and is used as a filter should be marked final (a Lucene requirement).
  • Line 10 – the token stream attribute that lets us read and modify the text of the term. If we wanted to, our filter could use more than one attribute, for example one for reading and changing the position in the token stream, or one for the payload. The list of Attribute interface implementations can be found in the Lucene API documentation (e.g. http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/util/Attribute.html).
  • Lines 12 – 15 – the constructor, which takes a token stream as its argument and then registers (line 14) the token stream attribute we need.
  • Lines 18 – 32 – the incrementToken method implementation.
  • Lines 19 – 21 – check whether a token is available for processing. If not, return false.
  • Line 23 – get the length of the buffer contents we want to reverse.
  • Line 24 – get the buffer that holds the word we want to reverse. The term text is stored as a char array, so it is best to work on it directly rather than construct a String object.
  • Lines 25 – 28 – create a new buffer and fill it with the reversed contents of the original one.
  • Line 29 – clear the original buffer (needed if you use the append methods).
  • Line 30 – copy the reversed contents into the buffer of the token stream attribute.
  • Line 31 – return true to signal that a token is available for further processing.
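One detail in lines 23 – 24 is worth a closer look: the length is read separately from the buffer, because the char array behind a term attribute can be larger than the term it currently holds. The plain-Java sketch below shows the same reversal done in place on such an oversized array. It is only an illustration under that assumption, not the code used in this post; the names InPlaceReverseSketch and reverseInPlace are invented, and I have not benchmarked this variant against the copying one.

```java
// Hypothetical sketch, independent of Lucene: reverses only the used part
// of a buffer that is larger than the word stored in it.
public class InPlaceReverseSketch {

    // Swaps characters from both ends toward the middle.
    // Only the first `length` chars are touched; the rest of the array is left alone.
    static void reverseInPlace(char[] buffer, int length) {
        for (int i = 0, j = length - 1; i < j; i++, j--) {
            char tmp = buffer[i];
            buffer[i] = buffer[j];
            buffer[j] = tmp;
        }
    }

    public static void main(String[] args) {
        char[] buffer = new char[16];         // larger than the word, like a reused term buffer
        String word = "solr.pl";
        word.getChars(0, word.length(), buffer, 0);

        reverseInPlace(buffer, word.length());
        System.out.println(new String(buffer, 0, word.length())); // prints "lp.rlos"
    }
}
```

Using buffer.length instead of the reported length here would drag the unused tail of the array into the term, which is the mistake the two separate calls in the filter avoid.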

Filter Factory

As I wrote earlier, in order for Solr to be able to use our filter, we need to implement a filter factory class. Because we don’t have any special configuration values, the factory implementation is very simple. We will extend the BaseTokenFilterFactory class from the org.apache.solr.analysis package. The implementation can look like the following:

package pl.solr.analysis;

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;

public class ReverseFilterFactory extends BaseTokenFilterFactory {
  @Override
  public TokenStream create(TokenStream ts) {
    return new ReverseFilter(ts);
  }
}

As you can see, the filter factory implementation is simple: we only need to override the create method, in which we instantiate our filter and return it.

Configuration

After compiling the code and preparing the jar file, we copy the jar to a directory where Solr will be able to see it. We can do this by creating a lib directory in the Solr home directory and then adding the following entry to the solrconfig.xml file:

<lib dir="../lib/" regex=".*\.jar" />

Then we change the schema.xml file and we add a new field type that will use our filter:

<fieldType name="text_reversed" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="pl.solr.analysis.ReverseFilterFactory" />
  </analyzer>
</fieldType>

It is worth noting that the class attribute of the filter tag takes the fully qualified name (package and class) of the factory we created, not of the filter itself. This is important to remember; otherwise Solr will throw errors.

Does It Work?

In order to show you that it works, I provide the following screenshot of the Solr administration panel:

To Sum Up

As you can see from the example above, creating your own filter is not complicated. Of course, the idea behind the filter was very simple, and thus its implementation was simple too. I hope this post will be helpful when the time comes for you to create your own filter for Solr.




Published at DZone with permission of Rafał Kuć, DZone MVB. See the original article here.
