
Updating a Solr Analysis Plugin from 1.4.1 (Lucene 2.9) to Solr / Lucene 4.0 (current trunk)


Three years and a couple of weeks ago I wrote a post about how to get started writing a simple Solr Analysis Plugin that handles incoming tokens and modifies them in place when an update is requested.

Since then the whole version number structure of Solr has changed (and is now in sync with the underlying Lucene version), and not surprisingly, the current API has also been updated. This means that a few small changes are required to get your analysis plugins running on the current trunk of Lucene and Solr.

The main change is that TermAttribute has been renamed to CharTermAttribute, which means that any imports will have to change:

    - import org.apache.lucene.analysis.tokenattributes.TermAttribute;
    + import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

Any declarations of TermAttributes will need to be CharTermAttributes instead:

    - private TermAttribute termAtt;
    + private CharTermAttribute termAtt;

      public NorwegianNameFilter(TokenStream input)
      {
          super(input);
    -     termAtt = (TermAttribute) addAttribute(TermAttribute.class);
    +     termAtt = input.getAttribute(CharTermAttribute.class);
      }

In the constructor we now fetch the attribute from the input TokenStream. I'm not sure whether the old addAttribute() call has been deprecated, but fetching it from the input seems to be the suggested way now. Any references to TermAttribute.class also become CharTermAttribute.class.

The actual TermAttribute interface has also changed, meaning we’ll have to change a few of the old method calls:

    - termAtt.setTermLength(this.parseBuffer(termAtt.termBuffer(), termAtt.termLength()));
    + termAtt.setLength(this.parseBuffer(termAtt.buffer(), termAtt.length()));

.setTermLength() => .setLength()
.termBuffer() => .buffer()
.termLength() => .length()

The methods behave the same way as in the previous API. .buffer() returns a char array (char[]) holding the current term, which you can modify in place. .length() returns how much of the buffer is currently in use (the buffer itself can be larger than the part used), and .setLength() sets the new length of the term (for example if you're collapsing characters).
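To make the relationship between the buffer and its length concrete, here is a minimal sketch in plain Java with no Lucene dependency. The class name and the collapsing rule (dropping repeated characters) are assumptions made up for this example. They are not the actual logic of the name filter, just one way a parseBuffer()-style method could shorten a term in place and report the new length.

```java
// Hypothetical stand-in for a parseBuffer() implementation; not from the original filter.
class CollapseRepeatsSketch {

    /**
     * Collapses runs of the same character in buffer[0..length) in place
     * and returns the new logical length. Anything past the returned
     * length is stale data and should be ignored by the caller.
     */
    static int parseBuffer(char[] buffer, int length) {
        if (length == 0) {
            return 0;
        }
        int write = 1; // next position to write to; buffer[0] is always kept
        for (int read = 1; read < length; read++) {
            if (buffer[read] != buffer[write - 1]) {
                buffer[write++] = buffer[read];
            }
        }
        return write;
    }

    public static void main(String[] args) {
        // The array is deliberately larger than the term, as a term buffer can be.
        char[] buffer = new char[16];
        "haakkon".getChars(0, 7, buffer, 0);
        int newLength = parseBuffer(buffer, 7);
        System.out.println(new String(buffer, 0, newLength)); // prints "hakon"
    }
}
```

In the filter itself, the value returned here is what would be handed to termAtt.setLength(), so that only the collapsed part of the buffer counts as the term.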

The new implementation of our analysis filter skeleton:

    package no.derdubor.solr.analysis;

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class NorwegianNameFilter extends TokenFilter {
        private CharTermAttribute termAtt;

        public NorwegianNameFilter(TokenStream input) {
            super(input);
            termAtt = input.getAttribute(CharTermAttribute.class);
        }

        // Lucene checks (when assertions are enabled) that incrementToken() or the class is final
        @Override
        public final boolean incrementToken() throws IOException {
            if (this.input.incrementToken()) {
                // Rewrite the term in place, then set its new length
                termAtt.setLength(this.parseBuffer(termAtt.buffer(), termAtt.length()));
                return true;
            }
            return false;
        }

        protected int parseBuffer(char[] buffer, int bufferLength) {
            // Placeholder: modify buffer in place here and return the new length
            return bufferLength;
        }
    }

From http://e-mats.org/2011/07/updating-a-solr-analysis-plugin-from-1-4-1-lucene-2-9-to-solr-lucene-4-0-current-trunk/




