Over a million developers have joined DZone.

Writing a Solr Analysis Filter Plugin

· Java Zone

Check out this 8-step guide to see how you can increase your productivity by skipping slow application redeploys and by implementing application profiling, as you code! Brought to you in partnership with ZeroTurnaround.

As we’ve been working on getting a better result out of the phonetic search we’re currently doing at derdubor, I started writing a plugin for Solr to be able to return better search results when searching for norwegian names. We’ve been using the standard phonetic filter from Solr 1.2 so far, using the double metaphone encoder for encoding a regular token as a phonetic value. The trouble with this is that a double metaphone value is four simple letters, which means that searchwords such as ‘trafikkontroll’ would get the same meaning as ‘Dyrvik’. The latter being a name and the first being a regular search string which would be better served through an article view. TRAFIKKONTROLL resolves to TRFK in double metaphone, while DYRVIK resolves to DRVK. T and D is considered similiar, as is V and F, and voilá, you’ve got yourself a match in the search result, but not a visual one (or a semantic one, as the words have very different meanings).

To solve this, I decided to write a custom filter plugin which we could tune to names that are in use in Norway. I’ll post about the logic behind my reasoning in regards to wording later and hopefully post the complete filter function we’re applying, but I’ll leave that for another post.

First you need a factory that’s able to produce filters when Solr asks for them:


    package no.derdubor.solr.analysis;
    import java.util.Map;
    import org.apache.solr.analysis.BaseTokenFilterFactory;
    import org.apache.lucene.analysis.TokenStream;
    public class NorwegianNameFilterFactory extends BaseTokenFilterFactory
        Map<String,String> args;
        public Map<String,String> getArgs()
            return args;
        public void init(Map<String,String> args)
            this.args = args;
        public NorwegianNameFilter create(TokenStream input)
            return new NorwegianNameFilter(input);

To compile this example yourself, put the file in no/derdubor/solr/analysis/ (which matches no.derdubor.solr.analysis; in the package statement), and run

javac -6 no/derdubor/solr/analysis/NorwegianNameFilterFactory.java

(you’ll need apache-solr-core.jar and lucene-core.jar in your classpath to do this)

to compile it. You’ll of course also need the filter itself (which is returned from the create-method above):

    package no.derdubor.solr.analysis;
    import java.io.IOException;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    public class NorwegianNameFilter extends TokenFilter
        public NorwegianNameFilter(TokenStream input)
        public Token next() throws IOException
            return parseToken(this.input.next());
        public Token next(Token result) throws IOException
            return parseToken(this.input.next());
        protected Token parseToken(Token in)
            /* do magic stuff with in.termBuffer() here (a char[] which can be manipulated) */
            /* set the changed length of the new term with in.setTermLength(); before returning it */
            return in;

You should now be able to compile both files:

javac -6 no/derdubor/solr/analysis/*.java

After compiling the plugin, create a jar file which contain your plugin. This will be the “distributable” version of your plugin, and should contain the .class-files of your application.

jar cvf derdubor-solr-norwegiannamefilter.jar no/derdubor/solr/analysis/*.class

Move the file you just created (derdubor-solr-norwegiannamefilter.jar in the example above) into your Solr home directory. This is where you keep your bin/ and conf/ directory (which contains schema.xml, etc). Create a lib directory in the solr home directory. This is where your custom libraries will live, so copy the file into this directory (lib/).

Restart Solr and check that everything still works as it should. If everything still seems normal, it’s time to enable your filter. In one of your <filter>-chains, you can simply append a <filter> element to insert your own filter into the chain:

        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="no.derdubor.solr.analysis.NorwegianNameFilterFactory" />

Restart Solr again, and if everything still works as it should, you’re all set! Time to index some new data (remember that you’ll need to reindex the data for things to work as you expect, since no stored data is processed when you edit your configuration files) and commit it! Do a few searches through the admin interface to see that everything works as it should. I’ve used the “debug” option to .. well, debug .. my plugin while developing it. A very neat trick is to see what terms your filter expands to (if you set type=”query” in the analyzer section, it will be applied to all queries against that field), which will be shown in the first debug section when looking at the result (you’ll have to scroll down to the end to see this). If you need to debug things to a greater extend, you can attach a debugger or simply use the Good Old Proven Way of println! (these will end up in catalina.out in logs/ in your tomcat directory). Good luck!

Potential Problems and How To Solve Them

  • If you get an error about incompatible class versions, check that you’re actually running the same (or newer) version of the JVM (java -version) on your Solr search server that you use on your own development machine (use -5 to force 1.5 compatible class files instead of 1.6 when compiling).
  • If you get an error about missing config or something similiar, or that Solr is unable to find the method it’s searching for (generally triggered by an ReflectionException), remember to define your classes public! public class NorwegianNameFilter is your friend! It took at least half an hour until I realized what this simple issue was…

Any comments and followups are of course welcome!

The Java Zone is brought to you in partnership with ZeroTurnaround. Check out this 8-step guide to see how you can increase your productivity by skipping slow application redeploys and by implementing application profiling, as you code!


Published at DZone with permission of Mats Lindh, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}