DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
Building Scalable Real-Time Apps with AstraDB and Vaadin
Register Now

Trending

  • Database Integration Tests With Spring Boot and Testcontainers
  • Knowing and Valuing Apache Kafka’s ISR (In-Sync Replicas)
  • How To Manage Vulnerabilities in Modern Cloud-Native Applications
  • What to Pay Attention to as Automation Upends the Developer Experience

Trending

  • Database Integration Tests With Spring Boot and Testcontainers
  • Knowing and Valuing Apache Kafka’s ISR (In-Sync Replicas)
  • How To Manage Vulnerabilities in Modern Cloud-Native Applications
  • What to Pay Attention to as Automation Upends the Developer Experience

Writing a Solr Analysis Filter Plugin

Mats Lindh user avatar by
Mats Lindh
·
Jun. 30, 11 · News
Like (0)
Save
Tweet
Share
17.02K Views

Join the DZone community and get the full member experience.

Join For Free

As we’ve been working on getting a better result out of the phonetic search we’re currently doing at derdubor, I started writing a plugin for Solr to be able to return better search results when searching for norwegian names. We’ve been using the standard phonetic filter from Solr 1.2 so far, using the double metaphone encoder for encoding a regular token as a phonetic value. The trouble with this is that a double metaphone value is four simple letters, which means that searchwords such as ‘trafikkontroll’ would get the same meaning as ‘Dyrvik’. The latter being a name and the first being a regular search string which would be better served through an article view. TRAFIKKONTROLL resolves to TRFK in double metaphone, while DYRVIK resolves to DRVK. T and D is considered similiar, as is V and F, and voilá, you’ve got yourself a match in the search result, but not a visual one (or a semantic one, as the words have very different meanings).

To solve this, I decided to write a custom filter plugin which we could tune to names that are in use in Norway. I’ll post about the logic behind my reasoning in regards to wording later and hopefully post the complete filter function we’re applying, but I’ll leave that for another post.

First you need a factory that’s able to produce filters when Solr asks for them:

NorwegianNameFilterFactory.java:

    package no.derdubor.solr.analysis;
     
    import java.util.Map;
     
    import org.apache.solr.analysis.BaseTokenFilterFactory;
    import org.apache.lucene.analysis.TokenStream;
     
    public class NorwegianNameFilterFactory extends BaseTokenFilterFactory
    {
        Map<String,String> args;
     
        public Map<String,String> getArgs()
        {
            return args;
        }
     
        public void init(Map<String,String> args)
        {
            this.args = args;
        }
     
        public NorwegianNameFilter create(TokenStream input)
        {
            return new NorwegianNameFilter(input);
        }
    }

To compile this example yourself, put the file in no/derdubor/solr/analysis/ (which matches no.derdubor.solr.analysis; in the package statement), and run

javac -6 no/derdubor/solr/analysis/NorwegianNameFilterFactory.java

(you’ll need apache-solr-core.jar and lucene-core.jar in your classpath to do this)

to compile it. You’ll of course also need the filter itself (which is returned from the create-method above):

    package no.derdubor.solr.analysis;
     
    import java.io.IOException;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
     
    public class NorwegianNameFilter extends TokenFilter
    {
        public NorwegianNameFilter(TokenStream input)
        {
            super(input);
        }
     
        public Token next() throws IOException
        {
            return parseToken(this.input.next());
        }
     
        public Token next(Token result) throws IOException
        {
            return parseToken(this.input.next());
        }
     
        protected Token parseToken(Token in)
        {
            /* do magic stuff with in.termBuffer() here (a char[] which can be manipulated) */
            /* set the changed length of the new term with in.setTermLength(); before returning it */
            return in;
        }
    }

You should now be able to compile both files:

javac -6 no/derdubor/solr/analysis/*.java

After compiling the plugin, create a jar file which contain your plugin. This will be the “distributable” version of your plugin, and should contain the .class-files of your application.

jar cvf derdubor-solr-norwegiannamefilter.jar no/derdubor/solr/analysis/*.class

Move the file you just created (derdubor-solr-norwegiannamefilter.jar in the example above) into your Solr home directory. This is where you keep your bin/ and conf/ directory (which contains schema.xml, etc). Create a lib directory in the solr home directory. This is where your custom libraries will live, so copy the file into this directory (lib/).

Restart Solr and check that everything still works as it should. If everything still seems normal, it’s time to enable your filter. In one of your <filter>-chains, you can simply append a <filter> element to insert your own filter into the chain:

    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="no.derdubor.solr.analysis.NorwegianNameFilterFactory" />
    </analyzer>

Restart Solr again, and if everything still works as it should, you’re all set! Time to index some new data (remember that you’ll need to reindex the data for things to work as you expect, since no stored data is processed when you edit your configuration files) and commit it! Do a few searches through the admin interface to see that everything works as it should. I’ve used the “debug” option to .. well, debug .. my plugin while developing it. A very neat trick is to see what terms your filter expands to (if you set type=”query” in the analyzer section, it will be applied to all queries against that field), which will be shown in the first debug section when looking at the result (you’ll have to scroll down to the end to see this). If you need to debug things to a greater extend, you can attach a debugger or simply use the Good Old Proven Way of println! (these will end up in catalina.out in logs/ in your tomcat directory). Good luck!

Potential Problems and How To Solve Them

  • If you get an error about incompatible class versions, check that you’re actually running the same (or newer) version of the JVM (java -version) on your Solr search server that you use on your own development machine (use -5 to force 1.5 compatible class files instead of 1.6 when compiling).
  • If you get an error about missing config or something similiar, or that Solr is unable to find the method it’s searching for (generally triggered by an ReflectionException), remember to define your classes public! public class NorwegianNameFilter is your friend! It took at least half an hour until I realized what this simple issue was…

Any comments and followups are of course welcome!

Filter (software)

Published at DZone with permission of Mats Lindh, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Trending

  • Database Integration Tests With Spring Boot and Testcontainers
  • Knowing and Valuing Apache Kafka’s ISR (In-Sync Replicas)
  • How To Manage Vulnerabilities in Modern Cloud-Native Applications
  • What to Pay Attention to as Automation Upends the Developer Experience

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com

Let's be friends: