Over a million developers have joined DZone.
Platinum Partner

Dynamic facet population with Solr DataImportHandler

· Java Zone

Learn more about Kotlin, a new programming language designed to solve problems that software developers face every day brought to you in partnership with JetBrains.

Here's what I'm trying to do:

Given this mysql table:

CREATE TABLE `tag` (
    `id` integer AUTO_INCREMENT NOT NULL PRIMARY KEY,
    `name` varchar(100) NOT NULL UNIQUE,
    `category` varchar(100)
);
INSERT INTO tag (name,category) VALUES ('good','foo');
INSERT INTO tag (name,category) VALUES ('awe-inspiring','foo');
INSERT INTO tag (name,category) VALUES ('mediocre','bar');
INSERT INTO tag (name,category) VALUES ('terrible','car');

and this solr schema

<field name="tag-foo" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="tag-bar" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="tag-car" type="string" indexed="true" stored="true" multiValued="true"/>

to populate these tag fields via DataImportHandler.

The dumb (but straightforward) way to do it is to use sub-entities, but this is terribly expensive since you use one extra SQL query per category.

Solution

My general approach was to concatenate the rows into a single row, then use RegexTransformer and a custom dataimport Transformer to split out the values.

Here's how I did it:

My dataimporthandler xml:

            <entity name="tag-facets" transformer="RegexTransformer,org.supermind.solr.TagFacetsTransformer"
                    query="select group_concat(concat(t.category,'=',t.name) separator '#') as tagfacets from tag t,booktag bt where bt.id='${book.id}' and t.category is not null">
                <field column="tagfacets" splitBy="#"/>
            </entity>

You'll see that a temporary field tagfacets is used. This will be deleted later on in TagFacetsTransformer.

package org.supermind.solr;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

import java.util.List;
import java.util.Map;

public class TagFacetsTransformer extends Transformer {
  public Object transformRow(Map<String, Object> row, Context context) {
    Object tf = row.get("tagfacets");
    if (tf != null) {
      if (tf instanceof List) {
        List list = (List) tf;
        for (Object o : list) {
          String[] arr = ((String) o).split("=");
          if (arr.length == 2) row.put("tag-" + arr[0], arr[1]);
        }
      } else {
        String[] arr = ((String) tf).split("=");
        if (arr.length == 2) row.put("tag-" + arr[0], arr[1]);
      }
      row.remove("tagfacets");
    }
    return row;
  }
}

Here's the output via DIH's verbose output (with my own data):

<str name="tagfacets">lang=ruby#framework=ruby-on-rails</str>
<str>---------------------------------------------</str>
<lst name="transformer:RegexTransformer">
<str>---------------------------------------------</str>
<arr name="tagfacets">
<str>lang=ruby</str>
<str>framework=ruby-on-rails</str>
</arr>
<str>---------------------------------------------</str>
<lst name="transformer:org.supermind.solr.TagFacetsTransformer">
<str>---------------------------------------------</str>
<str name="tag-framework">ruby-on-rails</str>
<str name="tag-lang">ruby</str>
<str>---------------------------------------------</str>
</lst>
</lst>
</lst>

You can see the step-by-step transformation of the input value.

Pretty nifty, eh?

The Java Zone is brought to you in partnership with JetBrains.  Discover how powerful static code analysis and ergonomic design make development not only productive but also an enjoyable experience.

Topics:

Published at DZone with permission of Kelvin Tan .

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}