Over a million developers have joined DZone.
Platinum Partner

Dynamic facet population with Solr DataImportHandler

· Java Zone

The Java Zone is brought to you in partnership with AppDynamics. Discover how AppDynamics steps in to upgrade your performance game and prevent your enterprise from these top 10 Java performance problems.

Here's what I'm trying to do:

Given this mysql table:

CREATE TABLE `tag` (
    `id` integer AUTO_INCREMENT NOT NULL PRIMARY KEY,
    `name` varchar(100) NOT NULL UNIQUE,
    `category` varchar(100)
);
INSERT INTO tag (name,category) VALUES ('good','foo');
INSERT INTO tag (name,category) VALUES ('awe-inspiring','foo');
INSERT INTO tag (name,category) VALUES ('mediocre','bar');
INSERT INTO tag (name,category) VALUES ('terrible','car');

and this solr schema

<field name="tag-foo" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="tag-bar" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="tag-car" type="string" indexed="true" stored="true" multiValued="true"/>

to populate these tag fields via DataImportHandler.

The dumb (but straightforward) way to do it is to use sub-entities, but this is terribly expensive since you use one extra SQL query per category.

Solution

My general approach was to concatenate the rows into a single row, then use RegexTransformer and a custom dataimport Transformer to split out the values.

Here's how I did it:

My dataimporthandler xml:

            <entity name="tag-facets" transformer="RegexTransformer,org.supermind.solr.TagFacetsTransformer"
                    query="select group_concat(concat(t.category,'=',t.name) separator '#') as tagfacets from tag t,booktag bt where bt.id='${book.id}' and t.category is not null">
                <field column="tagfacets" splitBy="#"/>
            </entity>

You'll see that a temporary field tagfacets is used. This will be deleted later on in TagFacetsTransformer.

package org.supermind.solr;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

import java.util.List;
import java.util.Map;

public class TagFacetsTransformer extends Transformer {
  public Object transformRow(Map<String, Object> row, Context context) {
    Object tf = row.get("tagfacets");
    if (tf != null) {
      if (tf instanceof List) {
        List list = (List) tf;
        for (Object o : list) {
          String[] arr = ((String) o).split("=");
          if (arr.length == 2) row.put("tag-" + arr[0], arr[1]);
        }
      } else {
        String[] arr = ((String) tf).split("=");
        if (arr.length == 2) row.put("tag-" + arr[0], arr[1]);
      }
      row.remove("tagfacets");
    }
    return row;
  }
}

Here's the output via DIH's verbose output (with my own data):

<str name="tagfacets">lang=ruby#framework=ruby-on-rails</str>
<str>---------------------------------------------</str>
<lst name="transformer:RegexTransformer">
<str>---------------------------------------------</str>
<arr name="tagfacets">
<str>lang=ruby</str>
<str>framework=ruby-on-rails</str>
</arr>
<str>---------------------------------------------</str>
<lst name="transformer:org.supermind.solr.TagFacetsTransformer">
<str>---------------------------------------------</str>
<str name="tag-framework">ruby-on-rails</str>
<str name="tag-lang">ruby</str>
<str>---------------------------------------------</str>
</lst>
</lst>
</lst>

You can see the step-by-step transformation of the input value.

Pretty nifty, eh?

The Java Zone is brought to you in partnership with AppDynamics. AppDynamics helps you gain the fundamentals behind application performance, and implement best practices so you can proactively analyze and act on performance problems as they arise, and more specifically with your Java applications. Start a Free Trial.

Topics:

Published at DZone with permission of Kelvin Tan .

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}