
Dynamic facet population with Solr DataImportHandler


Here's what I'm trying to do:

Given this MySQL table:

CREATE TABLE `tag` (
    `id` integer AUTO_INCREMENT NOT NULL PRIMARY KEY,
    `name` varchar(100) NOT NULL UNIQUE,
    `category` varchar(100)
);
INSERT INTO tag (name,category) VALUES ('good','foo');
INSERT INTO tag (name,category) VALUES ('awe-inspiring','foo');
INSERT INTO tag (name,category) VALUES ('mediocre','bar');
INSERT INTO tag (name,category) VALUES ('terrible','car');

and this Solr schema:

<field name="tag-foo" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="tag-bar" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="tag-car" type="string" indexed="true" stored="true" multiValued="true"/>

I want to populate these tag fields from the table via DataImportHandler (DIH), one field per category.

The straightforward way is to use one sub-entity per category, but this is expensive: DIH runs each sub-entity's query once for every parent row, so every category adds an extra SQL query per document.
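To put rough numbers on that cost, here is a back-of-the-envelope sketch. The document and category counts are made up for illustration, and it assumes no caching, so each sub-entity query runs once per parent row. It compares one sub-entity per category with a single sub-entity that fetches every category in one query.

```java
// Hypothetical illustration only: counts SQL queries issued during an import.
final class QueryCountSketch {
  // One root query, plus one sub-entity query per category for every document.
  static long perCategorySubEntities(long documents, long categories) {
    return 1 + documents * categories;
  }

  // One root query, plus a single sub-entity query for every document.
  static long singleSubEntity(long documents) {
    return 1 + documents;
  }

  public static void main(String[] args) {
    long documents = 10_000; // assumed number of parent rows
    long categories = 3;     // foo, bar, car from the sample table
    System.out.println(perCategorySubEntities(documents, categories)); // prints 30001
    System.out.println(singleSubEntity(documents));                    // prints 10001
  }
}
```

The gap grows linearly with the number of categories, which is why collapsing them into one query pays off.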

Solution

My general approach was to concatenate all of a document's tag rows into a single row as category=name pairs, then use RegexTransformer and a custom DataImportHandler Transformer to split the values back out into the per-category fields.
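The concatenate-then-split idea can be sketched in plain Java, outside of Solr and MySQL. This is only an illustration: the class and method names are made up, the rows are the sample data from the table above, and the row order is assumed (a database does not guarantee it unless the query asks for one).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two steps: join rows into one string, then split it again.
final class ConcatSplitSketch {
  // Joins (category, name) rows as "category=name" pairs separated by '#',
  // skipping rows that have no category.
  static String concatRows(String[][] rows) {
    List<String> pairs = new ArrayList<>();
    for (String[] row : rows) {
      if (row[0] != null) {
        pairs.add(row[0] + "=" + row[1]);
      }
    }
    return String.join("#", pairs);
  }

  // Splits the concatenated string back into individual "category=name" pairs.
  static List<String> splitPairs(String concatenated) {
    return Arrays.asList(concatenated.split("#"));
  }

  public static void main(String[] args) {
    String[][] rows = {
      {"foo", "good"}, {"foo", "awe-inspiring"}, {"bar", "mediocre"}, {"car", "terrible"}
    };
    String single = concatRows(rows);
    System.out.println(single);
    // prints foo=good#foo=awe-inspiring#bar=mediocre#car=terrible
    System.out.println(splitPairs(single));
    // prints [foo=good, foo=awe-inspiring, bar=mediocre, car=terrible]
  }
}
```

The separators '#' and '=' must not occur in category names, and '#' must not occur in tag names, or the split will produce the wrong pairs.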

Here's how I did it:

My DataImportHandler XML configuration:

            <entity name="tag-facets" transformer="RegexTransformer,org.supermind.solr.TagFacetsTransformer"
                    query="select group_concat(concat(t.category,'=',t.name) separator '#') as tagfacets from tag t,booktag bt where bt.id='${book.id}' and t.category is not null">
                <field column="tagfacets" splitBy="#"/>
            </entity>

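The booktag table is not shown above. The query as written has no condition linking tag to booktag, so you will probably need to add a join on whatever column relates the two in your schema. Also keep in mind that MySQL truncates GROUP_CONCAT results at group_concat_max_len (1024 bytes by default).

You'll see that a temporary field, tagfacets, is used. RegexTransformer splits it on '#', and TagFacetsTransformer removes it from the row once the tag fields have been populated: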

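package org.supermind.solr;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class TagFacetsTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    // Remove the temporary column so it never reaches the index.
    Object tf = row.remove("tagfacets");
    if (tf == null) return row;
    // After splitBy the value is normally a List; handle a plain String too.
    List<?> pairs = (tf instanceof List) ? (List<?>) tf : Collections.singletonList(tf);
    for (Object o : pairs) {
      // Split on the first '=' only, so tag names may themselves contain '='.
      String[] arr = String.valueOf(o).split("=", 2);
      if (arr.length == 2 && !arr[1].isEmpty()) addValue(row, "tag-" + arr[0], arr[1]);
    }
    return row;
  }

  // Keep every tag of a category: the target fields are multiValued, so a second value must not overwrite the first.
  @SuppressWarnings("unchecked")
  private static void addValue(Map<String, Object> row, String field, String value) {
    Object existing = row.get(field);
    if (existing == null) {
      row.put(field, value);
    } else if (existing instanceof List) {
      ((List<Object>) existing).add(value);
    } else {
      List<Object> values = new ArrayList<>();
      values.add(existing);
      values.add(value);
      row.put(field, values);
    }
  }
}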
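To try the pair-splitting logic without a Solr classpath, here is a small standalone sketch. It is not part of the transformer: the class and method names are made up, and it always collects values into a list per field, which is what the multiValued fields need when a category such as foo has more than one tag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical standalone demo: groups "category=name" pairs into tag-<category> fields.
final class TagFacetsDemo {
  static Map<String, List<String>> toFacetFields(List<String> pairs) {
    // LinkedHashMap keeps fields in the order their category first appears.
    Map<String, List<String>> fields = new LinkedHashMap<>();
    for (String pair : pairs) {
      // Split on the first '=' only; skip anything that is not a pair.
      String[] arr = pair.split("=", 2);
      if (arr.length != 2) continue;
      fields.computeIfAbsent("tag-" + arr[0], k -> new ArrayList<>()).add(arr[1]);
    }
    return fields;
  }

  public static void main(String[] args) {
    List<String> pairs =
        Arrays.asList("foo=good", "foo=awe-inspiring", "bar=mediocre", "car=terrible");
    System.out.println(toFacetFields(pairs));
    // prints {tag-foo=[good, awe-inspiring], tag-bar=[mediocre], tag-car=[terrible]}
  }
}
```

Both tags in category foo end up under tag-foo, while bar and car each get a single value.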

Here's the result as shown in DIH's verbose output (with my own data):

<str name="tagfacets">lang=ruby#framework=ruby-on-rails</str>
<str>---------------------------------------------</str>
<lst name="transformer:RegexTransformer">
<str>---------------------------------------------</str>
<arr name="tagfacets">
<str>lang=ruby</str>
<str>framework=ruby-on-rails</str>
</arr>
<str>---------------------------------------------</str>
<lst name="transformer:org.supermind.solr.TagFacetsTransformer">
<str>---------------------------------------------</str>
<str name="tag-framework">ruby-on-rails</str>
<str name="tag-lang">ruby</str>
<str>---------------------------------------------</str>
</lst>
</lst>
</lst>

You can see the step-by-step transformation of the input value.

Pretty nifty, eh?


Published at DZone with permission of
