Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Dynamic facet population with Solr DataImportHandler

DZone's Guide to

Dynamic facet population with Solr DataImportHandler

· Java Zone ·
Free Resource

How do you break a Monolith into Microservices at Scale? This ebook shows strategies and techniques for building scalable and resilient microservices.

Here's what I'm trying to do:

Given this mysql table:

CREATE TABLE `tag` (
    `id` integer AUTO_INCREMENT NOT NULL PRIMARY KEY,
    `name` varchar(100) NOT NULL UNIQUE,
    `category` varchar(100)
);
INSERT INTO tag (name,category) VALUES ('good','foo');
INSERT INTO tag (name,category) VALUES ('awe-inspiring','foo');
INSERT INTO tag (name,category) VALUES ('mediocre','bar');
INSERT INTO tag (name,category) VALUES ('terrible','car');

and this solr schema

<field name="tag-foo" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="tag-bar" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="tag-car" type="string" indexed="true" stored="true" multiValued="true"/>

to populate these tag fields via DataImportHandler.

The dumb (but straightforward) way to do it is to use sub-entities, but this is terribly expensive since you use one extra SQL query per category.

Solution

My general approach was to concatenate the rows into a single row, then use RegexTransformer and a custom dataimport Transformer to split out the values.

Here's how I did it:

My dataimporthandler xml:

            <entity name="tag-facets" transformer="RegexTransformer,org.supermind.solr.TagFacetsTransformer"
                    query="select group_concat(concat(t.category,'=',t.name) separator '#') as tagfacets from tag t,booktag bt where bt.id='${book.id}' and t.category is not null">
                <field column="tagfacets" splitBy="#"/>
            </entity>

You'll see that a temporary field tagfacets is used. This will be deleted later on in TagFacetsTransformer.

package org. supermind. solr;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

import java.util.List;
import java.util.Map;

public class TagFacetsTransformer extends Transformer {
  public Object transformRow(Map<String, Object> row, Context context) {
    Object tf = row.get("tagfacets");
    if (tf != null) {
      if (tf instanceof List) {
        List list = (List) tf;
        for (Object o : list) {
          String[] arr = ((String) o).split("=");
          if (arr.length == 2) row.put("tag-" + arr[0], arr[1]);
        }
      } else {
        String[] arr = ((String) tf).split("=");
        if (arr.length == 2) row.put("tag-" + arr[0], arr[1]);
      }
      row.remove("tagfacets");
    }
    return row;
  }
}

Here's the output via DIH's verbose output (with my own data):

<str name="tagfacets">lang=ruby#framework=ruby-on-rails</str>
<str>---------------------------------------------</str>
<lst name="transformer:RegexTransformer">
<str>---------------------------------------------</str>
<arr name="tagfacets">
<str>lang=ruby</str>
<str>framework=ruby-on-rails</str>
</arr>
<str>---------------------------------------------</str>
<lst name="transformer:org.supermind.solr.TagFacetsTransformer">
<str>---------------------------------------------</str>
<str name="tag-framework">ruby-on-rails</str>
<str name="tag-lang">ruby</str>
<str>---------------------------------------------</str>
</lst>
</lst>
</lst>

You can see the step-by-step transformation of the input value.

Pretty nifty, eh?

How do you break a Monolith into Microservices at Scale? This ebook shows strategies and techniques for building scalable and resilient microservices.

Topics:

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}