Dynamic facet population with Solr DataImportHandler
Here's what I'm trying to do:
Given this MySQL table:
    CREATE TABLE `tag` (
      `id` integer AUTO_INCREMENT NOT NULL PRIMARY KEY,
      `name` varchar(100) NOT NULL UNIQUE,
      `category` varchar(100)
    );
    INSERT INTO tag (name,category) VALUES ('good','foo');
    INSERT INTO tag (name,category) VALUES ('awe-inspiring','foo');
    INSERT INTO tag (name,category) VALUES ('mediocre','bar');
    INSERT INTO tag (name,category) VALUES ('terrible','car');
and this Solr schema:
    <field name="tag-foo" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="tag-bar" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="tag-car" type="string" indexed="true" stored="true" multiValued="true"/>
I want to populate these tag fields via DataImportHandler, so that each tag name lands in the field for its category.
The dumb (but straightforward) way to do it is to use one sub-entity per category, but this is terribly expensive: DIH runs a sub-entity's query once per parent row, so you pay one extra SQL query per category for every document imported.
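To put rough numbers on that cost, here is a back-of-the-envelope sketch. It assumes one sub-entity per category, no caching (for example, no CachedSqlEntityProcessor), and one parent query for the whole import. The class and method names are made up for illustration.

```java
final class QueryCountSketch {

    // One parent query, plus one sub-entity query per category for every book.
    static long subEntityQueries(long books, long categories) {
        return 1 + books * categories;
    }

    // One parent query, plus a single tag query for every book.
    static long singleQueryPerBook(long books) {
        return 1 + books;
    }

    public static void main(String[] args) {
        long books = 10_000;
        long categories = 3; // foo, bar and car in the sample data
        System.out.println("per-category sub-entities: " + subEntityQueries(books, categories));
        System.out.println("one tag query per book:    " + singleQueryPerBook(books));
    }
}
```

The gap grows linearly with the number of categories, which is why collapsing them into one query per book is attractive.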
Solution
My general approach was to concatenate the rows into a single row, then use RegexTransformer and a custom dataimport Transformer to split out the values.
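The encoding itself is easy to try outside the database. The sketch below imitates in plain Java what a GROUP_CONCAT over category=name pairs with a # separator produces for the sample rows, and then what splitting on # gives back. The class and method names are hypothetical, and the row order is simply the insertion order (MySQL does not guarantee an order without ORDER BY).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GroupConcatSketch {

    // Imitates GROUP_CONCAT(CONCAT(category, '=', name) SEPARATOR '#') over rows
    // given as {name, category}. Rows with a null category are skipped, like a
    // "category is not null" filter would do.
    static String concatRows(List<String[]> rows) {
        List<String> pairs = new ArrayList<>();
        for (String[] row : rows) {
            String name = row[0];
            String category = row[1];
            if (category != null) {
                pairs.add(category + "=" + name);
            }
        }
        return String.join("#", pairs);
    }

    // Imitates splitting the concatenated column on '#'.
    static List<String> splitPairs(String concatenated) {
        return Arrays.asList(concatenated.split("#"));
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"good", "foo"},
                new String[] {"awe-inspiring", "foo"},
                new String[] {"mediocre", "bar"},
                new String[] {"terrible", "car"});
        String concatenated = concatRows(rows);
        System.out.println(concatenated);
        System.out.println(splitPairs(concatenated));
    }
}
```

Two caveats with this encoding: a tag name or category containing # or = will not survive the round trip, and MySQL truncates GROUP_CONCAT results at group_concat_max_len (1024 by default), so heavily tagged rows may need that limit raised.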
Here's how I did it:
My DataImportHandler XML:
    <entity name="tag-facets"
            transformer="RegexTransformer,org.supermind.solr.TagFacetsTransformer"
            query="select group_concat(concat(t.category,'=',t.name) separator '#') as tagfacets
                   from tag t,booktag bt
                   where bt.id='${book.id}' and t.category is not null">
      <field column="tagfacets" splitBy="#"/>
    </entity>
You'll see that a temporary column, tagfacets, is used. RegexTransformer splits it on #, and TagFacetsTransformer removes it from the row once the tag fields are filled in. Note that, as written, the query has no join condition linking tag to booktag; add the one that fits your schema.
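The transformer:
    package org.supermind.solr;

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;

    public class TagFacetsTransformer extends Transformer {
        @Override
        @SuppressWarnings("unchecked")
        public Object transformRow(Map<String, Object> row, Context context) {
            // Take the temporary column out of the row so it is not indexed.
            Object tf = row.remove("tagfacets");
            if (tf == null) {
                return row;
            }
            // RegexTransformer leaves a List after splitting; a plain String is handled too.
            List<Object> pairs = (tf instanceof List)
                    ? (List<Object>) tf
                    : Collections.<Object>singletonList(tf);
            for (Object o : pairs) {
                // Limit of 2 keeps any further '=' inside the tag name.
                String[] arr = String.valueOf(o).split("=", 2);
                if (arr.length != 2) {
                    continue;
                }
                String field = "tag-" + arr[0];
                Object existing = row.get(field);
                if (existing == null) {
                    row.put(field, arr[1]);
                } else if (existing instanceof List) {
                    ((List<Object>) existing).add(arr[1]);
                } else {
                    // Second tag in this category: switch to a list so neither value is lost.
                    List<Object> values = new ArrayList<Object>();
                    values.add(existing);
                    values.add(arr[1]);
                    row.put(field, values);
                }
            }
            return row;
        }
    }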
Here's the output via DIH's verbose output (with my own data):
    <str name="tagfacets">lang=ruby#framework=ruby-on-rails</str>
    <str>---------------------------------------------</str>
    <lst name="transformer:RegexTransformer">
      <str>---------------------------------------------</str>
      <arr name="tagfacets">
        <str>lang=ruby</str>
        <str>framework=ruby-on-rails</str>
      </arr>
      <str>---------------------------------------------</str>
      <lst name="transformer:org.supermind.solr.TagFacetsTransformer">
        <str>---------------------------------------------</str>
        <str name="tag-framework">ruby-on-rails</str>
        <str name="tag-lang">ruby</str>
        <str>---------------------------------------------</str>
      </lst>
    </lst>
    </lst>
You can see the step-by-step transformation of the input value.
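The last step of that trace can also be tried without a running Solr. The sketch below is a hypothetical stand-in, not the DIH class: it works on a plain Map, assumes the row already holds the list produced by splitBy, and always stores values as lists so that two tags in one category (such as 'good' and 'awe-inspiring' under 'foo') are both kept, which is what the multiValued fields in the schema call for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TagRowSketch {

    // Moves "category=name" pairs from the temporary "tagfacets" column into
    // "tag-<category>" columns, collecting every name for a category in a list.
    // Assumes the row has no pre-existing tag-* columns.
    static Map<String, Object> transformRow(Map<String, Object> row) {
        Object tf = row.remove("tagfacets");
        if (!(tf instanceof List)) {
            return row;
        }
        for (Object o : (List<?>) tf) {
            String[] arr = String.valueOf(o).split("=", 2);
            if (arr.length != 2) {
                continue; // skip anything that is not a category=name pair
            }
            String field = "tag-" + arr[0];
            @SuppressWarnings("unchecked")
            List<String> values = (List<String>) row.get(field);
            if (values == null) {
                values = new ArrayList<>();
                row.put(field, values);
            }
            values.add(arr[1]);
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("tagfacets", Arrays.asList("foo=good", "foo=awe-inspiring", "bar=mediocre"));
        System.out.println(transformRow(row));
    }
}
```

Running main prints the row with the temporary column gone and one list per category, which makes it easy to check edge cases (empty categories, malformed pairs) before wiring the logic into an import.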
Pretty nifty, eh?
Published at DZone with permission of Kelvin Tan. See the original article here.