Solr 3.1: FastVectorHighlighting

One of the many new features in Lucene and Solr 3.1 is FastVectorHighlighting, which the change notes describe as nothing less than improved highlighting functionality. The current highlighting mechanism is not very fast, and it can sometimes bring down a Solr instance when dealing with a large amount of data or very long text fields. I thought it would be worthwhile to test the performance of the new functionality.

A few words at the beginning

First, some information about what the new Lucene highlighter offers:

  • it supports N-gram based fields
  • it requires Java 5 or higher
  • it takes boosts into account when scoring text fragments
  • it is very fast for large documents

It is also worth noting that the current highlighter is marked as deprecated, according to the SOLR-1696 Jira issue.

How was the test performed?

For testing purposes I used an index containing approximately 1.2 million documents (I indexed the Polish Wikipedia, only the latest changes). For each of the searches below I highlighted on one of the largest fields, once with the old highlighter (hl.useFastVectorHighlighter=false) and once with the new one (hl.useFastVectorHighlighter=true). The tests were run with the caches turned off. Each response time in the table is the average of 10 sequential queries, excluding the largest and the smallest value. Solr was restarted after each query. Below are the results of this simple test:

| Query | Old highlighter query time | FastVectorHighlighter query time | Documents returned |
|---|---|---|---|
| q=jan | 3 ms | 2 ms | 47690 |
| q=julian+tuwim | 20 ms | 13 ms | 399 |
| q=poczta+polska | 18 ms | 13 ms | 4507 |
| q=wojna+armia+krajowa | 10 ms | 8 ms | 1714 |
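
The measurement procedure described above (two request variants per query, and an average over 10 runs that drops the largest and the smallest time) could be sketched in Java roughly as follows. This is only an illustration under assumptions: the class and method names are mine, the field name `content` is hypothetical, and the sample timings are made up rather than taken from the test.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;

public class HighlightBenchmarkSketch {

    /** Builds the query string for one highlighted search; useFvh toggles the highlighter. */
    public static String queryString(String q, String field, boolean useFvh)
            throws UnsupportedEncodingException {
        return "q=" + URLEncoder.encode(q, "UTF-8")
                + "&hl=true"
                + "&hl.fl=" + URLEncoder.encode(field, "UTF-8")
                + "&hl.useFastVectorHighlighter=" + useFvh;
    }

    /** Averages the samples after dropping one smallest and one largest value. */
    public static double trimmedMean(long[] samplesMs) {
        if (samplesMs.length < 3) {
            throw new IllegalArgumentException("need at least 3 samples");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        long sum = 0;
        // Skip index 0 (smallest) and the last index (largest).
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return (double) sum / (sorted.length - 2);
    }

    public static void main(String[] args) throws Exception {
        // "content" is a hypothetical field name, not the one used in the test.
        System.out.println(queryString("julian tuwim", "content", true));

        // Made-up response times in milliseconds for 10 runs of one query.
        long[] samples = {21, 19, 20, 35, 20, 18, 20, 22, 19, 20};
        System.out.println(trimmedMean(samples));
    }
}
```

With the made-up samples, the outlier of 35 ms and the fastest run of 18 ms are discarded, so a single slow response does not distort the reported average.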


Although the test is very simple, it shows a pattern: FastVectorHighlighter was faster than the current highlighter for every query tested.

As for the quality of the highlighted fragments, I did not see any major differences, although this particular data set is not well suited to that kind of comparison.

One thing to remember

Please note that FastVectorHighlighter requires the field it works on to be properly defined. The following attributes must be set on the field: term vectors (termVectors="true"), term positions (termPositions="true") and term offsets (termOffsets="true"). Otherwise, the old mechanism continues to be used.
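
To make the requirement concrete, here is a minimal Java sketch that parses a field definition and checks whether all three attributes are enabled. The helper name is my own, and the field name `content` and type `text` are placeholders; only the three attribute names come from the text above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class FvhFieldCheck {

    /** Returns true when the field element enables term vectors, positions and offsets. */
    public static boolean supportsFastVectorHighlighter(String fieldXml) throws Exception {
        Element field = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(fieldXml)))
                .getDocumentElement();
        // getAttribute returns an empty string when the attribute is missing.
        return "true".equals(field.getAttribute("termVectors"))
                && "true".equals(field.getAttribute("termPositions"))
                && "true".equals(field.getAttribute("termOffsets"));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical schema.xml entries; field name and type are placeholders.
        String ready = "<field name=\"content\" type=\"text\" indexed=\"true\" stored=\"true\""
                + " termVectors=\"true\" termPositions=\"true\" termOffsets=\"true\"/>";
        String notReady = "<field name=\"content\" type=\"text\" indexed=\"true\" stored=\"true\"/>";

        System.out.println(supportsFastVectorHighlighter(ready));    // prints true
        System.out.println(supportsFastVectorHighlighter(notReady)); // prints false
    }
}
```

The first definition in `main` shows what a suitably configured field might look like; the second lacks the three attributes, so the old highlighter would be used for it.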

To sum up

Please remember that this was not a detailed performance test of the new highlighting method. It was just a simulation of an environment that may resemble some production environments. Still, based on the results, we can expect the new highlighting method to be faster than the old one.

Published at DZone with permission of Rafał Kuć, DZone MVB. See the original article here.
