Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Sorting with Lucene

DZone's Guide to

Sorting with Lucene

· Big Data Zone
Free Resource

Need to build an application around your data? Learn more about dataflow programming for rapid development and greater creativity. 

I talked about the Lucene formula and how it calculate things using tf-idf for best matches. Now I want to talk about the actual sorting implementation. As it turned out, the default sorting (by relevancy) is really simple. All you need is to get the relevant score for a query, then you shove the results through a heap with a specified size. The heap will take care of maintain the top results.

So far, that is pretty simple to understand. But the question is, how do you do sorting on a field value? The answer is, not easily.

image

GetStringIndex() does something very interesting. I returns  a string index, which gives us:

  • A string array containing all the distinct (sorted) value for this index.
  • A int array with all the documents in the index, with the position of the value of that field in the string value array

Now we can compare fields by their field position on the field, which give us pretty good sorting. Unfortunately, this also require us to load all the values to memory. Let us see another example, which would probably be easier to follow:

Sorting by an integer is done like this:

image

Get an array (whose size match the number of documents),  We can then sort things easily because accessing the relevant field value only require us to have the document id to index into the array.

The reason Lucene does this is that it uses an inverted index, and it has no easy way of going from the field values to the list documents it has. So it is easier to read all the values into memory and work with them there. I don’t like it, but off hand, I can’t think of a better way to handle this.

Check out the Exaptive data application Studio. Technology agnostic. No glue code. Use what you know and rely on the community for what you don't. Try the community version.

Topics:

Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}