
Getting Real-Time Field Values in Lucene


We know Lucene's near-real-time search is very fast: you can easily refresh your searcher once per second, even at high indexing rates, so that any change to the index is available for searching or faceting at most one second later. For most applications this is plenty fast. 

But what if you sometimes need even better than near-real-time? What if you need to look up truly live or real-time values, so for any document id you can retrieve the very last value indexed? 

Just use the newly committed LiveFieldValues class!

It's simple to use. When you instantiate it, you provide your SearcherManager or NRTManager so that it can subscribe as a RefreshListener and be notified when new searchers are opened. Then, whenever you add, update or delete a document, you notify the LiveFieldValues instance. Finally, call the get method to get the last indexed value for a given document id.

This class is simple inside: it holds the values of recently indexed documents in a ConcurrentHashMap, keyed by document id, covering documents that were just indexed but are not yet visible through the near-real-time searcher. Whenever a new near-real-time searcher is successfully opened, it clears the map of all entries that are now included in that searcher. It carefully handles the transition between the start and the end of a reopen by checking two maps for the value and, failing that, falling back to the current searcher.
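The two-map handoff described above can be modeled without Lucene. The following is only a rough sketch under stated assumptions, not the actual LiveFieldValues source: the class name LiveValuesSketch, its method names, and the Function that stands in for a searcher lookup are all hypothetical, and deletes are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical, Lucene-free model of the two-map idea; not the real class.
class LiveValuesSketch<T> {
    // Values indexed since the most recent reopen started.
    private volatile Map<String, T> current = new ConcurrentHashMap<>();
    // Values that were pending when the in-flight reopen started.
    private volatile Map<String, T> old = new ConcurrentHashMap<>();
    // Stand-in for reading a document's value from the current searcher.
    private final Function<String, T> searcherLookup;

    LiveValuesSketch(Function<String, T> searcherLookup) {
        this.searcherLookup = searcherLookup;
    }

    // Call after a document with this id has been added or updated.
    void add(String id, T value) {
        current.put(id, value);
    }

    // A reopen is starting: keep the pending values readable in `old`
    // and collect later updates in a fresh map.
    void beforeRefresh() {
        old = current;
        current = new ConcurrentHashMap<>();
    }

    // The new searcher is open and now covers everything in `old`.
    void afterRefresh() {
        old = new ConcurrentHashMap<>();
    }

    // Newest map first, then the in-transition map, then the searcher.
    T get(String id) {
        T value = current.get(id);
        if (value == null) {
            value = old.get(id);
        }
        if (value == null) {
            value = searcherLookup.apply(id);
        }
        return value;
    }
}
```

In this model a value stays reachable through one of the two maps for the whole reopen, and it is dropped only once the new searcher can return it.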

LiveFieldValues is abstract: you must subclass it and implement the lookupFromSearcher method to retrieve a document's value from an IndexSearcher, since how your application stores the values in the searcher is application dependent (stored fields, doc values or even postings, payloads or term vectors).

Note that this class only offers "live get", i.e. you can get the last indexed value for any document, but it does not offer "live search", i.e. you cannot search against the value until the searcher is reopened. Also, the internal maps are only pruned after a new searcher is opened, so RAM usage will grow without bound if you never reopen! It's up to your application to ensure that the same document id is never updated simultaneously (in different threads), because in that case you cannot know which update "won" (Lucene does not expose this information, although LUCENE-3424 is one possible solution for this).

An example use-case is to store a version field per document so that you know the last version indexed for a given id; you can then use this to reject a later but out-of-order update for that same document whose version is older than the version already indexed. 
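As a rough illustration of this use case, the sketch below rejects an update whose version is older than the last one accepted for the same id. It is a minimal model under stated assumptions: VersionGate and its methods are hypothetical names, and a plain ConcurrentHashMap stands in for the live lookup that LiveFieldValues would provide. In a real application the accepted update would also be indexed and reported to the LiveFieldValues instance.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: not part of Lucene.
class VersionGate {
    // Stand-in for the live "last indexed version" lookup, keyed by document id.
    private final Map<String, Long> lastIndexedVersion = new ConcurrentHashMap<>();

    // Returns true if the update is accepted, false if it is older than the
    // version already recorded. Synchronized so that the check and the write
    // for an id never interleave across threads.
    synchronized boolean tryUpdate(String id, long version) {
        Long last = lastIndexedVersion.get(id);
        if (last != null && version < last) {
            return false; // out-of-order update: reject it
        }
        // A real application would index the document here.
        lastIndexedVersion.put(id, version);
        return true;
    }

    // Last accepted version for the id, or null if none was recorded.
    Long lastVersion(String id) {
        return lastIndexedVersion.get(id);
    }
}
```

The coarse lock on tryUpdate is only one way to guarantee that the same id is never updated from two threads at once; per-id locking or routing each id to a single indexing thread would serve the same purpose.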

LiveFieldValues will be available in the next Lucene release (4.2).



Published at DZone with permission of Michael McCandless, DZone MVB. See the original article here.

