DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
View Events Video Library
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Integrating PostgreSQL Databases with ANF: Join this workshop to learn how to create a PostgreSQL server using Instaclustr’s managed service

Mobile Database Essentials: Assess data needs, storage requirements, and more when leveraging databases for cloud and edge applications.

Monitoring and Observability for LLMs: Datadog and Google Cloud discuss how to achieve optimal AI model performance.

Automated Testing: The latest on architecture, TDD, and the benefits of AI and low-code tools.

Trending

  • Log Analysis: How to Digest 15 Billion Logs Per Day and Keep Big Queries Within 1 Second
  • CI/CD Tools and Technologies: Unlock the Power of DevOps
  • What Technical Skills Can You Expect To Gain From a DevOps Course Syllabus?
  • The State of Data Streaming for Telco
  1. DZone
  2. Coding
  3. Java
  4. Lucene's SearcherManager simplifies reopen with threads

Lucene's SearcherManager simplifies reopen with threads

Michael Mccandless user avatar by
Michael Mccandless
·
Sep. 28, 11 · News
Like (0)
Save
Tweet
Share
7.85K Views

Join the DZone community and get the full member experience.

Join For Free
Modern computers have wonderful hardware concurrency, within and across CPU cores, RAM and IO resources, which means your typical server-based search application should use multiple threads to fully utilize all resources.

For searching, this usually means you'll have one thread handle each search request, sharing a single IndexSearcher instance. This model is effective: the Lucene developers work hard to minimize internal locking in all Lucene classes. In fact, we recently removed thread contention during indexing (specifically, flushing), resulting in massive gains in indexing throughput on highly concurrent hardware.

Since IndexSearcher exposes a fixed, point-in-time view of the index, when you make changes to the index you'll need to reopen it. Fortunately, since version 2.9, Lucene has provided the IndexReader.reopen method to get a new reader reflecting the changes.

This operation is efficient: the new reader shares already warmed sub-readers in common with the old reader, so it only opens sub-readers for any newly created segments. This means reopen time is generally in proportion to how many changes you made; however, when a large merge had completed it will be longer. It's best to warm the new reader before putting it into production by running a set of "typical" searches for your application, so that Lucene performs one-time initialization for internal data structures (norms, field cache, etc.).

But how should you properly reopen, while search threads are still running and new searches are forever arriving? Your search application is popular, users are always searching and there's never a good time to switch! The core issue is that you must never close your old IndexReader while other threads are still using it for searching, otherwise those threads can easily hit cryptic exceptions that often mimic index corruption.

Lucene tries to detect that you've done this, and will throw a nice AlreadyClosedException, but we cannot guarantee that exception is thrown since we only check up front, when the search kicks off: if you close the reader when a search is already underway then all bets are off.

One simple approach would be to temporarily block all new searches and wait for all running searches to complete, and then close the old reader and switch to the new one. This is how janitors often clean a bathroom: they wait for all current users to finish and block new users with the all-too-familiar plastic yellow sign.

While the bathroom cleaning approach will work, it has an obviously serious drawback: during the cutover you are now forcing your users to wait, and that wait time could be long (the time for the slowest currently running search to finish).

A much better solution is to immediately direct new searches to the new reader, as soon as it's done warming, and then separately wait for the still-running searches against the old reader to complete. Once the very last search has finished with the old reader, close it.

This solution is fully concurrent: it has no locking whatsoever so searches are never blocked, as long as you use a separate thread to perform the reopen and warming. The time to reopen and warm the new reader has no impact on ongoing searches, except to the extent that reopen consumes CPU, RAM and IO resources to do its job (and, sometimes, this can in fact interfere with ongoing searches).

So how exactly do you implement this approach? The simplest way is to use the reference counting APIs already provided by IndexReader to track how many threads are currently using each searcher. Fortunately, as of Lucene 3.5.0, there will be a new contrib/misc utility class, SearcherManager, originally created as an example for Lucene in Action, 2nd edition, that does this for you! (LUCENE-3445 has the details.)

The class is easy to use. You first create it, by providing the Directory holding your index and a SearchWarmer instance:

  class MySearchWarmer implements SearchWarmer {
    @Override
    public void warm(IndexSearcher searcher) throws IOException {
      // Run some diverse searches, searching and sorting against all
      // fields that are used by your application
    }
  }

  Directory dir = FSDirectory.open(new File("/path/to/index"));
  SearcherManager mgr = new SearcherManager(dir,
                                            new MySearchWarmer());

Then, for each search request:

  IndexSearcher searcher = mgr.acquire();
  try {
    // Do your search, including loading any documents, etc.
  } finally {
    mgr.release(searcher);

    // Set to null to ensure we never again try to use
    // this searcher instance after releasing:
    searcher = null;
}

Be sure you fully consume searcher before releasing it! A common mistake is to release it yet later accidentally use it again to load stored documents, for rendering the search results for the current page.

Finally, you'll need to periodically call the maybeReopen method from a separate (ie, non-searching) thread. This method will reopen the reader, and only if there was actually a change will it cutover. If your application knows when changes have been committed to the index, you can reopen right after that. Otherwise, you can simply call maybeReopen every X seconds. When there has been no change to the index, the cost of maybeReopen is negligible, so calling it frequently is fine.

Beware the potentially high transient cost of reopen and warm! During reopen, as you must have two readers open until the old one can be closed, you should budget plenty of RAM in the computer and heap for the JVM, to comfortably handle the worst case when the two readers share no sub-readers (for example, after a full optimize) and thus consume 2X the RAM of a single reader. Otherwise you might hit a swap storm or OutOfMemoryError, effectively taking down entire whole search application. Worse, you won't see this problem early on: your first few hundred reopens could easily use only small amounts of added heap, but then suddenly on some unexpected reopen the cost is far higher. Reopening and warming is also generally IO intensive as the reader must load certain index data structures into memory.

Next time I'll describe another utility class, NRTManager, available since version 3.3.0, that you should use instead if your application uses Lucene's fast-turnaround near-real-time (NRT) search. This class solves the same problem (thread-safety during reopening) as SearcherManager but adds a fun twist as it gives you more specific control over which changes must be visible in the newly opened reader.

Lucene

Published at DZone with permission of Michael Mccandless, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.


Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: