Enabling Solr Autocommit with a Custom Haystack Backend
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">503</int>
    <int name="QTime">1492</int>
  </lst>
  <lst name="error">
    <str name="msg">Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.</str>
    <int name="code">503</int>
  </lst>
</response>
Investigating this error, I turned up a Stack Overflow post that basically says not to make so many commits. That in turn led me to a Haystack pull request to make manual commits optional.
You can see the basic issue by looking at the logs that Haystack creates each time it issues a write request to the Solr REST API:
Finished 'http://localhost:8080/solr/my_index/update/?commit=true' (post) with body 'u'<add>...' in 0.010 seconds.
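The commit=true in that URL is an ordinary query-string parameter on Solr's update handler. As a rough illustration only, and not how Haystack or pysolr build the URL internally, a hypothetical helper named update_url (assuming Python 3) could compose both variants:

```python
from urllib.parse import urlencode


def update_url(core_url, commit):
    """Build a Solr update URL with an explicit commit flag.

    Hypothetical helper for illustration; not part of Haystack or pysolr.
    """
    query = urlencode({"commit": "true" if commit else "false"})
    return "{}/update/?{}".format(core_url.rstrip("/"), query)


# The first URL matches the one in the log line above; the second
# leaves the commit to Solr's own autocommit settings.
print(update_url("http://localhost:8080/solr/my_index", commit=True))
print(update_url("http://localhost:8080/solr/my_index", commit=False))
```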
As of Solr 4.0, we have much more performant options for bulk indexing. A common setup is to use autocommit (set by default to 15 seconds) and abstain from manually committing by passing commit=false on the REST API URL. Though Haystack supports passing a commit boolean to the various back-end implementations of update, remove and clear, this parameter is never explicitly set. Instead, you can implement your own search back-end subclass to pass this value.
from haystack.backends.solr_backend import SolrEngine, SolrSearchBackend


class AutoCommitSolrSearchBackend(SolrSearchBackend):
    # Each override only changes the default of `commit` to False and
    # delegates to the stock Haystack implementation.

    def update(self, index, iterable, commit=False):
        super(AutoCommitSolrSearchBackend, self).update(index, iterable, commit=commit)

    def remove(self, obj_or_string, commit=False):
        super(AutoCommitSolrSearchBackend, self).remove(obj_or_string, commit=commit)

    # None instead of a mutable default list; an empty value means "clear everything".
    def clear(self, models=None, commit=False):
        super(AutoCommitSolrSearchBackend, self).clear(models, commit=commit)


class AutoCommitSolrEngine(SolrEngine):
    '''
    The built-in Solr engine in Haystack performs a manual commit after
    each update/add/remove/clear. This is really slow. Solr is configured
    by default to auto-commit changes every 15 seconds, so there is no
    need to commit manually on every update.
    '''
    backend = AutoCommitSolrSearchBackend
Then you can use this new AutoCommitSolrEngine in your HAYSTACK_CONNECTIONS setting.
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'myapp.search.AutoCommitSolrEngine',
        'URL': 'http://localhost:8080/solr/my_index',
    }
}
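If you want to convince yourself that the overrides behave as intended without a running Solr instance, one option is to try the same pattern against a stand-in base class. The sketch below is only an illustration: RecordingBackend and AutoCommitBackend are hypothetical names, and the stand-in merely mimics the update/remove/clear signatures with commit defaulting to True.

```python
class RecordingBackend(object):
    """Hypothetical stand-in for SolrSearchBackend: records the commit flag."""

    def __init__(self):
        self.calls = []

    def update(self, index, iterable, commit=True):
        self.calls.append(("update", commit))

    def remove(self, obj_or_string, commit=True):
        self.calls.append(("remove", commit))

    def clear(self, models=None, commit=True):
        self.calls.append(("clear", commit))


class AutoCommitBackend(RecordingBackend):
    """Same override pattern as the backend above: commit defaults to False."""

    def update(self, index, iterable, commit=False):
        super(AutoCommitBackend, self).update(index, iterable, commit=commit)

    def remove(self, obj_or_string, commit=False):
        super(AutoCommitBackend, self).remove(obj_or_string, commit=commit)

    def clear(self, models=None, commit=False):
        super(AutoCommitBackend, self).clear(models, commit=commit)


backend = AutoCommitBackend()
backend.update(None, [])               # caller does not pass commit
backend.remove("app.model.1")
backend.clear()
backend.update(None, [], commit=True)  # an explicit commit still goes through
print(backend.calls)
```

Callers that never pass commit now get commit=False, while code that explicitly asks for a commit keeps working.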
Note: By default, indexed items will not show up in searches right away. That’s what soft-commit is for.
"Hard commits are about durability, soft commits are about visibility." (Erick Erickson, Understanding Transaction Logs, Soft Commit and Commit in SolrCloud)
To make your auto-committed items available to search in a timely fashion, you must set an autoSoftCommit.maxTime in your Solr config. This is NOT set by default.
<!-- softAutoCommit is like autoCommit except it causes a
     'soft' commit which only ensures that changes are visible
     but does not ensure that data is synced to disk. This is
     faster and more near-realtime friendly than a hard commit.
-->
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
Alternatively, you can set autoCommit.openSearcher to true, which will cause a new searcher to be instantiated every time you auto-commit. This could slow down the first searches that come in after an auto-commit, however.
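To see which of these settings a given solrconfig.xml actually contains, a short script can read them back. This is a minimal sketch, assuming the usual layout where autoCommit and autoSoftCommit sit under the updateHandler element; commit_settings and SAMPLE_CONFIG are hypothetical names, and the sample values are illustrative rather than taken from a real deployment. Values are returned as raw strings, since a real config may use property placeholders instead of plain numbers.

```python
import xml.etree.ElementTree as ET

# Illustrative config fragment; real solrconfig.xml files contain much more.
SAMPLE_CONFIG = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>15000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
  </updateHandler>
</config>
"""


def commit_settings(config_xml):
    """Return the autoCommit/autoSoftCommit child values, or None if a block is absent."""
    root = ET.fromstring(config_xml.strip())
    settings = {}
    for name in ("autoCommit", "autoSoftCommit"):
        node = root.find("./updateHandler/" + name)
        if node is None:
            settings[name] = None  # block not configured at all
            continue
        settings[name] = {child.tag: (child.text or "").strip() for child in node}
    return settings


settings = commit_settings(SAMPLE_CONFIG)
print(settings["autoCommit"])
print(settings["autoSoftCommit"])
```

If autoSoftCommit comes back as None and openSearcher is false, newly indexed documents will be written durably but will not become visible to searches on their own.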
Published at DZone with permission of Chase Seibert, DZone MVB. See the original article here.