
Enabling SOLR Autocommit with a Custom Haystack Backend

By Chase Seibert · Jul. 03, 14

By default, Django Haystack makes updates to your Solr index available for searching immediately. It does this in the simplest way possible: it commits every single update individually. That can be quite slow. I have an index with 35 million records, and under heavy write load, committing a chunk of 1,000 records can take up to 5 seconds. In extreme cases, Solr can refuse to accept that much write load at once and return an error like the following:
<?xml version="1.0" encoding="UTF-8"?>
<response>
    <lst name="responseHeader">
        <int name="status">503</int>
        <int name="QTime">1492</int>
    </lst>
    <lst name="error">
        <str name="msg">Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.</str>
        <int name="code">503</int>
    </lst>
</response>

Investigating this error, I turned up a Stack Overflow post that basically said not to make so many commits. That in turn led me to a Haystack pull request to make manual commits optional.

You can see the basic issue by looking at the logs that Haystack creates each time it issues a write request to the Solr REST API:

Finished 'http://localhost:8080/solr/my_index/update/?commit=true' (post) with body 'u'<add>...' in 0.010 seconds.

As of Solr 4.0, there are much more performant options for bulk indexing. A common setup is to rely on autocommit (set by default to 15 seconds) and to abstain from committing manually by passing commit=false on the REST API URL. Haystack's back-end implementations of update, remove and clear accept a commit boolean, but Haystack never sets it explicitly when calling them, so the default of commit=True always applies. Instead, you can implement your own search back-end subclass that changes the default.

from haystack.backends.solr_backend import SolrEngine, SolrSearchBackend


class AutoCommitSolrSearchBackend(SolrSearchBackend):
    ''' Same as the built-in Solr backend, except that commit defaults to False. '''

    def update(self, index, iterable, commit=False):
        # Add or update documents without forcing a commit on every request.
        super(AutoCommitSolrSearchBackend, self).update(index, iterable, commit=commit)

    def remove(self, obj_or_string, commit=False):
        # Delete a single document, again leaving the commit to Solr.
        super(AutoCommitSolrSearchBackend, self).remove(obj_or_string, commit=commit)

    def clear(self, models=None, commit=False):
        # None (rather than a mutable [] default) means "clear everything".
        super(AutoCommitSolrSearchBackend, self).clear(models, commit=commit)


class AutoCommitSolrEngine(SolrEngine):
    ''' The built-in Solr engine in Haystack performs a manual commit after each update/remove/clear.
    This is really slow. Solr is configured by default to auto-commit changes every 15 seconds, so
    there is no need to commit manually on every update.
    '''
    backend = AutoCommitSolrSearchBackend

Then you can use this new AutoCommitSolrEngine in your HAYSTACK_CONNECTIONS setting.

HAYSTACK_CONNECTIONS = {
     'default': {
         'ENGINE': 'myapp.search.AutoCommitSolrEngine',
         'URL': 'http://localhost:8080/solr/my_index',
     }
}
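
To see what the subclass changes without a running Solr instance, here is a minimal sketch. FakeSolrBackend is a hypothetical stand-in written for this illustration, not part of Haystack; it only mimics a commit=True default and records the update URL it would request, in the same shape as the log line shown earlier. The real request is built by the Solr client library, so treat the URL construction here as an assumption.

```python
from urllib.parse import urlencode


class FakeSolrBackend:
    """Hypothetical stand-in for a Solr backend (not a Haystack class)."""

    def __init__(self, url):
        self.url = url
        self.requests = []  # update URLs we would have posted to

    def update(self, index, iterable, commit=True):
        # Mimics the built-in behaviour: commit unless told otherwise.
        query = urlencode({"commit": str(commit).lower()})
        self.requests.append("%s/update/?%s" % (self.url, query))


class AutoCommitFakeBackend(FakeSolrBackend):
    def update(self, index, iterable, commit=False):
        # Same override pattern as the real subclass: only the default changes.
        super().update(index, iterable, commit=commit)


default_backend = FakeSolrBackend("http://localhost:8080/solr/my_index")
auto_backend = AutoCommitFakeBackend("http://localhost:8080/solr/my_index")

# The callers in this scenario never pass commit, so the default decides.
default_backend.update(None, [])
auto_backend.update(None, [])

print(default_backend.requests[0])  # ends with commit=true
print(auto_backend.requests[0])     # ends with commit=false
```

Because only the default changes, any caller that passes commit=True explicitly still gets an immediate commit.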

Note: With this setup, newly indexed items will not show up in searches right away by default. That’s what soft commits are for.

"Hard commits are about durability, soft commits are about visibility." (Erick Erickson, Understanding Transaction Logs, Soft Commit and Commit in SolrCloud)

To make your auto-committed items available to search in a timely fashion, you must set an autoSoftCommit.maxTime in your Solr config. This is NOT set by default.

    <!-- softAutoCommit is like autoCommit except it causes a
         'soft' commit which only ensures that changes are visible
         but does not ensure that data is synced to disk.  This is
         faster and more near-realtime friendly than a hard commit.
      -->
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
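
As a quick sanity check, a small script can confirm which commit intervals a config actually declares. The sketch below is only an illustration: the embedded XML is a hypothetical, heavily trimmed solrconfig.xml, the helper name commit_intervals is made up for this example, and it assumes maxTime holds a literal number of milliseconds (some configs use property placeholders instead, which this does not handle).

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down solrconfig.xml; a real file has many more elements.
SOLRCONFIG = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>15000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
  </updateHandler>
</config>
"""


def commit_intervals(xml_text):
    """Return (hard_ms, soft_ms); None means the setting is absent."""
    root = ET.fromstring(xml_text)

    def max_time(tag):
        # Look up <updateHandler>/<tag>/<maxTime> relative to the root element.
        text = root.findtext("updateHandler/%s/maxTime" % tag)
        return int(text) if text is not None else None

    return max_time("autoCommit"), max_time("autoSoftCommit")


hard_ms, soft_ms = commit_intervals(SOLRCONFIG)
if soft_ms is None:
    print("No autoSoftCommit.maxTime found: new documents may stay invisible")
else:
    print("autoCommit.maxTime=%s ms, autoSoftCommit.maxTime=%s ms" % (hard_ms, soft_ms))
```

If the soft commit interval comes back as None, documents indexed with commit=false may not become searchable until something else opens a new searcher.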

Alternatively, you can set autoCommit.openSearcher to true, which will cause a new searcher to be opened every time you auto-commit. This could slow down the first searches that come in after an auto-commit, however.

Published at DZone with permission of Chase Seibert, DZone MVB. See the original article here.
