Async Solr Queries in Python
I frequently hit the wall of needing to work asynchronously with Solr requests in Python. I’ll have some code that blocks on a Solr HTTP request, waits for it to complete, then executes a second request. Something like this code (we’re using the Requests library to do HTTP):

```python
import requests

# Search 1
solrResp = requests.get('http://mysolr.com/solr/statedecoded/search?q=law')
for doc in solrResp.json()['response']['docs']:
    print(doc['catch_line'])

# Search 2 (does not start until Search 1 has returned)
solrResp = requests.get('http://mysolr.com/solr/statedecoded/search?q=shoplifting')
for doc in solrResp.json()['response']['docs']:
    print(doc['catch_line'])
```
Being able to parallelize this work is especially helpful in scripts that index documents into Solr. I need to scale up my indexing so that Solr itself, not waiting on the network, is the bottleneck.
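For an indexing script, one way to get that parallelism without swamping Solr is to post batches of documents through a bounded gevent.pool.Pool (gevent being the coroutine-based concurrency library used for the rest of this post). This is a minimal sketch under stated assumptions, not the author’s indexing code: the update URL, the post_batch and index_in_parallel helpers, and the batch and concurrency sizes are hypothetical, and it assumes a Solr whose /update handler accepts a JSON array of documents. It does not issue a commit.

```python
from gevent import monkey
monkey.patch_all()  # patch sockets/ssl before requests is imported

import gevent.pool
import requests

# Hypothetical update endpoint; point this at your own core or collection.
UPDATE_URL = 'http://mysolr.com/solr/statedecoded/update'


def post_batch(docs, url=UPDATE_URL):
    """Send one batch of documents to Solr's JSON update handler (assumed)."""
    resp = requests.post(url, json=docs, timeout=30)
    resp.raise_for_status()
    return len(docs)


def index_in_parallel(docs, batch_size=100, concurrency=4, poster=post_batch):
    """Split docs into batches and post them with at most `concurrency`
    requests in flight at once. Returns the number of docs posted."""
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    pool = gevent.pool.Pool(concurrency)
    greenlets = [pool.spawn(poster, batch) for batch in batches]
    pool.join()  # block until every batch has been sent
    # get() returns each greenlet's result, or re-raises its exception
    return sum(g.get() for g in greenlets)
```

Raising concurrency lets more batches wait on the network at the same time; the right value depends on how much load your Solr can absorb.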
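Working with gevent, a coroutine-based concurrency library, is fairly straightforward. One slight sticking point is gevent.monkey.patch_all(), which patches a lot of the standard library (sockets, ssl, time.sleep, threading and more) so that blocking calls cooperate with gevent’s asynchrony. It sounds scary, but I have yet to have a problem with the monkey-patched implementations. Call it as early as possible, before importing modules such as requests, so that they pick up the patched versions.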
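To see what the patching buys you, the sketch below spawns three simulated requests. fake_request and run_concurrently are hypothetical names, and time.sleep stands in for a blocking socket read: once patch_all() has run, each sleep yields to the other greenlets, so the three waits overlap instead of adding up.

```python
from gevent import monkey
monkey.patch_all()  # after this, time.sleep yields to other greenlets

import time
import gevent


def fake_request(seconds):
    """Hypothetical stand-in for a blocking HTTP call."""
    time.sleep(seconds)
    return seconds


def run_concurrently(delays):
    """Spawn one greenlet per delay; return (elapsed seconds, results)."""
    start = time.time()
    greenlets = [gevent.spawn(fake_request, d) for d in delays]
    gevent.joinall(greenlets)
    return time.time() - start, [g.value for g in greenlets]
```

Calling run_concurrently([0.2, 0.2, 0.2]) should report an elapsed time close to 0.2 seconds rather than 0.6; without the patch_all() call, the same sleeps would block the whole thread and run one after another.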
Without further ado, here’s how to use gevent to run parallel Solr requests:
```python
# Patch the standard library before anything else imports sockets or ssl
from gevent import monkey
monkey.patch_all()

import gevent
import requests


class Searcher(object):
    """ Simple wrapper for doing a search and collecting the results """
    def __init__(self, searchUrl):
        self.searchUrl = searchUrl
        self.docs = []  # stays empty if the search fails

    def search(self):
        solrResp = requests.get(self.searchUrl)
        self.docs = solrResp.json()['response']['docs']


def searchMultiple(urls):
    """ Use gevent to execute the passed in urls; dump the results"""
    searchers = [Searcher(url) for url in urls]

    # Gather a handle for each task
    handles = []
    for searcher in searchers:
        handles.append(gevent.spawn(searcher.search))

    # Block until all work is done
    gevent.joinall(handles)

    # Dump the results
    for searcher in searchers:
        print("Search Results for %s" % searcher.searchUrl)
        for doc in searcher.docs:
            print(doc['catch_line'])


searchUrls = ['http://mysolr.com/solr/statedecoded/search?q=law',
              'http://mysolr.com/solr/statedecoded/search?q=shoplifting']

searchMultiple(searchUrls)
```
The heart of it is these lines:

```python
# Gather a handle for each task
handles = []
for searcher in searchers:
    handles.append(gevent.spawn(searcher.search))

# Block until all work is done
gevent.joinall(handles)
```
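For each searcher we call gevent.spawn(searcher.search), which schedules the search to run in its own greenlet and gives us a handle to that spawned task. gevent.joinall(handles) then blocks until every spawned task has completed. Only after that do we dump the results, so each searcher’s docs are filled in by the time we print them.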
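The Searcher class stores results on itself, but the handle returned by gevent.spawn is a Greenlet that also keeps the function’s return value and whether it raised. The variant below is a sketch with hypothetical search and search_multiple helpers: it reads results from Greenlet.value, uses Greenlet.successful() to skip failed searches, and passes a timeout to gevent.joinall so one slow query cannot hang the script.

```python
from gevent import monkey
monkey.patch_all()  # patch before importing requests

import gevent
import requests


def search(url):
    """Hypothetical helper: run one Solr query and return its docs."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json()['response']['docs']


def search_multiple(urls, fetch=search, timeout=30):
    """Spawn one greenlet per URL and map each URL to its docs.
    Failed or unfinished searches map to None."""
    greenlets = [gevent.spawn(fetch, url) for url in urls]
    gevent.joinall(greenlets, timeout=timeout)
    results = {}
    for url, g in zip(urls, greenlets):
        # successful() is true only for greenlets that finished without raising
        results[url] = g.value if g.successful() else None
    # Stop any searches still running after the timeout
    pending = [g for g in greenlets if not g.ready()]
    if pending:
        gevent.killall(pending)
    return results
```

Results are read before the leftover greenlets are killed, because gevent treats a greenlet killed with GreenletExit as having finished successfully.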
That’s about it! As always, comment if you have any thoughts or pointers. And let us know how we can help with any part of your Solr search application!
Published at DZone with permission of Doug Turnbull, DZone MVB.