The Solr Conversion at CareerBuilder.com: Lower Costs, Greater Agility
Trey's search experience includes handling multi-lingual content across dozens of markets/languages, genetic algorithm and user group based relevancy tuning, geo-spatial search and validation, and work on customized payload scoring models, data mining, clustering, and recommendations. He is responsible for architecting CareerBuilder's cloud-like search API exposing search as a simple, dynamic, and powerful generic service abstracted away from a large, globally-distributed architecture. Trey is also the founder and Chief Architect of Celiaccess.com, a gluten-free search engine and networking site.
DZone: Jobs are one of the most important things we search for on the web. What are some of the major challenges for search technology on a jobs site?
Trey Grainger: Our search technology obviously plays a central role at CareerBuilder, with our primary responsibility being to match the right people with the right jobs. Unlike a traditional web search engine, which seeks to find the most likely “best” answer to each search query based upon a largely pre-determined page rank or social vote, we are actually interested in a more nuanced approach. For example, we don’t want a small subset of jobs getting all job applications simply because they happened to have the most votes or the largest number of matching keywords. Factors like timeliness, experience level, location, and matching skillsets are also very important.
Essentially, job search is a supply and demand game, where we serve as a sort of market maker to improve the efficiency of the process. As such, we employ a myriad of different approaches for matching people to jobs: traditional job searching for job seekers to find jobs (keywords, categories, locations, etc.), resume searching for companies to find candidates (only for opted-in job seekers), and recommendation algorithms which can automatically suggest relevant jobs based upon the content of a candidate’s resume, similar user behavior, etc.
DZone: What are some of CareerBuilder's unique challenges in search?
Trey: Some of the most unique challenges for CareerBuilder center around understanding the richness of the data our job seekers have supplied us through their resumes and application choices, as well as the corresponding data employers have provided with regard to their ideal candidates’ qualifications. It’s relatively easy for us to experiment and determine when we’ve improved our macro-level relevancy performance by monitoring changes in job application rates vs a control group. Since every job seeker is unique and every company has specific hiring objectives, however, I find our most interesting challenges to be around understanding these unique characteristics of each candidate and job and adjusting our search and recommendation queries accordingly.
DZone: You led the conversion of CareerBuilder's search platform from FAST ESP to Apache Solr. Why did you think this was necessary and how did you convince upper management to make the change?
Trey: There are a lot of great search technologies out there, and for many organizations it is going to make sense for them to pay a third party for an out-of-the-box solution. In the case of CareerBuilder, however, I believe that search has to be one of our core competencies. When I took on leadership of our Search Technology Development team a few years ago, I spent a significant portion of my time investigating better options for maximizing our agility and investment in this arena. Within 3 months I had a working version of Solr rolled out to some of our niche sites, and within another 3 months I was meeting with my CTO and CEO with a plan to convert our entire platform over to Solr.
The number one selling point wasn’t money (though we’ve saved a lot) or capabilities (we were actually losing a few) – it was our desired speed of innovation. The hardest part wasn’t convincing upper management that converting to Solr was a great idea; it was convincing them that it made sense to increase head-count and short-term spend to do this amidst the greatest economic recession in our lifetimes. Despite reasonable concerns over the risks involved in such an undertaking, my CEO and CTO green-lighted the effort almost immediately once my director and I presented the cost/benefit analysis, and we’ve all been more than pleased with the end results.
DZone: What benefits have resulted from the switch to Solr?
Trey: The number one benefit by far is an increase in our agility. CareerBuilder sees our ability to rapidly respond to market needs as a key competitive advantage. We are now able to do things in hours or days with our Solr implementation that used to take us weeks (or sometimes months if we could do them at all). Some of this speed improvement is related to the underlying technologies in play, but I think most of it is related to the increased focus and expertise in search that has come from us taking our search platform fully into our own hands and being able to customize and dig deeply into the underlying code stack. The community support is also excellent: we’ve definitely had situations with Solr where we’ve sent an e-mail to the community mailing list about a bug and had not only a response, but a fully-functional patch fixing the issue within a few hours. You couldn’t pay for that kind of support.
On the financial front, we’ve saved significant money on licensing and servers by moving to Solr. What this really translates into, however, is increased innovation, as we are actually able to build a much larger team of in-house search experts for the same cost as a third-party system. This translates into us being much more capable of both building new products and better meeting current and arising business needs.
DZone: What were some of your search experiences related to genetic algorithms?
Trey: Because we were converting from a third-party search system with proprietary search algorithms, we were definitely concerned about maintaining the search results quality within our Solr search results. We worked off the assumption that our previous search platform was the gold standard, and we implemented a system which would continually adjust the available relevancy variables (genes) in our Solr queries and measure how closely the search results approximated those of the old system for the same queries. In repeating this process for many thousands of parallel generations, we were able to home in on our ideal relevancy parameters quickly and save countless cycles of manually tweaking Solr field weights and settings. It also helped us determine a few pieces of functionality we needed to turn off or dampen which were having too extreme an impact on the overall search results quality. In the end, this kind of testing allowed us to preserve and eventually improve the search results quality our users were accustomed to experiencing.
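The tuning loop described in this answer can be illustrated with a small sketch. It is not CareerBuilder's implementation: the field names, the weighted-sum scoring, the `legacy_weights` stand-in for the old engine, and the top-k overlap fitness measure are all assumptions chosen to keep the example self-contained. Each candidate is a set of field weights (the genes), and its fitness is how much of the gold-standard top ten it reproduces.

```python
import random

FIELDS = ["title", "description", "skills"]  # hypothetical field names


def rank(docs, weights, k):
    """Return ids of the top-k docs under a weighted sum of per-field scores."""
    ordered = sorted(docs, key=lambda d: -sum(weights[f] * d[f] for f in FIELDS))
    return [d["id"] for d in ordered[:k]]


def fitness(docs, weights, gold_top):
    """Fraction of the gold-standard top-k that these weights reproduce."""
    candidate_top = rank(docs, weights, len(gold_top))
    return len(set(candidate_top) & set(gold_top)) / len(gold_top)


def mutate(weights, rng, scale=0.3):
    """Nudge every gene by Gaussian noise, keeping weights non-negative."""
    return {f: max(0.0, w + rng.gauss(0, scale)) for f, w in weights.items()}


def crossover(a, b, rng):
    """Build a child by picking each gene from one of two parents."""
    return {f: rng.choice((a[f], b[f])) for f in FIELDS}


def evolve(docs, gold_top, generations=40, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [{f: rng.uniform(0, 5) for f in FIELDS} for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(docs, w, gold_top), reverse=True)
        parents = population[: pop_size // 4]  # keep the best quarter unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda w: fitness(docs, w, gold_top))


# Toy data: random per-field match scores for 200 documents.
data_rng = random.Random(42)
docs = [{"id": i, **{f: data_rng.random() for f in FIELDS}} for i in range(200)]

# Stand-in for the old engine, whose real scoring would be unknown.
legacy_weights = {"title": 3.0, "description": 1.0, "skills": 2.0}
gold_top = rank(docs, legacy_weights, k=10)

best = evolve(docs, gold_top)
print(best, fitness(docs, best, gold_top))
```

In a real setting the fitness function would issue live queries against both systems rather than score toy documents, and would average the overlap across a large sample of queries.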
DZone: Can you tell us about the cloud-like search API you created for CareerBuilder?
Trey: Absolutely. From the time I started integrating Solr into our platform, my primary goal was to do it in a very generic, reusable, and simple to use way. I essentially created a wrapper around Solr which any developer could pick up and use to write his/her own virtual search engine in just a few hours, with little to no search background.
This required a very simple and clean API, and required extracting the schema out of Solr and giving engineers the ability to modify their schemas at run-time. For example, a developer can define in code a schema object which contains a delimited text field which is case sensitive, delimited by a comma, facetable, and not retrievable. They could define another field which is a free text field containing English, German, and Chinese content (or any combination). In the back-end of the cloud-like framework, we can then map these desired behaviors to a matrix of dynamic fields in Solr which analyze input according to our users’ functional needs, without the engineer having to know anything about how Solr works. In this way, we can essentially deploy every Solr server with the same generic schema and configurations, significantly reducing operational complexity and making deploys and upgrades fairly seamless.
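As a rough illustration of that mapping, the sketch below turns a developer-facing field description into a name that a generic schema could match with a suffix-style `dynamicField` pattern. The `FieldSpec` class, its attributes, and the naming scheme are all hypothetical; the interview does not describe the actual API or field naming used at CareerBuilder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Developer-facing description of a field (hypothetical API)."""
    name: str
    kind: str = "text"        # "text" (free text) or "delimited"
    languages: tuple = ()     # language codes for free-text fields
    delimiter: str = ""       # separator for delimited fields
    case_sensitive: bool = False
    facetable: bool = False
    retrievable: bool = True


def dynamic_field_name(spec):
    """Encode the requested behaviors as a suffix on the field name.

    The suffix vocabulary is an assumption. A generic schema would declare
    one dynamicField pattern (for example "*_delimited_comma_cs_facet_unstored")
    for each supported combination.
    """
    parts = [spec.name, spec.kind]
    if spec.kind == "delimited":
        parts.append({",": "comma", "|": "pipe"}.get(spec.delimiter, "custom"))
        parts.append("cs" if spec.case_sensitive else "ci")
    else:
        parts.extend(sorted(spec.languages))
    parts.append("facet" if spec.facetable else "nofacet")
    parts.append("stored" if spec.retrievable else "unstored")
    return "_".join(parts)


# The two fields from the example in the text.
skills = FieldSpec("skills", kind="delimited", delimiter=",",
                   case_sensitive=True, facetable=True, retrievable=False)
body = FieldSpec("body", languages=("en", "de", "zh"))

print(dynamic_field_name(skills))  # skills_delimited_comma_cs_facet_unstored
print(dynamic_field_name(body))    # body_text_de_en_zh_nofacet_stored
```

Because the behavior lives in the suffix, every server can carry the same schema file, and adding a field becomes a change in application code instead of a schema deploy.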
Since my initial version of this cloud-like API, my team has done great work scaling this model out to support our entire platform, handling millions of queries an hour and hundreds of millions of documents of various types. The end result is that we can manage our search infrastructure (hardware and software deploys) divorced from our application layer and business logic, as the application layer is completely abstracted away from the underlying architecture from the point of view of anyone outside of our core search team. As such, any engineer at CareerBuilder can write their own search engine application and push up their data and queries into our “search cloud” without requiring my team to get involved, and without spending months learning the ins and outs of Solr.
DZone: Tell us about your side project, Celiaccess.com.
Trey: Sure. Celiaccess is a gluten-free search engine and networking website which I designed as a community for those with Celiac Disease or who otherwise must maintain a gluten-free diet. The site allows visitors to submit and edit products and restaurants, along with pictures, links, gluten-status information, and comments. Celiaccess has been enormously popular as an alternative to time-intensive Google searches, and I’ve even released a gluten-free barcode scanner app and a gluten-free GPS restaurant locator app for Android users, both registering tens of thousands of downloads. Of course, the search technology powering Celiaccess is all based upon Solr, and the GPS locator is heavily dependent upon Solr’s geo-spatial capabilities. The project really gave me a chance to stretch myself in integrating a really diverse set of skills and technologies to help meet the needs of a rapidly growing gluten-free community. Solr was a key component in making this possible.
Check the Lucene Revolution site for the conference agenda and pricing; some early-bird discounts and training specials may still be available. Don't miss this once-a-year opportunity!