Relevant Search Using SolrJ in Scala
Learn exactly what a relevant search is and learn how to use the SolrJ HTTP API in Scala to perform relevant, sufficient search queries.
In this blog, we will see how we can perform relevant searches in Solr using the SolrJ HTTP API in Scala.
To start with, what is a relevant search?
According to OpenSourceConnections.com, a developer working on search relevancy focuses on the following areas as the “first line of defense:
- Text analysis: The act of “normalizing” text from both a search query and a search result to allow for fuzzy matching. For example, one step, known as stemming, can turn many forms of the same word (“shopped,” “shopping,” and “shopper”) all to a more normal form (“shop”) to allow all forms to match.
- Query time weights and boosts: Reweighting the importance of various fields based on search requirements. For example, deciding a title field is more important than other fields.
- Phrase/position matching: Requiring or boosting on the appearance of the entire query or parts of a query as a phrase or based on the position of the words.”
Let's look at an example. Note that the assumption is that you have already created a Solr core.
Problem
Say you have a table/core in Solr that holds userID, HashTag, Text, and LaunchedTime and you have to find out the details of where some particular hashtags are being used.
Solution
We can solve this in four steps.
1. Dependencies
I am using SolrJ v6.2.1.
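If you build with sbt, the dependency could be declared as in the sketch below. The use of sbt is an assumption, since the post does not name a build tool; the coordinates are the usual Maven ones for SolrJ, and the version is the one mentioned above.

```scala
// build.sbt (assuming an sbt build)
libraryDependencies += "org.apache.solr" % "solr-solrj" % "6.2.1"
```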
2. Solr Connection
You need a Solr connection.
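import org.apache.solr.client.solrj.impl.HttpSolrClient

// solrHostname, solrPort, solrKeyspace, and solrTable are expected to be defined elsewhere (e.g. in configuration).
val solrConn: HttpSolrClient = {
  val urlString = s"http://$solrHostname:$solrPort/solr/$solrKeyspace.$solrTable"
  new HttpSolrClient.Builder(urlString).build()
}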
Here's what it all means:
- $solrHostname: Hostname or IP address of the machine where Solr is running.
- $solrPort: Port where Solr is running (8983 by default).
- $solrKeyspace: Cassandra keyspace name.
- $solrTable: Cassandra table name.
This gives you an HttpSolrClient, which you can use to query your Solr engine.
3. Create Solr Query
To create the Solr query, we use the SolrQuery class:
import org.apache.solr.client.solrj.SolrQuery

// HASHTAG is assumed to be a constant holding the name of the hashtag field.
def createSolrQuery(start: Int, rows: Int): SolrQuery = {
  val solrQuery = new SolrQuery
  solrQuery.set("q", "hashtag:(#modi OR #blackMoney)")
  solrQuery.set("sort", "score desc, LaunchedTime desc")
  solrQuery.set("df", s"$HASHTAG")
  solrQuery.set("start", s"$start")
  solrQuery.set("rows", s"$rows")
  solrQuery.set("fl", "Text")
  solrQuery
}
Let's understand the function.
- solrQuery.set("q", "hashtag:(#modi OR #blackMoney)"): Searches for the hashtag modi or blackMoney. The q parameter defines the main Solr query.
- solrQuery.set("sort", "score desc, LaunchedTime desc"): Sorts the results by relevancy score (highest score first) and then by LaunchedTime (most recent first).
- solrQuery.set("df", s"$HASHTAG"): df sets the default search field. We are searching by hashtag, so matching and score calculation are done on the hashtag field. The better the search terms match that field, the higher the score and the more relevant the text will be. Note: df is the default field and only takes effect if qf is not defined.
- solrQuery.set("start", s"$start"): start is the offset of the first row to return, which is what makes pagination possible.
- solrQuery.set("rows", s"$rows"): rows is the number of rows to return.
- solrQuery.set("fl", "Text"): fl lists the fields to return. Here, we are fetching only the Text field.
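As a sketch of the query-time weights and boosts mentioned at the start, the same search could go through the eDisMax query parser with qf boosts instead of df. This is not part of the original example: the function name createBoostedQuery, the boost value, and the assumption that both HashTag and Text are indexed, searchable fields are all hypothetical.

```scala
import org.apache.solr.client.solrj.SolrQuery

// Hypothetical variant of createSolrQuery that weights fields with qf.
// Assumes HashTag and Text are both searchable; the boost of 3 is arbitrary.
def createBoostedQuery(terms: String, start: Int, rows: Int): SolrQuery = {
  val solrQuery = new SolrQuery
  solrQuery.set("defType", "edismax")    // qf is read by the DisMax/eDisMax parsers
  solrQuery.set("q", terms)              // plain terms, no field prefix
  solrQuery.set("qf", "HashTag^3 Text")  // matches in HashTag are boosted by a factor of 3
  solrQuery.set("sort", "score desc, LaunchedTime desc")
  solrQuery.setStart(start)
  solrQuery.setRows(rows)
  solrQuery.setFields("Text", "score")   // also return the relevancy score
  solrQuery
}
```

A call such as createBoostedQuery("#modi #blackMoney", 0, 10) would then be executed in the same way as the query built by createSolrQuery. Returning score alongside Text makes it easier to see how a change in boosts affects the ranking.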
4. Function to Fetch Result From Solr Using Solr Query
import org.apache.solr.common.SolrDocument
import scala.collection.JavaConverters._

val solrQuery = createSolrQuery(0, 10) // the Solr query built by the function from Step 3
val solrConnection = solrConn // the HttpSolrClient created in Step 2
val res: List[SolrDocument] = solrConnection.query(solrQuery).getResults.asScala.toList
// Convert the Text field of each document to a String.
val textDetails: List[String] = res.map { doc =>
  String.valueOf(doc.getFieldValue("Text"))
}
So, we call query to execute the Solr query and getResults to get the matching documents, which we convert to a list. Using Scala's map function, we iterate over the documents and extract the Text field from each one. The Solr response for this query looks like this:
{
"responseHeader": {
"status": 0,
"QTime": 9
},
"response": {
"numFound": 1,
"start": 0,
"docs": [
{
"Text": "#modi,#blackMoney modi rocks!!!!!!!!!"
}
]
}
}
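The response reports numFound, the total number of matches, while start and rows select a single page of them. A minimal sketch of how the start offsets for all pages could be computed is shown below; pageStarts is a hypothetical helper, not part of SolrJ or of the original example.

```scala
// Hypothetical helper: start offsets needed to cover `numFound` matches
// when each request returns at most `rows` documents.
def pageStarts(numFound: Long, rows: Int): List[Int] =
  if (rows <= 0) Nil
  else (0L until numFound by rows.toLong).map(_.toInt).toList
```

Each offset can then be passed as the start argument of createSolrQuery, for example pageStarts(numFound, 10).map(start => createSolrQuery(start, 10)), where numFound is read from the first response.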
Published at DZone with permission of Piyush Rana, DZone MVB. See the original article here.