
Relevant Search Using SolrJ in Scala

Learn what relevant search is and how to use the SolrJ HTTP API in Scala to perform relevant searches.

In this post, we will see how to perform relevant searches in Solr using the SolrJ HTTP API in Scala.

To start with, what is a relevant search?

According to OpenSourceConnections.com, a developer working on search relevancy focuses on the following areas as the “first line of defense”:

  • Text analysis: The act of “normalizing” text from both a search query and a search result to allow for fuzzy matching. For example, one step, known as stemming, can turn many forms of the same word (“shopped,” “shopping,” and “shopper”) all to a more normal form (“shop”) to allow all forms to match.
  • Query time weights and boosts: Reweighting the importance of various fields based on search requirements. For example, deciding a title field is more important than other fields.
  • Phrase/position matching: Requiring or boosting on the appearance of the entire query or parts of a query as a phrase or based on the position of the words.
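
To make the stemming idea concrete, here is a toy sketch in plain Scala. It is not the stemmer Solr uses (Solr's analyzers apply far richer rules); the ToyStemmer object and its three suffixes are assumptions made purely for illustration.

```scala
object ToyStemmer {
  // Suffixes this toy stemmer knows how to strip; real stemmers use far richer rules.
  private val suffixes = List("ing", "ed", "er")

  /** Lowercases a word, strips one known suffix, and collapses a doubled final consonant. */
  def stem(word: String): String = {
    val lower = word.toLowerCase
    val stripped = suffixes
      .find(s => lower.endsWith(s) && lower.length > s.length + 2)
      .map(s => lower.dropRight(s.length))
      .getOrElse(lower)
    // "shopp" -> "shop": drop the last letter when the word ends in a doubled consonant.
    val n = stripped.length
    if (n >= 2 && stripped(n - 1) == stripped(n - 2) && !"aeiou".contains(stripped(n - 1)))
      stripped.dropRight(1)
    else stripped
  }
}
```

With this sketch, "shopped", "shopping", and "shopper" all reduce to "shop", so a query for any one form can match the others.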

Let's look at an example. It assumes that you have already created a Solr core.

Problem

Say you have a table/core in Solr that holds userID, HashTag, Text, and LaunchedTime, and you need to find the documents in which particular hashtags are used.

Solution 

We can solve this in four steps.

1. Dependencies

I am using SolrJ v6.2.1.
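
If you build with sbt, the dependency would look roughly like the line below. The use of sbt is an assumption, since the post does not name a build tool; the coordinates are the standard Maven coordinates for SolrJ.

```scala
// build.sbt (assuming an sbt project)
libraryDependencies += "org.apache.solr" % "solr-solrj" % "6.2.1"
```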

2. Solr Connection

You need a Solr connection. 

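import org.apache.solr.client.solrj.impl.HttpSolrClient

// Build a client that points at the Solr core for the given keyspace and table.
val solrConn: HttpSolrClient = {
  val urlString = s"http://$solrHostname:$solrPort/solr/$solrKeyspace.$solrTable"
  new HttpSolrClient.Builder(urlString).build()
}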

Here's what it all means:

  • $solrHostname: Hostname/machine IP where your Solr is running.

  • $solrPort: Port where Solr is running (by default 8983).

  • $solrKeyspace: Cassandra keyspace name.

  • $solrTable: Cassandra table name.

This gives you an HttpSolrClient, which you can use to query Solr.
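
To see what the connection URL looks like once the four values are filled in, here is a small sketch. The SolrUrl object, the coreUrl helper, and the sample values are hypothetical and exist only for illustration.

```scala
object SolrUrl {
  /** Builds the base URL of a core named "<keyspace>.<table>", the naming used above. */
  def coreUrl(host: String, port: Int, keyspace: String, table: String): String =
    s"http://$host:$port/solr/$keyspace.$table"
}

// Hypothetical values, for illustration only:
// SolrUrl.coreUrl("localhost", 8983, "social", "tweets")
// gives "http://localhost:8983/solr/social.tweets"
```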

3. Create Solr Query

To create the Solr query, we use the SolrQuery class:

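import org.apache.solr.client.solrj.SolrQuery

def createSolrQuery(start: Int, rows: Int): SolrQuery = {
  val solrQuery = new SolrQuery
  solrQuery.set("q", "hashtag:(#modi OR #blackMoney)")   // main query
  solrQuery.set("sort", "score desc, LaunchedTime desc") // relevancy first, then LaunchedTime
  solrQuery.set("df", s"$HASHTAG")                       // default search field
  solrQuery.set("start", s"$start")                      // offset of the first row to return
  solrQuery.set("rows", s"$rows")                        // number of rows to return
  solrQuery.set("fl", "Text")                            // fields to return
  solrQuery
}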

Let's understand the function.

  • solrQuery.set("q", "hashtag:(#modi OR #blackMoney)"): q is the main Solr query. Here, it matches documents whose hashtag field contains #modi or #blackMoney.

  • solrQuery.set("sort", "score desc, LaunchedTime desc"): We sort by relevancy score (from highest to lowest) and then by LaunchedTime (newest first).

  • solrQuery.set("df", s"$HASHTAG"): df determines the default search field, which is used for query terms that do not name a field. We are searching on the hashtag, so matching and scoring happen on the hashtag field. The better a document's hashtag field matches the search terms, the higher its score and the more relevant the text will be. Note: df is the default field and will only take effect if qf is not defined.

  • solrQuery.set("start", s"$start"): start is the offset of the first matching row to return.

  • solrQuery.set("rows", s"$rows"): rows refers to the number of rows to be returned.

  • solrQuery.set("fl", "Text"): fl lists the fields to be returned. Here, we are fetching only the Text field.
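
For comparison, the same parameters can be written as a raw request to Solr's /select handler, which may help when testing a query in a browser or with curl. The sketch below only builds the URL string with the Scala/Java standard library. The SelectUrl object is hypothetical, "hashtag" is an assumed value for the HASHTAG constant, and wt=json is added so the response comes back as JSON.

```scala
import java.net.URLEncoder

object SelectUrl {
  /** URL-encodes one parameter as key=value. */
  private def param(key: String, value: String): String =
    s"$key=${URLEncoder.encode(value, "UTF-8")}"

  /** Builds a /select URL carrying the same parameters as createSolrQuery. */
  def build(coreUrl: String, start: Int, rows: Int): String = {
    val params = Seq(
      "q"     -> "hashtag:(#modi OR #blackMoney)",
      "sort"  -> "score desc, LaunchedTime desc",
      "df"    -> "hashtag", // assumed value of the HASHTAG constant
      "start" -> start.toString,
      "rows"  -> rows.toString,
      "fl"    -> "Text",
      "wt"    -> "json"
    )
    params.map { case (k, v) => param(k, v) }.mkString(s"$coreUrl/select?", "&", "")
  }
}
```

Note that the # character must be URL-encoded (as %23) in a raw request, otherwise the rest of the URL is treated as a fragment. SolrJ handles this encoding for you.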

4. Fetch Results From Solr Using the Solr Query

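import org.apache.solr.common.SolrDocument
import scala.collection.JavaConverters._

val solrQuery = createSolrQuery(0, 10) // the Solr query built in Step 3
val solrConnection = solrConn          // the HttpSolrClient created in Step 2
val res: List[SolrDocument] = solrConnection.query(solrQuery).getResults.asScala.toList
// Collect every value of the Text field; documents without the field contribute nothing.
val textDetails: List[String] = res.flatMap { doc =>
  Option(doc.getFieldValues("Text")).toList.flatMap(_.asScala.map(_.toString))
}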

Here, the query method executes the Solr query, and getResults returns the matching documents, which we convert to a Scala list. We then iterate over the documents and extract the Text field from each one. The response returned by Solr for this query looks like this:

{
  "responseHeader": {
    "status": 0,
    "QTime": 9
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "Text": "#modi,#blackMoney modi rocks!!!!!!!!!"
      }
    ]
  }
}
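
The numFound and start values in the response are what you would use to page through larger result sets with the start and rows parameters. A minimal sketch, assuming a hypothetical Paging helper that is not part of SolrJ:

```scala
object Paging {
  /** Start offsets needed to read `numFound` results in pages of `rows`. */
  def pageStarts(numFound: Long, rows: Int): Seq[Long] = {
    require(rows > 0, "rows must be positive")
    0L until numFound by rows.toLong
  }
}
```

For example, 25 matches read 10 at a time need the offsets 0, 10, and 20. Each offset could then be passed (converted to Int) as the start argument of createSolrQuery.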

Topics:
big data, solr, scala, searching, tutorial

Published at DZone with permission of Piyush Rana, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
