
“Car sale application” – Result Grouping, let’s group some search results (part 6)


In today’s post we will add a new feature to our car sale application: grouping of search results. Imagine a user who searches for “audi a4” advertisements and wants the results grouped by the car’s year of production, with 2-3 results in every group. And how about range grouping, for example by mileage ranges? Today we accept the challenge.

New functionality request parameters description

Result grouping has been available since Solr 3.3. Let’s get to know the request parameters we will need:

  • group – turn on and off result grouping
  • group.field – name of the field used to group search results. We have to be sure that the field used for grouping (year of production in our case) is single-valued and has the string/text type
  • group.query – a query whose matching documents form one group; repeating it with different range queries groups results by ranges, for example mileage ranges
  • group.limit – the number of results to return for each group


These four basic parameters are all we need to achieve what we want.

schema.xml changes

The only schema.xml change we may need is to make sure that the grouping field is of the proper type (“string” or “text”). We would like to group our search results by the “year” field, so let’s recall what its definition looks like right now:

<field name="year" type="tint" indexed="true" stored="true" required="true" />
 

The field is of integer type. To be able to group results by it, we create another field, let’s call it “year_group”, of the string type:

<field name="year_group" type="string" indexed="true" stored="false" />
 

and copy the content of the “year” field to the new field called “year_group”:

<copyField source="year" dest="year_group"/>
 

Those are practically all the changes we need to make in our schema.xml configuration file.

Some sample data

Let’s now create some sample data to test the new functionality. We assume that we have five Audi A4 advertisements. Two of the cars are from 2002, another two from 2003 and the last one from 2006. Additionally, one of them has a mileage below 100 000 km, three have a mileage between 100 000 km and 199 999 km and the last one has a mileage of over 200 000 km:

<add>
   <doc>
      <field name="id">1</field>
      <field name="make">Audi</field>
      <field name="model">A4</field>
      <field name="year">2002</field>
      <field name="price">22700</field>
      <field name="engine_size">1900</field>
      <field name="mileage">197000</field>
      <field name="colour">green</field>
      <field name="damaged">false</field>
      <field name="city">Koszalin</field>
      <field name="loc">54.12,16.11</field>
   </doc>
   <doc>
      <field name="id">2</field>
      <field name="make">Audi</field>
      <field name="model">A4</field>
      <field name="year">2003</field>
      <field name="price">27800</field>
      <field name="engine_size">1900</field>
      <field name="mileage">220000</field>
      <field name="colour">black</field>
      <field name="damaged">false</field>
      <field name="city">Bialystok</field>
      <field name="loc">53.08,23.09</field>
   </doc>
   <doc>
      <field name="id">3</field>
      <field name="make">Audi</field>
      <field name="model">A4</field>
      <field name="year">2002</field>
      <field name="price">21300</field>
      <field name="engine_size">1900</field>
      <field name="mileage">125000</field>
      <field name="colour">black</field>
      <field name="damaged">false</field>
      <field name="city">Szczecin</field>
      <field name="loc">53.25,14.35</field>
   </doc>
   <doc>
      <field name="id">4</field>
      <field name="make">Audi</field>
      <field name="model">A4</field>
      <field name="year">2003</field>
      <field name="price">30300</field>
      <field name="engine_size">1900</field>
      <field name="mileage">150000</field>
      <field name="colour">red</field>
      <field name="damaged">false</field>
      <field name="city">Gdansk</field>
      <field name="loc">54.21,18.40</field>
   </doc>
   <doc>
      <field name="id">5</field>
      <field name="make">Audi</field>
      <field name="model">A4</field>
      <field name="year">2006</field>
      <field name="price">32100</field>
      <field name="engine_size">1900</field>
      <field name="mileage">9900</field>
      <field name="colour">red</field>
      <field name="damaged">false</field>
      <field name="city">Swidnik</field>
      <field name="loc">52.15,21.00</field>
   </doc>
</add>
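
Before asking Solr to group anything, it can help to work out by hand what the groups should contain. The following is a minimal Python sketch, not part of the application itself: it copies the id, year and mileage of the five sample documents and groups them locally. The function names `group_by_year` and `group_by_mileage` are made up for this illustration.

```python
from collections import defaultdict

# (id, year, mileage) copied from the five sample documents
cars = [
    (1, 2002, 197000),
    (2, 2003, 220000),
    (3, 2002, 125000),
    (4, 2003, 150000),
    (5, 2006, 9900),
]


def group_by_year(rows):
    """Group document ids by year, using the year as a string key."""
    groups = defaultdict(list)
    for car_id, year, _mileage in rows:
        groups[str(year)].append(car_id)
    return dict(groups)


def group_by_mileage(rows, ranges):
    """Group document ids by inclusive (low, high) ranges.

    A high value of None means the range has no upper bound.
    """
    result = {}
    for low, high in ranges:
        result[(low, high)] = [
            car_id
            for car_id, _year, mileage in rows
            if mileage >= low and (high is None or mileage <= high)
        ]
    return result


by_year = group_by_year(cars)
by_mileage = group_by_mileage(
    cars, [(0, 99999), (100000, 199999), (200000, None)]
)
print(by_year)     # {'2002': [1, 3], '2003': [2, 4], '2006': [5]}
print(by_mileage)  # {(0, 99999): [5], (100000, 199999): [1, 3, 4], (200000, None): [2]}
```

These are the groups we expect Solr to return for the sample data.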

Let’s create queries

Using the parameters described at the beginning of the article, we create the “audi a4” query, which returns search results grouped by the year of production:

?q=audi+a4&group=true&group.field=year_group&group.limit=2&fl=id,mileage,make,model,year
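
If the query is assembled in code rather than typed by hand, the same string can be produced from a parameter map. This is only a sketch using Python’s standard library; how the string is then sent to Solr (host, port, handler path) depends on the installation and is left out here.

```python
from urllib.parse import urlencode

# The grouping parameters from the query above, plus the main query and field list
params = {
    "q": "audi a4",
    "group": "true",              # switch result grouping on
    "group.field": "year_group",  # the string copy of the "year" field
    "group.limit": 2,             # at most 2 documents per group
    "fl": "id,mileage,make,model,year",
}

# safe="," keeps the commas of the field list readable instead of %2C
query_string = urlencode(params, safe=",")
print("?" + query_string)
# ?q=audi+a4&group=true&group.field=year_group&group.limit=2&fl=id,mileage,make,model,year
```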
 

As we can see, we have limited the results in every group to at most 2. In the response we want only the fields that let us clearly identify the documents: id, mileage, make, model and year. We get the following response:

<lst name="grouped">
  <lst name="year_group">
    <int name="matches">5</int>
    <arr name="groups">
      <lst>
        <str name="groupValue">2002</str>
        <result name="doclist" numFound="2" start="0">
          <doc>
            <str name="id">1</str>
            <str name="make">Audi</str>
            <int name="mileage">197000</int>
            <str name="model">A4</str>
            <int name="year">2002</int>
          </doc>
          <doc>
            <str name="id">3</str>
            <str name="make">Audi</str>
            <int name="mileage">125000</int>
            <str name="model">A4</str>
            <int name="year">2002</int>
          </doc>
        </result>
      </lst>
      <lst>
        <str name="groupValue">2003</str>
        <result name="doclist" numFound="2" start="0">
          <doc>
            <str name="id">2</str>
            <str name="make">Audi</str>
            <int name="mileage">220000</int>
            <str name="model">A4</str>
            <int name="year">2003</int>
          </doc>
          <doc>
            <str name="id">4</str>
            <str name="make">Audi</str>
            <int name="mileage">150000</int>
            <str name="model">A4</str>
            <int name="year">2003</int>
          </doc>
        </result>
      </lst>
      <lst>
        <str name="groupValue">2006</str>
        <result name="doclist" numFound="1" start="0">
          <doc>
            <str name="id">5</str>
            <str name="make">Audi</str>
            <int name="mileage">9900</int>
            <str name="model">A4</str>
            <int name="year">2006</int>
          </doc>
        </result>
      </lst>
    </arr>
  </lst>
</lst>

Let’s analyse the response. We have 5 matches:

<int name="matches">5</int>

The response has been split into 3 independent groups:

  1. <str name="groupValue">2002</str>

    where we have two (numFound="2") 2002 cars

  2. <str name="groupValue">2003</str>

    where we have two (numFound="2") 2003 cars

  3. <str name="groupValue">2006</str>

    where we have one (numFound="1") 2006 car

That’s correct!
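
The same check can be done programmatically. Below is a rough sketch that reads a grouped response with Python’s standard XML parser; the response string is a trimmed copy of the one shown earlier (only the ids are kept), and the helper name `read_field_groups` is invented for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the grouped response (documents reduced to their ids)
response = """
<lst name="grouped">
  <lst name="year_group">
    <int name="matches">5</int>
    <arr name="groups">
      <lst>
        <str name="groupValue">2002</str>
        <result name="doclist" numFound="2" start="0">
          <doc><str name="id">1</str></doc>
          <doc><str name="id">3</str></doc>
        </result>
      </lst>
      <lst>
        <str name="groupValue">2003</str>
        <result name="doclist" numFound="2" start="0">
          <doc><str name="id">2</str></doc>
          <doc><str name="id">4</str></doc>
        </result>
      </lst>
      <lst>
        <str name="groupValue">2006</str>
        <result name="doclist" numFound="1" start="0">
          <doc><str name="id">5</str></doc>
        </result>
      </lst>
    </arr>
  </lst>
</lst>
"""


def read_field_groups(xml_text, field):
    """Return (matches, {groupValue: (numFound, [ids])}) for a group.field result."""
    root = ET.fromstring(xml_text.strip())
    # Find <lst name="FIELD"> wherever it sits in the tree
    node = next(n for n in root.iter("lst") if n.get("name") == field)
    matches = int(node.find("int[@name='matches']").text)
    groups = {}
    for group in node.findall("arr[@name='groups']/lst"):
        value = group.find("str[@name='groupValue']").text
        doclist = group.find("result[@name='doclist']")
        ids = [d.find("str[@name='id']").text for d in doclist.findall("doc")]
        groups[value] = (int(doclist.get("numFound")), ids)
    return matches, groups


matches, groups = read_field_groups(response, "year_group")
print(matches)  # 5
print(groups)   # {'2002': (2, ['1', '3']), '2003': (2, ['2', '4']), '2006': (1, ['5'])}
```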

Now let’s create a query that groups our search results by mileage ranges. We assume 3 ranges:

  1. 0 km to 99999 km
  2. 100000 km to 199999 km
  3. 200000 km and above

Query:

?q=audi+a4&group=true&group.query=mileage:[0+TO+99999]&group.query=mileage:[100000+TO+199999]&group.query=mileage:[200000+TO+*]&group.limit=3&fl=id,mileage,make,model,year
 

and response:

<lst name="grouped">
  <lst name="mileage:[0 TO 99999]">
    <int name="matches">5</int>
    <result name="doclist" numFound="1" start="0">
      <doc>
        <str name="id">5</str>
        <str name="make">Audi</str>
        <int name="mileage">9900</int>
        <str name="model">A4</str>
        <int name="year">2006</int>
      </doc>
    </result>
  </lst>
  <lst name="mileage:[100000 TO 199999]">
    <int name="matches">5</int>
    <result name="doclist" numFound="3" start="0">
      <doc>
        <str name="id">1</str>
        <str name="make">Audi</str>
        <int name="mileage">197000</int>
        <str name="model">A4</str>
        <int name="year">2002</int>
      </doc>
      <doc>
        <str name="id">3</str>
        <str name="make">Audi</str>
        <int name="mileage">125000</int>
        <str name="model">A4</str>
        <int name="year">2002</int>
      </doc>
      <doc>
        <str name="id">4</str>
        <str name="make">Audi</str>
        <int name="mileage">150000</int>
        <str name="model">A4</str>
        <int name="year">2003</int>
      </doc>
    </result>
  </lst>
  <lst name="mileage:[200000 TO *]">
    <int name="matches">5</int>
    <result name="doclist" numFound="1" start="0">
      <doc>
        <str name="id">2</str>
        <str name="make">Audi</str>
        <int name="mileage">220000</int>
        <str name="model">A4</str>
        <int name="year">2003</int>
      </doc>
    </result>
  </lst>
</lst>

Again we have 5 matches. In the first group there is a car with a mileage of 9900 km, in the second group there are cars with mileages of 197000 km, 125000 km and 150000 km, and finally in the third group there is a car with a mileage of 220000 km. We achieved what we wanted. Mission accomplished.
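
With more than a few ranges, writing each group.query by hand gets tedious. The range clauses can be generated from a list of lower bounds, as in the sketch below. The helper `mileage_ranges` is a made-up name; the only Solr-specific parts are the range syntax and parameter names already used in the query above.

```python
from urllib.parse import urlencode, parse_qs


def mileage_ranges(boundaries):
    """Turn sorted lower bounds into inclusive Solr range clauses.

    Each range ends one below the next lower bound; the last one is open-ended.
    """
    clauses = []
    for i, low in enumerate(boundaries):
        if i + 1 < len(boundaries):
            high = str(boundaries[i + 1] - 1)
        else:
            high = "*"
        clauses.append(f"mileage:[{low} TO {high}]")
    return clauses


clauses = mileage_ranges([0, 100000, 200000])
print(clauses)
# ['mileage:[0 TO 99999]', 'mileage:[100000 TO 199999]', 'mileage:[200000 TO *]']

# A list of pairs lets the same parameter name appear several times
params = [("q", "audi a4"), ("group", "true")]
params += [("group.query", clause) for clause in clauses]
params += [("group.limit", "3"), ("fl", "id,mileage,make,model,year")]

# urlencode percent-encodes ':', '[', ']' and '*'; decoding gives the clauses back
query_string = urlencode(params)
decoded = parse_qs(query_string)
print(decoded["group.query"] == clauses)  # True
```

The encoded string looks different from the hand-written query because the special characters are percent-encoded, but it decodes to the same parameters.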

The end

Yet another feature, this time search result grouping, has been added to our car sale application. We will surely see what the users’ opinions will be :)


Published at DZone with permission of Rafał Andrzejewski, DZone MVB.

