Build a Movie Search Engine with Solr and Ruby

By Carlo Scarioni · Feb. 24, 12

Solr is a search server built on top of the Apache Lucene search library. It offers an HTTP interface for indexing and querying data.

Roughly speaking, Solr (like Lucene, the engine that powers it) works by indexing Documents for later searching and retrieval. A Document is described by a collection of Fields, and each of these fields can individually be indexed, stored in the index, or both.

The index can be built in different ways. How it is built is determined mainly by the analyzer used for each field: the analyzer decides how the content of a particular field is turned into the terms that get indexed.
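
To make the role of an analyzer a little more concrete, here is a minimal Ruby sketch. It is not how Solr or Lucene implement analysis; the "analyze" and "build_index" helpers are hypothetical and only assume that an analyzer splits text into lowercase tokens, which are then mapped to the documents that contain them.

```ruby
# A toy analyzer: an illustrative stand-in for what a Solr analyzer does,
# not Solr's actual implementation.
def analyze(text)
  # Split on anything that is not a letter or digit, then lowercase each token.
  text.split(/[^[:alnum:]]+/).reject(&:empty?).map(&:downcase)
end

# Build an inverted index: token => ids of the documents that contain it.
def build_index(docs)
  index = Hash.new { |hash, token| hash[token] = [] }
  docs.each do |id, text|
    analyze(text).uniq.each { |token| index[token] << id }
  end
  index
end

docs  = { '1' => 'Die Hard 3', '2' => 'Lethal Weapon' }
index = build_index(docs)

p analyze('Die Hard 3')   # => ["die", "hard", "3"]
p index['weapon']         # => ["2"]
```

A different "analyze" method (for example one that keeps the whole text as a single token) would produce a different index from the same documents, which is the sense in which the analyzer determines how a field is indexed.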

Of course there is a lot of complexity behind all this, but this is a basic tutorial, and a simple but functional search solution can be built using the defaults for most options.

This tutorial shows how to search for movies by title and/or actor using Ruby and Solr. I will assume you already have Ruby and RubyGems installed.

1. Download and install Solr:
 wget http://apache.mirrors.timporter.net/lucene/solr/3.5.0/apache-solr-3.5.0-src.tgz

2. Decompress it:
 tar zxvf apache-solr-3.5.0-src.tgz

3. Modify the schema to accept the kind of documents we want (movies).

In our example we will be able to query movies by title and actors. The index will also store a summary of each movie, although it won’t be searchable by it. So the Document representing a movie will have three Fields, plus an id that uniquely identifies it. To reflect this, go to the directory:

cd apache-solr-3.5.0/solr/example/solr/conf/

Then open the file schema.xml with your favorite editor, go down to the field definitions, and replace all the ones that are there with the following:

    <fields>
      <field name="id" type="string" indexed="true" stored="true" required="true"/>
      <field name="title" type="text_general" indexed="true" stored="true"/>
      <field name="actor" type="text_general" indexed="true" stored="true" multiValued="true"/>
      <field name="summary" type="text_general" indexed="false" stored="true"/>
    </fields>

Here we are specifying that our movie Documents will have these four fields. The "id" field is a plain "string", and the type we are using for the other three is "text_general". Going up in the schema.xml file we can find the definition of what being "text_general" means.

This is a default analyzer provided with Solr that will be good enough for our purposes (and for many others).

The other two things worth mentioning in our field definitions are that the "actor" field is multivalued, meaning that we can associate more than one actor with a movie, and that the "summary" field is stored but not indexed. This means that the content of the field will be stored (so it can be returned when documents are retrieved) but not indexed (so we can't search on this field).
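
The difference between "indexed" and "stored", and the handling of a multivalued field, can be illustrated with a small in-memory sketch. The field names come from the schema above, but the FIELDS table and the TinyIndex class are hypothetical and only mimic the behavior described here; they say nothing about how Solr is implemented internally.

```ruby
# Field settings mirroring the schema above (illustrative only).
FIELDS = {
  'title'   => { :indexed => true,  :stored => true },
  'actor'   => { :indexed => true,  :stored => true },
  'summary' => { :indexed => false, :stored => true }
}

class TinyIndex
  def initialize
    @stored = {}                               # id => stored field values
    @terms  = Hash.new { |h, k| h[k] = [] }    # [field, token] => ids
  end

  def add(doc)
    id = doc['id']
    # Keep only the fields marked as stored (plus the id itself).
    @stored[id] = doc.select { |name, _| name == 'id' || FIELDS.fetch(name, {})[:stored] }
    FIELDS.each do |name, opts|
      next unless opts[:indexed] && doc[name]
      # Array() lets a multivalued field and a single value share one code path.
      Array(doc[name]).each do |value|
        value.downcase.split(/\W+/).each { |token| @terms[[name, token]] << id }
      end
    end
  end

  # Returns the stored fields of every document matching token in field.
  def search(field, token)
    @terms.fetch([field, token.downcase], []).uniq.map { |id| @stored[id] }
  end
end

index = TinyIndex.new
index.add('id' => '1', 'title' => 'Die Hard 3',
          'actor' => ['Bruce Willis', 'Samuel Jackson'],
          'summary' => 'Great movie')

p index.search('actor', 'willis').first['summary']  # => "Great movie"
p index.search('summary', 'great')                  # => []
```

The summary comes back with a hit found through another field, but searching on the summary itself finds nothing, which matches the stored-but-not-indexed configuration.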

OK, so this is all the configuration we need in Solr. Let's start the server now. From the directory apache-solr-3.5.0/example, execute:

 java -jar start.jar

That will start the server, which listens on port 8983 by default.
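
Before moving on, it can be handy to check from Ruby that the server is answering. The following is only a sketch using the standard library: the "solr_ping_uri" and "solr_up?" helpers are hypothetical, and both the default base URL and the "admin/ping" path are assumptions that may need adjusting for your installation.

```ruby
require 'net/http'
require 'uri'

# Build the URL to probe. The default base URL and the "admin/ping" path
# are assumptions, not something the tutorial configures explicitly.
def solr_ping_uri(base_url = 'http://localhost:8983/solr/')
  URI.join(base_url, 'admin/ping')
end

# Returns true if the server answers the probe with a 2xx status.
def solr_up?(base_url = 'http://localhost:8983/solr/')
  response = Net::HTTP.get_response(solr_ping_uri(base_url))
  response.is_a?(Net::HTTPSuccess)
rescue SystemCallError, SocketError, Timeout::Error
  false  # connection refused, host not found, or timed out
end

puts solr_ping_uri  # prints http://localhost:8983/solr/admin/ping
```

Calling "solr_up?" returns false rather than raising when nothing is listening on the port, so it can be used as a simple guard before indexing.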

OK, so let's move to the Ruby side now. We will create a little program that indexes a couple of movies and then searches for them. First install the needed gem:

 gem install rsolr

Then let's create a Movie class in a file named "moviesearcher.rb":

    class Movie
      attr_accessor :id, :title, :actors, :summary

      def initialize
        @actors = []
      end
    end

And now let’s create the indexer and searcher classes in the same file:

Indexer:

    require 'rsolr'

    class Indexer
      def initialize
        @solr = RSolr.connect :url => 'http://localhost:8983/solr/collection1/'
      end

      # Adds each movie to the index, then commits so the changes become searchable.
      # Note that the summary is not sent to Solr here.
      def index(movies)
        movies.each do |movie|
          @solr.add :id => movie.id.to_s, :title => movie.title, :actor => movie.actors
        end
        @solr.update :data => '<commit/>'
      end
    end
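
The index method above sends only the id, title, and actors of each movie, so the summary never reaches Solr. If you also want the summary stored, one possible approach is to build the document hash in a small helper. This is a sketch rather than part of the original program: "movie_to_doc" and the "MovieStub" struct (a stand-in for the Movie class so that the snippet runs on its own) are hypothetical names.

```ruby
# Stand-in for the tutorial's Movie class so this snippet is self-contained.
MovieStub = Struct.new(:id, :title, :actors, :summary)

# Build the hash that would be handed to Solr for one movie.
# The keys match the field names declared in schema.xml.
def movie_to_doc(movie)
  doc = { :id => movie.id.to_s, :title => movie.title, :actor => movie.actors }
  # Only include the summary when one has been set.
  doc[:summary] = movie.summary if movie.summary
  doc
end

movie = MovieStub.new(1, 'Die Hard 3', ['Bruce Willis', 'Samuel Jackson'], 'Great movie')
p movie_to_doc(movie)
```

With such a helper, the body of the loop in the indexer could pass the resulting hash to the add call instead of listing the fields inline.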
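
Searcher:

    class Searcher
      def initialize
        @solr = RSolr.connect :url => 'http://localhost:8983/solr/collection1/'
      end

      # Returns the documents whose title or actor contains a word starting with term.
      def search(term)
        # Lowercase the term to match the lowercased tokens in the index.
        term = term.downcase
        # Boolean operators must be uppercase in the Lucene query syntax.
        response = @solr.get 'select', :params => {:q => "title:#{term}* OR actor:#{term}*"}
        response["response"]["docs"]
      end
    end
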
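
The query string used by the searcher can also be built by a standalone helper that escapes characters with a special meaning in the Lucene query syntax, so that user input such as "a:b" does not change the structure of the query. This is a minimal sketch; "escape_term" and "build_query" are hypothetical helpers written for this example, not part of rsolr.

```ruby
# Characters and operators with special meaning in the Lucene query syntax.
SPECIAL = /([+\-!(){}\[\]^"~*?:\\\/]|&&|\|\|)/

# Prefix every special character with a backslash.
def escape_term(term)
  term.gsub(SPECIAL) { |match| "\\#{match}" }
end

# Build a prefix query over the given fields, joined with an uppercase OR.
def build_query(term, fields = %w[title actor])
  prefix = escape_term(term.downcase)
  fields.map { |field| "#{field}:#{prefix}*" }.join(' OR ')
end

puts build_query('Die')   # prints title:die* OR actor:die*
puts build_query('a:b')   # prints title:a\:b* OR actor:a\:b*
```

The sketch does not handle terms containing spaces; a multi-word input would need to be split into separate prefix clauses first.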
That’s it.

Let’s test it in irb:

1.9.2-p290 :001 > require './moviesearcher'
=> true
1.9.2-p290 :013 >   movie_1 = Movie.new
=> #<Movie:0x... @actors=[]>
1.9.2-p290 :014 > movie_1.actors << 'Bruce Willis'
=> ["Bruce Willis"]
1.9.2-p290 :015 > movie_1.actors << "Samuel Jackson"
=> ["Bruce Willis", "Samuel Jackson"]
1.9.2-p290 :016 > movie_1.id = '1'
=> "1"
1.9.2-p290 :017 > movie_1.title='Die Hard 3'
=> "Die Hard 3"
1.9.2-p290 :018 > movie_2 = Movie.new
=> #<Movie:0x... @actors=[]>
1.9.2-p290 :019 > movie_2.actors << 'Mel Gibson'
=> ["Mel Gibson"]
1.9.2-p290 :020 > movie_2.actors << 'Danny Glover'
=> ["Mel Gibson", "Danny Glover"]
1.9.2-p290 :021 > movie_2.id = '2'
=> "2"
1.9.2-p290 :022 > movie_2.title = 'Lethal Weapon'
=> "Lethal Weapon"
1.9.2-p290 :041 >   movie_1.summary = "Great movie"
=> "Great movie"
1.9.2-p290 :042 > movie_2.summary = 'Another great movie'
=> "Another great movie"

Indexing

1.9.2-p290 :061 > idxr=Indexer.new
1.9.2-p290 :080 >   idxr.index [movie_1,movie_2]
=> {"responseHeader"=>{"status"=>0, "QTime"=>50}}


Searching

1.9.2-p290 :085 >   searcher = Searcher.new
1.9.2-p290 :086 > searcher.search 'Die'
=> [{"id"=>"1", "title"=>"Die Hard 3", "actor"=>["Bruce Willis", "Samuel Jackson"]}]
1.9.2-p290 :090 >   searcher.search 'Bru'
=> [{"id"=>"1", "title"=>"Die Hard 3", "actor"=>["Bruce Willis", "Samuel Jackson"]}]

1.9.2-p290 :091 > searcher.search 'Glo'
=> [{"id"=>"2", "title"=>"Lethal Weapon", "actor"=>["Mel Gibson", "Danny Glover"]}]
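
The search results are plain Ruby hashes, so presenting them is straightforward. As a small sketch, a hypothetical "format_hit" helper (not part of the original program) could turn each document into one line of text:

```ruby
# Turn one result document (a hash as returned by the searcher) into a line of text.
def format_hit(doc)
  # Array() copes with a missing actor field as well as a list of actors.
  actors = Array(doc['actor']).join(', ')
  "#{doc['title']} (id #{doc['id']}): #{actors}"
end

hits = [{ 'id' => '1', 'title' => 'Die Hard 3',
          'actor' => ['Bruce Willis', 'Samuel Jackson'] }]
hits.each { |doc| puts format_hit(doc) }
# prints Die Hard 3 (id 1): Bruce Willis, Samuel Jackson
```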



Published at DZone with permission of Carlo Scarioni, DZone MVB. See the original article here.
