Using Sphinx and Java to Implement Free Text Search
Join the DZone community and get the full member experience.
Join For FreeAs promised I am going to provide an article on how we can use Sphinx with Java to perform a full text search. I will begin the article with an introduction to Sphinx.
Introduction to Sphinx
Databases are continually growing and sometimes tend to hold about 100M records and need an external solution for full text search to be performed. I have picked Sphinx, an open source full-text search engine, distributed under GPL version 2 to perform a full text search on such a huge amount of data. Generally, it's a standalone search engine meant to provide fast, size-efficient and relevant full-text search functions to other applications very much compatible with an SQL Database. So my example will be based on the MySQL database, as we cannot produce millions of data to evaluate the real power of Sphinx, we will have a small amount of data and I think that should not be a problem.
Here are few Sphinx Unique Features:
The important features which have been adopted to perform a full text search are the provision of the Java API to integrate easily with the web application and considerably high indexing and searching speed with an average of 4-10 MB/sec & 20-30 ms/q @5GB,3.5M docs(wikipedia)
Sphinx Terms & How It Works
The fist principle part of sphinx is indexer. It is solely responsible for gathering the data that will be searchable. From the Sphinx point of view, the data it indexes is a set of structured documents, each of which has the same set of fields. This is biased towards SQL, where each row corresponds to a document, and each column to a field. Sphinx builds a special data structure optimized for our queries from the data provided. This structure is called index; and the process of building index from data is called indexing and the element of sphinx which carries out these tasks is called indexer. Indexer can be executed either from a regular script or command-line interface. Sphinx documents are equal to records in DB.
searchd is the second principle tools as part of Sphinx. It is the part of the system which actually handles searches; it functions as a server and is responsible for receiving queries, processing them and returning a dataset back to the different APIs for client applications. Unlike indexer, searchd is not designed to be run either from a regular script or command-line calling, but instead either as a daemon to be called from init.d (on Unix/Linux type systems) or to be called as a service (on Windows-type systems). I am going to focus on Windows environment so later I will show you how we can install sphinx on windows as a service.
Finally search is one of the helper tools within the Sphinx package. Whereas searchd is responsible for searches in a server-type environment, search is aimed at testing the index quickly without building a framework to make the connection to the server and process its response. This will only be used for testing sphinx from command – line and with respect to application’s requirement; searchd service will be used to query the MySql Server with a pre created index.
Installation on Windows
So now we come to the part of installing Sphinx on Windows:
Sample Application
Till now I guess the whole motto of this article is clear to you, let's move ahead to define our sample application.
We all use the Address Book to search for people by using their name or e-mail address when we want to immediately address an e-mail message to a specific person, people, or distribution list. We also search for people by using other basic information, such as e-mail alias, office location, and telephone number etc. I think most of the people on this planet are quire familiar with this kind of search, so let's make outlook address book as our sample database schema. Most of the fields are mapped from microsoft outlook, the only additional column is date of joining so that we can filter our queries based on joining dates of the employees.
The example that I am going to put forth will use Sphinx to search for a particluar address entry using free text search, meaning the user is free to type in anything, here is our search screen, the DOJ (date of joining) search parameter is optional.
Let's create our sample database 'addressbook'
We next create a procedure that we will use from java to fetch records that we just inserted.
Configuring Sphinx
It turns out that it was not terribly difficult to setup sphinx, but I had a hard time finding instructions on the web, so I'll post my steps here.
By default Sphinx looks for 'sphinx.co.in' configuration file to come with indexes and other stuff, lets create and define source and index for our sample application
addressbook.conf (read between the lines)
Once the configuration is done, its time to index our sql data, the command to use is 'indexer' as shown below.
Once the index is up let's try to search few records, the utility command to perform search is 'search'. Ok Sphinx maharaj* please search for employee whose alias is u4732
Note: You see a lot of information for the result, this is because of following line in our configuration file
sql_query_info = SELECT * FROM AddressBook WHERE id=$id
If you want to see less columns you need to change the sql_query_info in configuration file. Let's try another search, sphinx maharaj* please tell me which all rows have gurleen or toshi in them.
Sphinx as Windows Service
Now our main aim is to use sphinx with JAVA API, so let's move towards that now, before java can utilize the true power of Sphinx, we need to start 'searchd' as a windows service so that our java programme can connect to sphinx search engine.
Let's install Sphinx as a windows service so that our java program can use this daemon service to query the index that we just created, the command is :
Note: If you try to install Sphinx without admin rights, you may get following error messages.
ok so far so good, let's run some tests now ............
How we update the Index once database changes?
For these kinds of requirements, we can set up two sources and two indexes, with one "main" index for the data which only changes rarely (if ever), and one "delta" for the new documents. First Time data will go in the "main" index and the newly inserted address book entries will go into "delta". Delta index could then be reindexed very frequently, and the documents can be made available to search in a matter of minutes.
Also one thing to take from this article is once 'searchd' daemon is running we can't index the data in normal way,we have to use --rotate option in such cases. For some applications where there is a timely batch update for the data, we can configure some cron job to reindex our documents in Sphinx as shown below.
Capsule
We asked Sphinx to provide us the Document Ids corresponding to our search parameters and then we used those Ids to fire database query. In case the data we want to return is included in Index (DOJ attribute for example in our case) we can skip the database portion, so choose wisely how much information (attributes) you want to include while you index your sql data.
Well that's all ... it's time to say good bye. Take good care of your health and don't forget to vote, its a must :)
- Munish Gogna
Introduction to Sphinx
Databases are continually growing and sometimes tend to hold about 100M records and need an external solution for full text search to be performed. I have picked Sphinx, an open source full-text search engine, distributed under GPL version 2 to perform a full text search on such a huge amount of data. Generally, it's a standalone search engine meant to provide fast, size-efficient and relevant full-text search functions to other applications very much compatible with an SQL Database. So my example will be based on the MySQL database, as we cannot produce millions of data to evaluate the real power of Sphinx, we will have a small amount of data and I think that should not be a problem.
Here are few Sphinx Unique Features:
- high indexing speed (up to 10 MB/sec on modern CPUs)
- high search speed (avg query is under 0.1 sec on 2-4 GB text collections)
- high scalability (up to 100 GB of text, upto 100 M documents on a single CPU)
- provides distributed searching capabilities
- provides searching from within MySQL through pluggable storage engine
- supports boolean, phrase, and word proximity queries
- supports multiple full-text fields per document (upto 32 by default)
- supports multiple additional attributes per document (ie. groups, timestamps, etc)
- supports MySQL natively (MyISAM and InnoDB tables are both supported)
The important features which have been adopted to perform a full text search are the provision of the Java API to integrate easily with the web application and considerably high indexing and searching speed with an average of 4-10 MB/sec & 20-30 ms/q @5GB,3.5M docs(wikipedia)
Sphinx Terms & How It Works
The fist principle part of sphinx is indexer. It is solely responsible for gathering the data that will be searchable. From the Sphinx point of view, the data it indexes is a set of structured documents, each of which has the same set of fields. This is biased towards SQL, where each row corresponds to a document, and each column to a field. Sphinx builds a special data structure optimized for our queries from the data provided. This structure is called index; and the process of building index from data is called indexing and the element of sphinx which carries out these tasks is called indexer. Indexer can be executed either from a regular script or command-line interface. Sphinx documents are equal to records in DB.
- Document is set of text fields and number attributes + unique ID – similar to row in DB
- Set of fields and attributes is constant for index – similar to table in DB
- Fields are searchable for FullText queries
- Attributes may be used for filtering, sorting, grouping
searchd is the second principle tools as part of Sphinx. It is the part of the system which actually handles searches; it functions as a server and is responsible for receiving queries, processing them and returning a dataset back to the different APIs for client applications. Unlike indexer, searchd is not designed to be run either from a regular script or command-line calling, but instead either as a daemon to be called from init.d (on Unix/Linux type systems) or to be called as a service (on Windows-type systems). I am going to focus on Windows environment so later I will show you how we can install sphinx on windows as a service.
Finally search is one of the helper tools within the Sphinx package. Whereas searchd is responsible for searches in a server-type environment, search is aimed at testing the index quickly without building a framework to make the connection to the server and process its response. This will only be used for testing sphinx from command – line and with respect to application’s requirement; searchd service will be used to query the MySql Server with a pre created index.
Installation on Windows
So now we come to the part of installing Sphinx on Windows:
- Download Sphinx from the official Sphinx download site i.e http://sphinxsearch.com (I downloaded Win32 release binaries with MySQL support: sphinx-0.9.9-win32.zip)
- Unzip the file to some folder, I unzipped to C:\devel\sphinx-0.9.9-win32 and added the bin directory to the windows path variable
Sample Application
Till now I guess the whole motto of this article is clear to you, let's move ahead to define our sample application.
We all use the Address Book to search for people by using their name or e-mail address when we want to immediately address an e-mail message to a specific person, people, or distribution list. We also search for people by using other basic information, such as e-mail alias, office location, and telephone number etc. I think most of the people on this planet are quire familiar with this kind of search, so let's make outlook address book as our sample database schema. Most of the fields are mapped from microsoft outlook, the only additional column is date of joining so that we can filter our queries based on joining dates of the employees.
The example that I am going to put forth will use Sphinx to search for a particluar address entry using free text search, meaning the user is free to type in anything, here is our search screen, the DOJ (date of joining) search parameter is optional.
The screen is self explanatory, let's move ahead and define our database. As Sphinx works well with MySQL and MySQL being free also, lets create our db scripts around mysql database (Those who wish to install MySQL can dowload it from http://www.mysql.com)
Let's create our sample database 'addressbook'
mysql> create database addressbook;Note: The fields defined in the following tables are for the purpose of learning only and may not contain a complete set of fields that microsoft address book or any similar software may provide.
Query OK, 1 row affected (0.03 sec)
mysql> use addressbook;
Database changed
mysql> CREATE TABLE addressbook (It's time to put some dummy data into the table, so let's fill our tables. Our virtual company 'gogs.it' has six offices across India and Singapore as defined in the following insert script.
Id int(11) NOT NULL,
FirstName varchar(30) NOT NULL,
LastName varchar(30) NOT NULL,
OfficeId int(11) DEFAULT NULL,
Title varchar(20) DEFAULT NULL,
Alias varchar(20) NOT NULL,
Email varchar(50) NOT NULL,
DOJ date NOT NULL,
PhoneNo varchar(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
mysql> CREATE TABLE CompanyLocations (
Id int(11) NOT NULL,
Location varchar(60) NOT NULL,
Country varchar(20) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
mysql> insert into CompanyLocations (Id, Location, Country) VALUES (1, 'Tower One, Harbour Front, Singapore', 'SG');Now comes the real stuff... The data sphinx is going to index, let's populate that as well...wooooo
insert into CompanyLocations (Id, Location, Country) VALUES (2, 'DLF Phase 3, Gurgaon, India', 'IN');
insert into CompanyLocations (Id, Location, Country) VALUES (3, 'Hiranandani Gardens, Powai, Mumbai, India', 'IN');
insert into CompanyLocations (Id, Location, Country) VALUES (4, 'Hinjwadi, Pune, India', 'IN');
insert into CompanyLocations (Id, Location, Country) VALUES (5, 'Toll Post, Nagrota, Jammu, India', 'IN');
insert into CompanyLocations (Id, Location, Country) VALUES (6, 'Bani (Kathua), India', 'IN');
mysql> INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUESPlease note that above employee data is just a data *only data* I created using a small java programme using random number generators and reading some names file, so you may find titles getting messed up :(
(1,'Aabheer','Kumar',1,'Mr','u534','Kumar.Aabheer@gogs.it','2008-9-3', '+911234599990');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(2,'Aadarsh','Gupta',6,'Mr','u668','Gupta.Aadarsh@gogs.it','2007-2-23','+911234599991');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(3,'Aachman','Singh',5,'Mr','u2766','Singh.Aachman@gogs.it','2006-12-18','+911234599992');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(4,'Aadesh','Shrivastav',5,'Mr','u3198','Shrivastav.Aadesh@gogs.it','2007-11-23','+911234599993');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(5,'Aadi','manav',1,'Mr','u2686','manav.Aadi@gogs.it','2010-7-20','+911234599994');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(6,'Aadidev','singh',4,'Mr','u572','singh.Aadidev@gogs.it','2010-8-18','+911234599995');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(7,'Aafreen','sheikh',4,'Smt','u1092','sheikh.Aafreen@gogs.it','2007-7-11','+911234599996');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(8,'Aakar','Sherpa',5,'Mr','u1420','Sherpa.Aakar@gogs.it','2009-10-3','+911234599997');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(9,'Aakash','Singh',4,'Mrs','u2884','Singh.Aakash@gogs.it','2008-6-11','+911234599998');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(10,'Aalap','Singhania',4,'Mrs','u609','Singhania.Aalap@gogs.it','2010-10-8','+911234599999');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(11,'Aandaleeb','mahajan',1,'Smt','u131','mahajan.Aandaleeb@gogs.it','2010-10-21','+911234580001');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(12,'Mamata','kumari',5,'Sh','u2519','kumari.Mamata@gogs.it','2009-6-12','+911234580002');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(13,'Mamta','sharma',6,'Smt','u4123','sharma.Mamta@gogs.it','2009-2-8','+911234580003');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(14,'Manali','singh',6,'Mr','u1078','singh.Manali@gogs.it','2008-6-14','+911234580004');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(15,'Manda','saxena',1,'Mrs','u196','saxena.Manda@gogs.it','2010-9-4','+911234580005');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(16,'Salila','shetty',3,'Miss','u157','shetty.Salila@gogs.it','2009-11-15','+911234580006');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(17,'Salima','happy',3,'Mrs','u3445','happy.Salima@gogs.it','2006-7-14','+911234580007');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(18,'Salma','haik',5,'Sh','u4621','haik.Salma@gogs.it','2008-6-23','+911234580008');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(19,'Samita','patil',3,'Smt','u3156','patil.Samita@gogs.it','2006-6-7','+911234580009');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(20,'Sameena','sheikh',5,'Mrs','u952','sheikh.Sameena@gogs.it','2008-8-13','+911234580010');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(21,'Ranita','gupta',5,'Mrs','u2664','gupta.Ranita@gogs.it','2008-10-20','+911234580011');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(22,'Ranjana','sharma',1,'Sh','u3085','sharma.Ranjana@gogs.it','2010-6-21','+911234580012');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(23,'Ranjini','singh',6,'Mrs','u4200','singh.Ranjini@gogs.it','2007-4-13','+911234580013');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(24,'Ranjita','vyapari',2,'Smt','u1109','vyapari.Ranjita@gogs.it','2008-1-22','+911234580014');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(25,'Rashi','gupta',6,'Mrs','u3492','gupta.Rashi@gogs.it','2006-2-2','+911234580015');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(26,'Rashmi','sehgal',3,'Mr','u3248','sehgal.Rashmi@gogs.it','2008-9-9','+911234580016');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(27,'Rashmika','sexy',1,'Mrs','u4599','sexy.Rashmika@gogs.it','2009-3-12','+911234580017');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(28,'Rasika','dulari',3,'Smt','u2089','dulari.Rasika@gogs.it','2009-1-24','+911234580018');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(29,'Dilber','lover',6,'Mr','u4241','lover.Dilber@gogs.it','2007-10-11','+911234580019');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(30,'Dilshad','happy',1,'Mr','u1564','happy.Dilshad@gogs.it','2007-4-8','+911234580020');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(31,'Dipali','lights',5,'Sh','u1127','lights.Dipali@gogs.it','2006-11-1','+911234580021');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(32,'Dipika','lamp',1,'Sh','u2271','lamp.Dipika@gogs.it','2010-12-17','+911234580022');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(33,'Dipti','brightness',5,'Smt','u422','brightness.Dipti@gogs.it','2010-9-25','+911234580023');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(34,'Disha','singh',3,'Sh','u4604','singh.Disha@gogs.it','2006-5-2','+911234580024');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(35,'Maadhav','Krishna',1,'Miss','u2561','Krishna.Maadhav@gogs.it','2007-11-6','+911234580025');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(36,'Maagh','month',5,'Miss','u874','month.Maagh@gogs.it','2008-5-8','+911234580026');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(37,'Maahir','Skilled',4,'Mr','u3372','Skilled.Maahir@gogs.it','2007-8-4','+911234580027');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(38,'Maalolan','Ahobilam',5,'Mrs','u3498','Ahobilam.Maalolan@gogs.it','2007-7-9','+911234580028');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(39,'Maandhata','King',1,'Smt','u2089','King.Maandhata@gogs.it','2009-9-3','+911234580029');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(40,'Maaran','Brave',2,'Miss','u4020','Brave.Maaran@gogs.it','2008-4-5','+9112345606001');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(41,'Maari','Rain',2,'Sh','u3593','Rain.Maari@gogs.it','2007-12-5','+9112345606002');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(42,'Madan','Cupid',4,'Mrs','u795','Cupid.Madan@gogs.it','2007-11-11','+9112345606003');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(43,'Madangopal','Krishna',3,'Sh','u438','Krishna.Madangopal@gogs.it','2007-2-19','+9112345606004');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(44,'sahil','gogna',1,'Sh','u2273','gogna.sahil@gogs.it','2007-10-7','+9112345606005');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(45,'nikhil','gogna',2,'Mr','u1240','gogna.nikhil@gogs.it','2009-9-14','+9112345606006');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(46,'amit','gogna',5,'Sh','u3879','gogna.amit@gogs.it','2006-2-8','+9112345606007');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(47,'krishan','gogna',4,'Miss','u3632','gogna.krishan@gogs.it','2010-9-20','+9112345606008');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(48,'anil','kashyap',4,'Smt','u3939','kashyap.anil@gogs.it','2010-3-15','+9112345606009');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(49,'sunil','kashyap',5,'Mrs','u3493','kashyap.sunil@gogs.it','2008-3-16','+9112345606010');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(50,'sandy','singh',6,'Mrs','u4691','singh.sandy@gogs.it','2009-6-2','+9112345606011');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(51,'vishal','kapoor',3,'Mr','u1087','kapoor.vishal@gogs.it','2010-5-13','+9112345606012');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(52,'bala','ji',5,'Mrs','u4762','ji.bala@gogs.it','2007-8-9','+9112345606013');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(53,'karan','sarin',4,'Miss','u3030','sarin.karan@gogs.it','2008-4-8','+9112345606014');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(54,'abhishek','kumar',4,'Miss','u1093','kumar.abhishek@gogs.it','2008-12-21','+9112345605001');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(55,'babu','the',1,'Miss','u1055','the.babu@gogs.it','2008-7-2','+9112345506001');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(56,'sandeep','gainda',3,'Miss','u1320','gainda.sandeep@gogs.it','2010-5-14','+9112345606301');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(57,'dheeraj','kumar',3,'Miss','u3685','kumar.dheeraj@gogs.it','2007-10-14','+9112345606091');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(58,'dharmendra','chauhan',1,'Smt','u3235','chauhan.dharmendra@gogs.it','2008-8-1','+9112345806001');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(59,'max','alan',3,'Smt','u3465','alan.max@gogs.it','2009-5-5','+9112345608011');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(60,'hidayat','khan',3,'Smt','u958','khan.hidayat@gogs.it','2007-11-18','+911234599101');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(61,'himnashu','singh',4,'Miss','u2027','singh.himnashu@gogs.it','2008-3-2','+911234599102');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(62,'dinesh','kumar',6,'Sh','u3233','kumar.dinesh@gogs.it','2008-5-9','+911234599103');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(63,'toshi','prakash',1,'Mr','u3766','prakash.toshi@gogs.it','2010-9-17','+911234599104');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(64,'niti','puri',3,'Mr','u3575','puri.niti@gogs.it','2009-11-15','+911234599105');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(65,'pawan','tikki',3,'Sh','u3919','tikki.pawan@gogs.it','2006-3-19','+911234599106');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(66,'gaurav','sharma',2,'Sh','u413','sharma.gaurav@gogs.it','2010-4-2','+911234599107');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(67,'himanshu','verma',2,'Mrs','u4732','verma.himanshu@gogs.it','2009-3-20','+911234599108');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(68,'priyanshu','verma',3,'Sh','u183','verma.priyanshu@gogs.it','2010-8-12','+911234599109');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(69,'nitika','luthra',2,'Mrs','u4259','luthra.nitika@gogs.it','2010-7-12','+911234599110');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(70,'neeru','gogna',2,'Sh','u1633','gogna.neeru@gogs.it','2010-6-23','+91532110000');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(71,'bindu','gupta',1,'Sh','u1859','gupta.bindu@gogs.it','2006-11-10','+91532110001');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(72,'gurleen','bakshi',5,'Miss','u1423','bakshi.gurleen@gogs.it','2007-7-1','+91532110003');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(73,'rahul','gupta',3,'Sh','u1223','gupta.rahul@gogs.it','2009-8-11','+91532110004');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(74,'jagdish','salgotra',3,'Mr','u12','salgotra.jagdish@gogs.it','2008-5-19','+91532110005');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(75,'vikas','sharma',3,'Smt','u465','sharma.vikas@gogs.it','2006-6-2','+91532110006');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(76,'poonam','mahendra',2,'Sh','u1744','mahendra.poonam@gogs.it','2009-12-2','+91532110007');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(77,'pooja','kulkarni',3,'Mrs','u1903','kulkarni.pooja@gogs.it','2008-10-6','+91532110008');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(78,'priya','mahajan',6,'Sh','u4205','mahajan.priya@gogs.it','2010-8-5','+91532110009');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(79,'manoj','zerger',1,'Mrs','u3369','zerger.manoj@gogs.it','2009-12-4','+91532110010');
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES
(80,'mohan','master',5,'Mr','u2841','master.mohan@gogs.it','2010-10-7','+91532110011');
We next create a procedure that we will use from java to fetch records that we just inserted.
DROP PROCEDURE IF EXISTS search_address_book;
CREATE PROCEDURE search_address_book(IN address_ids VARCHAR(1000) )
BEGIN
DECLARE search_address_query VARCHAR(2000) DEFAULT '';
SET address_ids = CONCAT('\'', REPLACE(address_ids, ',', '\',\''), '\'');
SET search_address_query = CONCAT(search_address_query, ' select ab.Id as Id , ab.FirstName as FName, ab.LastName as LName, cl.Location as Location, ab.Title as Title, ab.Alias as Alias, ab.Email as Email, ab.DOJ as DOJ, ab.PhoneNo as PhoneNo ' );
SET search_address_query = CONCAT(search_address_query, ' from AddressBook ab left join CompanyLocations cl on ab.OfficeId=cl.Id ');
SET search_address_query = CONCAT(search_address_query, ' where ab.id IN (', address_ids ,') ');
SET @statement = search_address_query;
PREPARE dynquery FROM @statement;
EXECUTE dynquery;
DEALLOCATE PREPARE dynquery;
END;
# To get records for ids 1, 6 and 7, we run following commands:
call search_address_book('1,6,7');
Configuring Sphinx
It turns out that it was not terribly difficult to setup sphinx, but I had a hard time finding instructions on the web, so I'll post my steps here.
By default Sphinx looks for 'sphinx.co.in' configuration file to come with indexes and other stuff, lets create and define source and index for our sample application
addressbook.conf (read between the lines)
#############################################################################
## data source definition
#############################################################################
source addressBookSource
{
## SQL settings for 'mysql' ##
type = mysql
# some straightforward parameters for SQL source types
sql_host = localhost
sql_user = root
sql_pass = root
sql_db = addressbook
sql_port = 3306 # optional, default is 3306
# pre-query, executed before the main fetch query
sql_query_pre = SET NAMES utf8
# main document fetch query, integer document ID field MUST be the first selected column
sql_query = \
select ab.Id as Id , ab.FirstName as FName, ab.LastName as LName, cl.Location as Location, \
ab.Title as Title, ab.Alias as Alias, ab.Email as Email, UNIX_TIMESTAMP(ab.DOJ) as DOJ, ab.PhoneNo as PhoneNo \
from AddressBook ab left join CompanyLocations cl on ab.OfficeId=cl.Id
sql_attr_timestamp = DOJ
# document info query, ONLY for CLI search (ie. testing and debugging) , optional, default is empty must contain $id macro and must fetch the document by that id
sql_query_info = SELECT * FROM AddressBook WHERE id=$id
}
#############################################################################
## index definition
#############################################################################
# local index example, this is an index which is stored locally in the filesystem
index addressBookIndex
{
# document source(s) to index
source = addressBookSource
# index files path and file name, without extension, make sure you have this folder
path = C:\devel\sphinx-0.9.9-win32\data\addressBookIndex
# document attribute values (docinfo) storage mode
docinfo = extern
# memory locking for cached data (.spa and .spi), to prevent swapping
mlock = 0
morphology = none
# make sure this file exists
exceptions =C:\devel\sphinx-0.9.9-win32\data\exceptions.txt
enable_star = 1
}
#############################################################################
## indexer settings
#############################################################################
indexer
{
# memory limit, in bytes, kiloytes (16384K) or megabytes (256M)
# optional, default is 32M, max is 2047M, recommended is 256M to 1024M
mem_limit = 32M
# maximum IO calls per second (for I/O throttling)
# optional, default is 0 (unlimited)
#
# max_iops = 40
# maximum IO call size, bytes (for I/O throttling)
# optional, default is 0 (unlimited)
#
# max_iosize = 1048576
# maximum xmlpipe2 field length, bytes
# optional, default is 2M
#
# max_xmlpipe2_field = 4M
# write buffer size, bytes
# several (currently up to 4) buffers will be allocated
# write buffers are allocated in addition to mem_limit
# optional, default is 1M
#
# write_buffer = 1M
}
#############################################################################
## searchd settings
#############################################################################
searchd
{
# hostname, port, or hostname:port, or /unix/socket/path to listen on
listen = 9312
# log file, searchd run info is logged here
# optional, default is 'searchd.log'
log = C:\devel\sphinx-0.9.9-win32\data\log\searchd.log
# query log file, all search queries are logged here
# optional, default is empty (do not log queries)
query_log = C:\devel\sphinx-0.9.9-win32\data\log\query.log
# client read timeout, seconds
# optional, default is 5
read_timeout = 5
# request timeout, seconds
# optional, default is 5 minutes
client_timeout = 300
# maximum amount of children to fork (concurrent searches to run)
# optional, default is 0 (unlimited)
max_children = 30
# PID file, searchd process ID file name
# mandatory
pid_file = C:\devel\sphinx-0.9.9-win32\data\log\searchd.pid
# max amount of matches the daemon ever keeps in RAM, per-index
# WARNING, THERE'S ALSO PER-QUERY LIMIT, SEE SetLimits() API CALL
# default is 1000 (just like Google)
max_matches = 1000
# seamless rotate, prevents rotate stalls if precaching huge datasets
# optional, default is 1
seamless_rotate = 1
# whether to forcibly preopen all indexes on startup
# optional, default is 0 (do not preopen)
preopen_indexes = 0
}
# --eof--
Once the configuration is done, its time to index our sql data, the command to use is 'indexer' as shown below.
C:\devel\sphinx-0.9.9-win32\bin>indexer.exe --all --config C:\devel\sphinx-0.9.9-win32\addressbook.confNote: As I told earlier that Sphinx creates 1 document for each row, as we had 80 rows in the database so a total of 80 docs are created. Time taken is also very very small, believe me I tried with half million rows and it took around 3-4 seconds :) cool isn't it?
CONSOLE:
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file 'C:\devel\sphinx-0.9.9-win32\addressbook.conf'...
indexing index 'addressBookIndex'...
collected 80 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 80 docs, 5514 bytes
total 0.057 sec, 96386 bytes/sec, 1398.43 docs/sec
total 2 reads, 0.000 sec, 3.5 kb/call avg, 0.0 msec/call avg
total 7 writes, 0.000 sec, 2.5 kb/call avg, 0.0 msec/call avg
Once the index is up let's try to search few records, the utility command to perform search is 'search'. Ok Sphinx maharaj* please search for employee whose alias is u4732
C:\devel\sphinx-0.9.9-win32\bin>search.exe --config C:\devel\sphinx-0.9.9-win32\addressbook.conf u4732As you can see above this is a unique record for Himanshu.
CONSOLE:
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file 'C:\devel\sphinx-0.9.9-win32\addressbook.conf'...
index 'addressBookIndex': query 'u4732 ': returned 1 matches of 1 total in 0.001 sec
displaying matches:
1. document=67, weight=1, doj=Fri Mar 20 00:00:00 2009
Id=67
FirstName=himanshu
LastName=verma
OfficeId=2
Title=Mrs
Alias=u4732
Email=verma.himanshu@gogs.it
DOJ=2009-03-20
PhoneNo=+911234599108
words:
1. 'u4732': 1 documents, 1 hits
words:
1. 'u4732': 1 documents, 1 hits
Note: You see a lot of information for the result, this is because of following line in our configuration file
sql_query_info = SELECT * FROM AddressBook WHERE id=$id
If you want to see less columns you need to change the sql_query_info in configuration file. Let's try another search, sphinx maharaj* please tell me which all rows have gurleen or toshi in them.
C:\devel\sphinx-0.9.9-win32\bin>search.exe --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --any toshi gurleenExactly two records were returned and this is what we were expecting.
CONSOLE:
displaying matches:
1. document=63, weight=2, doj=Fri Sep 17 00:00:00 2010
Id=63
FirstName=toshi
LastName=prakash
OfficeId=1
Title=Mr
Alias=u3766
Email=prakash.toshi@gogs.it
DOJ=2010-09-17
PhoneNo=+911234599104
2. document=72, weight=2, doj=Sun Jul 01 00:00:00 2007
Id=72
FirstName=gurleen
LastName=bakshi
OfficeId=5
Title=Miss
Alias=u1423
Email=bakshi.gurleen@gogs.it
DOJ=2007-07-01
PhoneNo=+91532110003
The following special operators and modifiers can be used when using the extended matching mode:
operator OR:
nikhil | sahil
operator NOT:
hello -sandy
hello !sandy
field search operator:
@Email gogna.sahil@gogna.it
For a complete set of search features , I advise you to go through http://sphinxsearch.com/docs/manual-0.9.9.html#searching link.
Sphinx as Windows Service
Now our main aim is to use sphinx with JAVA API, so let's move towards that now, before java can utilize the true power of Sphinx, we need to start 'searchd' as a windows service so that our java programme can connect to sphinx search engine.
Let's install Sphinx as a windows service so that our java program can use this daemon service to query the index that we just created, the command is :
C:\devel\sphinx-0.9.9-win32\bin>searchd.exe --install --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --servicename --port 9312 SphinxSearchWell now the sphinx is ready to serve us on port 9312
CONSOLE:
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
Installing service...
Service 'SphinxSearch' installed succesfully.
Note: If you try to install Sphinx without admin rights, you may get following error messages.
C:\devel\sphinx-0.9.9-win32\bin>searchd.exe --install --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --servicename --port 9312 SphinxSearchOnce done you can start the service as:
CONSOLE:
Installing service...
FATAL: OpenSCManager() failed: code=5, error=Access is denied.
c:\>sc start SphinxSearch (or alternatively from the services screen, start 'services.msc' in windows Run)Let's create an adapter to fetch data from the database.
If some how you want to delete the service , use c:\>sc delete SphinxSearch
package it.gogs.sphinx.util;Next we create the SphinxInstance that will parse the keywords and date range and provide us a list of Ids that matches the search.
import it.gogs.sphinx.AddressBoook;
import it.gogs.sphinx.exception.AddressBookBizException;
import it.gogs.sphinx.exception.AddressBookTechnicalException;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import org.apache.log4j.Logger;
/**
* Adapter to fetch data from the database.
*
* @author Munish Gogna
*
*/
public class AddressBookAdapter {
private static Logger logger = Logger.getLogger(AddressBookAdapter.class);
private AddressBookAdapter() {
// use in static way..
}
private static Connection getConnection()
throws AddressBookTechnicalException {
String userName = "root";
String password = "root";
String url = "jdbc:mysql://localhost/addressbook";
try {
Class.forName("com.mysql.jdbc.Driver").newInstance();
return DriverManager.getConnection(url, userName, password);
} catch (Exception e) {
throw new AddressBookTechnicalException("could not get connection");
}
}
public static List getAddressBookList(List addressIds)
throws AddressBookTechnicalException, AddressBookBizException {
List addressBoookList = new ArrayList();
if (addressIds == null || addressIds.size() == 0){
logger.error("AddressIds was null or empty, returning empty list");
return addressBoookList;
}
Connection connection = null;
CallableStatement callableStatement = null;
try {
connection = getConnection();
callableStatement = connection.prepareCall("{ call search_address_book(?)}");
callableStatement.setString(1, Utils.toCommaString(addressIds));
callableStatement.execute();
ResultSet resultSet = callableStatement.getResultSet();
prepareResults(resultSet, addressBoookList);
connection.close();
} catch (SQLException e) {
logger.error("Problem connecting MYSQL - " + e.getMessage());
throw new AddressBookTechnicalException(e.getMessage());
} catch (AddressBookTechnicalException e) {
logger.error("Problem connecting MYSQL - " + e.getMessage());
throw e;
} finally{
if(connection != null){
try {
connection.close();
} catch (SQLException e) {
logger.error("Problem closing conection - " + e.getMessage());
e.printStackTrace();
}
}
}
return addressBoookList;
}
private static void prepareResults(ResultSet resultSet,
List addressBoookList) throws SQLException {
AddressBoook addressBoook;
while (resultSet.next()) {
addressBoook = new AddressBoook();
addressBoook.setAlias(resultSet.getString("Alias"));
addressBoook.setEmail(resultSet.getString("Email"));
addressBoook.setfName(resultSet.getString("FName"));
addressBoook.setlName(resultSet.getString("LName"));
addressBoook.setOfficeLocation(resultSet.getString("Location"));
addressBoook.setPhoneNo(resultSet.getString("PhoneNo"));
addressBoook.setTitle(resultSet.getString("Title"));
addressBoook.setDateOfJoining(resultSet.getDate("DOJ"));
addressBoook.setId(resultSet.getLong("Id"));
addressBoookList.add(addressBoook);
}
}
}
package it.gogs.sphinx.util;Here is the interface that I will expose to the outside world (in my future article I will expose this interface as Web Service)
import it.gogs.sphinx.DateRange;
import it.gogs.sphinx.SearchCriteria;
import it.gogs.sphinx.api.SphinxClient;
import it.gogs.sphinx.api.SphinxException;
import it.gogs.sphinx.api.SphinxMatch;
import it.gogs.sphinx.api.SphinxResult;
import it.gogs.sphinx.exception.AddressBookBizException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.apache.log4j.Logger;
/**
* Instance that will parse our free text and provide the results.
*
* Note: Make sure that 'searchd' is up and running before you use this class
* @author Munish Gogna
*
*/
public class SphinxInstance {
private static String SPHINX_HOST = "localhost";
private static String SPHINX_INDEX = "addressBookIndex";
private static int SPHINX_PORT = 9312;
private static SphinxClient sphinxClient;
private static Logger logger = Logger.getLogger(SphinxInstance.class);
static {
sphinxClient = new SphinxClient(SPHINX_HOST, SPHINX_PORT);
}
public static List getAddressBookIds(SearchCriteria criteria)
throws AddressBookBizException, SphinxException {
List addressIdsList = new ArrayList();
try {
if (Utils.isNull(criteria)) {
logger.error("criteria is null");
throw new AddressBookBizException("criteria is null");
}
if (Utils.isNull(criteria.getKeywords())) {
logger.error("keyword is a required field");
throw new AddressBookBizException("keyword is a required field");
}
DateRange dateRange = criteria.getDateRage();
if (!Utils.isNull(dateRange)) {
if (Utils.isDateRangeValid(dateRange)) {
// this is to filter results based on joining dates if they are provided
sphinxClient.SetFilterRange("DOJ", getTimeInSeconds(dateRange.getFromDate()),
getTimeInSeconds(dateRange.getToDate()), false);
} else {
logger.error(" fromDate/toDate should not be empty and 'fromDate' should be less than equal to 'toDate'");
throw new AddressBookBizException("fromDate/toDate should not be empty and 'fromDate' should be less than equal to 'toDate'");
}
}
sphinxClient.SetMatchMode(SphinxClient.SPH_MATCH_EXTENDED2);
sphinxClient.SetSortMode(SphinxClient.SPH_SORT_RELEVANCE, "");
SphinxResult result = sphinxClient.Query(buildSearchQuery(criteria), SPHINX_INDEX, "buidling query for address book search");
SphinxMatch[] matches = result.matches;
for (SphinxMatch match : matches) {
addressIdsList.add(String.valueOf(match.docId));
}
} catch (SphinxException e) {
throw e;
} catch (AddressBookBizException e) {
throw e;
}
logger.info("Total record(s):" + addressIdsList.size());
return addressIdsList;
}
private static long getTimeInSeconds(Date time) {
return time.getTime()/1000;
}
private static String buildSearchQuery(SearchCriteria criteria)
throws AddressBookBizException {
String keywords[] = criteria.getKeywords().split(" ");
StringBuilder searchFor = new StringBuilder();
for (String key : keywords) {
if (!Utils.isEmpty(key)) {
searchFor.append(key);
if (searchFor.length() > 1) {
searchFor.append("*|*");
}
}
}
searchFor.delete(searchFor.lastIndexOf("|*"), searchFor.length());
StringBuilder queryBuilder = new StringBuilder();
String query = searchFor.toString();
queryBuilder.append("@FName *" + query + " | ");
queryBuilder.append("@LName *" + query + " | ");
queryBuilder.append("@Title *" + query + " | ");
queryBuilder.append("@Location *"+ query + " | ");
queryBuilder.append("@Alias *" + query + " | ");
queryBuilder.append("@Email *" + query + " | ");
queryBuilder.append("@PhoneNo *" + query);
logger.info("Sphinx Query: " + queryBuilder.toString());
return queryBuilder.toString();
}
}
import it.gogs.sphinx.AddressBoook;and here is the implementation class for the same.
import it.gogs.sphinx.SearchCriteria;
import it.gogs.sphinx.api.SphinxException;
import it.gogs.sphinx.exception.AddressBookBizException;
import it.gogs.sphinx.exception.AddressBookTechnicalException;
import java.util.List;
/**
*
* @author Munish Gogna
*
*/
public interface AddressBook {
/**
* Returns the list of AddressBook objects based on search criteria.
*
* @param criteria
* @throws AddressBookTechnicalException
* @throws AddressBookBizException
* @throws SphinxException
*/
public List getAddressBookList(SearchCriteria criteria)
throws AddressBookTechnicalException, AddressBookBizException,
SphinxException;
}
package it.gogs.sphinx.addressbook.impl;
import java.util.List;
import it.gogs.sphinx.AddressBoook;
import it.gogs.sphinx.SearchCriteria;
import it.gogs.sphinx.addressbook.AddressBook;
import it.gogs.sphinx.api.SphinxException;
import it.gogs.sphinx.exception.AddressBookBizException;
import it.gogs.sphinx.exception.AddressBookTechnicalException;
import it.gogs.sphinx.util.AddressBookAdapter;
import it.gogs.sphinx.util.SphinxInstance;
/**
* Implementation for our Address Book example
*
* @author Munish Gogna
*
*/
public class AddressBookImpl implements AddressBook{
public List getAddressBookList(SearchCriteria criteria)
throws AddressBookTechnicalException, AddressBookBizException,
SphinxException {
List addressIds= SphinxInstance.getAddressBookIds(criteria);
return AddressBookAdapter.getAddressBookList(addressIds);
}
}
ok so far so good, let's run some tests now ............
package it.gogs.sphinx.test;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.List;
import it.gogs.sphinx.AddressBoook;
import it.gogs.sphinx.DateRange;
import it.gogs.sphinx.SearchCriteria;
import it.gogs.sphinx.addressbook.AddressBook;
import it.gogs.sphinx.addressbook.impl.AddressBookImpl;
import it.gogs.sphinx.api.SphinxException;
import it.gogs.sphinx.exception.AddressBookBizException;
import it.gogs.sphinx.exception.AddressBookTechnicalException;
import junit.framework.TestCase;
/**
*
* @author Munish Gogna
*
*/
public class AddressBookTest extends TestCase {
private AddressBook addressBook;
@Override
protected void setUp() throws Exception {
super.setUp();
addressBook = new AddressBookImpl();
}
@Override
protected void tearDown() throws Exception {
super.tearDown();
}
/** this should be a unique record for Himanshu */
public void test_search_for_himanshu() throws Exception {
SearchCriteria criteria = new SearchCriteria();
// remember the first 'search' example??
criteria.setKeywords("u4732");
List addressList = addressBook.getAddressBookList(criteria);
assertTrue(addressList.size() == 1);
assertTrue("expecting himanshu here", "himanshu".equals(addressList.get(0).getfName()));
}
/** only two employees have name gurleen or toshi */
public void test_search_for_gurleen_or_toshi() throws Exception {
SearchCriteria criteria = new SearchCriteria();
// remember the second 'search' example??
criteria.setKeywords("gurleen toshi");
List addressList = addressBook.getAddressBookList(criteria);
assertTrue(addressList.size() == 2);
assertTrue("expecting toshi here", "toshi".equals(addressList.get(0).getfName()));
assertTrue("expecting gurleen here", "gurleen".equals(addressList.get(1).getfName()));
}
/** there are 16 people from jammu location */
public void test_search_for_people_from_jammu_location() throws Exception {
SearchCriteria criteria = new SearchCriteria();
criteria.setKeywords("jammu");
List addressList = addressBook.getAddressBookList(criteria);
assertTrue(addressList.size() == 16);
}
/** only Aalap, Manda and nitika are having title as Mrs and joined in 2010 */
public void test_joined_in_2010_with_title_Mrs() throws Exception {
DateRange dateRange = new DateRange();
GregorianCalendar calendar1 = new GregorianCalendar();
calendar1.set(Calendar.YEAR, 2010);
calendar1.set(Calendar.MONTH, Calendar.JANUARY);
calendar1.set(Calendar.DAY_OF_MONTH, 1);
dateRange.setFromDate(calendar1.getTime());
GregorianCalendar calendar2 = new GregorianCalendar();
calendar2.set(Calendar.YEAR, 2010);
calendar2.set(Calendar.MONTH, Calendar.DECEMBER);
calendar2.set(Calendar.DAY_OF_MONTH, 31);
dateRange.setToDate(calendar2.getTime());
SearchCriteria criteria = new SearchCriteria();
criteria.setKeywords("Mrs");
criteria.setDateRage(dateRange);
List addressList = addressBook.getAddressBookList(criteria);
assertTrue("expecting 3 records here", addressList.size() == 3);
}
/** should get a business exception here */
public void test_without_specifying_keywords(){
SearchCriteria criteria = new SearchCriteria();
//criteria.setKeywords("Mrs");
try {
addressBook.getAddressBookList(criteria);
} catch (Exception e) {
assertTrue(e instanceof AddressBookBizException);
assertTrue(e.getMessage().indexOf("keyword is a required field") >-1);
}
}
}
How we update the Index once database changes?
For these kinds of requirements, we can set up two sources and two indexes, with one "main" index for the data which only changes rarely (if ever), and one "delta" for the new documents. First Time data will go in the "main" index and the newly inserted address book entries will go into "delta". Delta index could then be reindexed very frequently, and the documents can be made available to search in a matter of minutes.
Also one thing to take from this article is once 'searchd' daemon is running we can't index the data in normal way,we have to use --rotate option in such cases. For some applications where there is a timely batch update for the data, we can configure some cron job to reindex our documents in Sphinx as shown below.
C:\devel\sphinx-0.9.9-win32\bin>indexer.exe --all --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --rotate
Capsule
We asked Sphinx to provide us the Document Ids corresponding to our search parameters and then we used those Ids to fire database query. In case the data we want to return is included in Index (DOJ attribute for example in our case) we can skip the database portion, so choose wisely how much information (attributes) you want to include while you index your sql data.
Well that's all ... it's time to say good bye. Take good care of your health and don't forget to vote, its a must :)
- Munish Gogna
Database
Java (programming language)
Data (computing)
sql
MySQL
application
Document
Address book
Web Service
Opinions expressed by DZone contributors are their own.
Comments