Transparent Indexing by Hibernate Search

by Andrew C. Oliver · Nov. 01, 12 · Interview

Curator's Note: This article was written by Andrew Ball.  He is a developer for OSI (Open Software Integrators). You can also check out their blog here. 

Consider the now infamous Granny’s Addressbook application, the one that my company uses to teach and vet new technologies.  It has a simple UI with only a few fields.  You can find our SpringMVC/RDBMS version here and our NodeJS/MongoDB version with AJAX and Appcelerator example front end here.  The application might have a screen that looks like this:


So simple that even Granny could use it, right?  But what about the code behind it if you’re using an RDBMS?  One “dumb” way to write this is as follows:

public List<Address> searchAddresses(String name, boolean nameExact,
    String address, boolean addressExact,
    String email, boolean emailExact,
    String phone, boolean phoneExact) {

  boolean addedOneCondition = false;
  StringBuilder sb = new StringBuilder();
  sb.append("SELECT a FROM Address a");

  // only add a WHERE clause if at least one field was filled in
  if (name != null || address != null || email != null ||
      phone != null) {
    sb.append(" WHERE ");
  }

  if (name != null) {
    addedOneCondition = true;
    if (nameExact) {
      sb.append(" a.name = :name");
    } else {
      sb.append(" LOWER(a.name) LIKE :name");
    }
  }

  if (address != null) {
    if (addedOneCondition) {
      sb.append(" OR ");
    } else {
      addedOneCondition = true;
    }

    if (addressExact) {
      sb.append(" a.address = :address ");
    } else {
      sb.append(" LOWER(a.address) LIKE :address");
    }
  }

  if (email != null) {
    if (addedOneCondition) {
      sb.append(" OR ");
    } else {
      addedOneCondition = true;
    }

    if (emailExact) {
      sb.append(" a.email = :email");
    } else {
      sb.append(" LOWER(a.email) LIKE :email");
    }
  }

  if (phone != null) {
    if (addedOneCondition) {
      sb.append(" OR ");
    } else {
      addedOneCondition = true;
    }

    if (phoneExact) {
      sb.append(" a.phone = :phone");
    } else {
      sb.append(" LOWER(a.phone) LIKE :phone");
    }
  }

  // createQuery(String, Class) returns a TypedQuery, so the result list is typed
  TypedQuery<Address> q = em.createQuery(sb.toString(), Address.class);

  // bind parameters; inexact patterns are lowercased to match LOWER(column)
  if (name != null) {
    if (nameExact) {
      q.setParameter("name", name);
    } else {
      q.setParameter("name", "%" + name.toLowerCase() + "%");
    }
  }
  if (address != null) {
    if (addressExact) {
      q.setParameter("address", address);
    } else {
      q.setParameter("address", "%" + address.toLowerCase() + "%");
    }
  }
  if (email != null) {
    if (emailExact) {
      q.setParameter("email", email);
    } else {
      q.setParameter("email", "%" + email.toLowerCase() + "%");
    }
  }
  if (phone != null) {
    if (phoneExact) {
      q.setParameter("phone", phone);
    } else {
      q.setParameter("phone", "%" + phone.toLowerCase() + "%");
    }
  }

  return q.getResultList();
}



This performs terribly, even if you add an index for every column and combination of columns on which you could possibly search (an approach that will likely upset any reasonably competent DBA, as it would destroy write performance).
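The performance problem lies in the query that comes out, not in how the string is put together, but the assembly is easier to follow when the conditions are collected in a list and joined. Here is a minimal sketch of that idea, using only the JDK so that it runs without a JPA provider. The Criterion class and the buildJpql and parameterValue helpers are hypothetical names made up for illustration; they are not part of the Granny's Addressbook code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class AddressJpqlSketch {

    /** One search criterion (hypothetical helper type, not from the article). */
    static final class Criterion {
        final String field;   // entity property name, supplied by code, never by the user
        final String value;   // null means "this field was left empty"
        final boolean exact;

        Criterion(String field, String value, boolean exact) {
            this.field = field;
            this.value = value;
            this.exact = exact;
        }
    }

    /** Builds the same kind of JPQL string as searchAddresses, joining conditions with OR. */
    static String buildJpql(List<Criterion> criteria) {
        List<String> conditions = new ArrayList<>();
        for (Criterion c : criteria) {
            if (c.value == null) {
                continue; // field not searched
            }
            conditions.add(c.exact
                    ? "a." + c.field + " = :" + c.field
                    : "LOWER(a." + c.field + ") LIKE :" + c.field);
        }
        String jpql = "SELECT a FROM Address a";
        if (!conditions.isEmpty()) {
            jpql += " WHERE " + String.join(" OR ", conditions);
        }
        return jpql;
    }

    /** Value to bind for a criterion: the raw value, or a lowercased %...% pattern. */
    static String parameterValue(Criterion c) {
        return c.exact ? c.value : "%" + c.value.toLowerCase(Locale.ROOT) + "%";
    }
}
```

The generated query is no faster than before; the sketch only removes the repetition, so the worst case is still four LIKE conditions joined by OR.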

A worst-case SQL query generated from the above code would resemble the following:

SELECT * FROM ADDRESS WHERE
LOWER("name") LIKE '%sue%' OR
LOWER("address") LIKE '%Morgan St.%' OR
LOWER("phone") LIKE '%555.555.5555%' OR
LOWER("email") LIKE '%sue.snodgrass@gmail.com%';



Note that there are no indexes for anything but the “id” column. With PostgreSQL, an “EXPLAIN ANALYZE VERBOSE” on the above query shows that every single row of the table would be scanned to execute this query, checking for matches of each pattern:

Seq Scan on public.address  (cost=0.00..10.60 rows=1 width=2072) (actual time=0.049..0.052 rows=1 loops=1)
  Output: id, address, email, name, phone
  Filter: (((address.name)::text ~~* '%sue%'::text) OR ((address.address)::text ~~* '%Morgan St.%'::text) OR ((address.phone)::text ~~* '%555.555.5555%'::text) OR ((address.email)::text ~~* '%sue.snodgrass@gmail.com%'::text))
 Total runtime: 0.099 ms
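
Why would an ordinary index not rescue this query? A typical B-tree index keeps its keys in sorted order, which helps when the start of the value is known but not when the pattern begins with a wildcard. The sketch below is only an analogy in plain Java, with a TreeSet standing in for a sorted index. It is not how PostgreSQL is implemented, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

final class LikeScanSketch {

    /** Like 'sue%': jump to the first candidate in sorted order and stop at the first miss. */
    static List<String> prefixMatch(NavigableSet<String> index, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String key : index.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break; // sorted order guarantees no later key can match
            }
            hits.add(key);
        }
        return hits;
    }

    /** Like '%sue%': the match can start anywhere, so every key has to be inspected. */
    static List<String> containsMatch(NavigableSet<String> index, String fragment) {
        List<String> hits = new ArrayList<>();
        for (String key : index) {
            if (key.contains(fragment)) {
                hits.add(key);
            }
        }
        return hits;
    }
}
```

With the keys "abe lincoln", "mary sue", "sue snodgrass", and "susan b", the prefix search visits two keys and returns only "sue snodgrass", while the substring search walks all four to find "mary sue" as well.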



But that doesn’t even begin to scratch the surface for issues like variations in phone number formats, nicknames (Did I enter “Sue” or “Susan”?), etc. Why can’t I just let Google search the data for me? Well, with Hibernate Search you can achieve something quite similar, with all open source tools and minimal effort. Hibernate Search is based on the much-acclaimed Apache Lucene project, which is very adept at indexing data for full-text searches, including automatically breaking words apart into root words and their inflections (“stemming”) and allowing for synonym lists.

So, how do we go about getting Hibernate Search to enable full-text search for our example entity? The first step is to add the necessary JBoss repositories to our Maven pom.xml file if they aren’t there already:

<repositories>
  <!-- ... -->
  <repository>
    <id>jboss-public-repository-group</id>
    <name>JBoss Public Maven Repository Group</name>
    <url>https://repository.jboss.org/nexus/content/groups/public-jboss/</url>
    <layout>default</layout>
    <releases>
      <enabled>true</enabled>
      <updatePolicy>never</updatePolicy>
    </releases>
    <snapshots>
      <enabled>true</enabled>
      <updatePolicy>never</updatePolicy>
    </snapshots>
  </repository>
</repositories>
<pluginRepositories>
  <!-- ... -->
  <pluginRepository>
    <id>jboss-public-repository-group</id>
    <name>JBoss Public Maven Repository Group</name>
    <url>https://repository.jboss.org/nexus/content/groups/public-jboss/</url>
    <layout>default</layout>
    <releases>
      <enabled>true</enabled>
      <updatePolicy>never</updatePolicy>
    </releases>
    <snapshots>
      <enabled>true</enabled>
      <updatePolicy>never</updatePolicy>
    </snapshots>
  </pluginRepository>
</pluginRepositories>



Then we can take our JPA-annotated entity and add a few Hibernate Search annotations (@Indexed, @AnalyzerDef, and @Field):

@Entity
@NamedQueries(
  {@NamedQuery(name="Address.findAll",
      query="select a from Address a"),
   @NamedQuery(name="Address.findByName",
      query="select a from Address a where a.name = ?1")})
@Indexed
@AnalyzerDef(name = "customanalyzer",
  tokenizer = @TokenizerDef(factory =
      StandardTokenizerFactory.class),
  filters = {
    @TokenFilterDef(factory = LowerCaseFilterFactory.class),
    @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
      @Parameter(name = "language", value = "English")
    })
  })
// apply the definition above to this entity's fields; a definition that is
// never referenced by @Analyzer is not used
@Analyzer(definition = "customanalyzer")
public class Address {
  @Id
  @GeneratedValue(strategy=GenerationType.AUTO)
  private Long id;

  // tokenized and indexed, but the original value is not stored in the index
  @Field(index=Index.TOKENIZED, store=Store.NO)
  private String name;
  @Field(index=Index.TOKENIZED, store=Store.NO)
  private String email;
  @Field(index=Index.TOKENIZED, store=Store.NO)
  private String phone;
  @Field(index=Index.TOKENIZED, store=Store.NO)
  private String address;

  /* . . . */
}



Most of these annotations are fairly straightforward. @Indexed indicates that we want Hibernate Search to manage indexes for this entity. @Field indicates that a particular property is to be indexed. We can specify whether an indexed field is tokenized (that is, split into parts, usually words) or treated as a single token. Because our fields are tokenized, a search for “Abe” will match “Abe Lincoln” without having to allow for extra characters with a search pattern such as “Abe*”.
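
To see why tokenizing matters, here is a toy inverted index in plain Java. It is a sketch of the general idea only, assuming whitespace splitting and exact, case-sensitive term lookup; Lucene's real index structures are far more elaborate, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TokenIndexSketch {
    // term -> ids of the records that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Tokenized field: every whitespace-separated word becomes its own term. */
    void addTokenized(int id, String text) {
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                post(token, id);
            }
        }
    }

    /** Untokenized field: the whole value is indexed as one term. */
    void addAsSingleToken(int id, String text) {
        post(text, id);
    }

    /** Looks up one term; returns the ids of matching records. */
    Set<Integer> search(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet());
    }

    private void post(String term, int id) {
        postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
    }
}
```

After addTokenized(1, "Abe Lincoln"), a search for "Abe" finds record 1. After addAsSingleToken(1, "Abe Lincoln"), the same search finds nothing, because the only term in the index is the full string.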

The more interesting annotations have to do with some of the more interesting functionality that Lucene (and, by extension, Hibernate Search) provides. The @AnalyzerDef gives some extra directions on the kind of processing that we want to happen to the content before indexing takes place. For example, the LowerCaseFilterFactory.class token filter will convert all text to lowercase before indexing occurs. The Snowball-Porter filter factory does stemming of tokens before they are indexed -- that is, root words (“stems”) are extracted, so “hiking”, “hiked”, and “hikes” would all get indexed as “hike”.
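
The analysis chain can be pictured as a small pipeline: split the text, lowercase each token, then reduce it to a stem. The sketch below imitates that flow in plain Java. Note the assumptions: the suffix stripper is a deliberately crude toy, not the Snowball algorithm, so its stems differ from the ones Lucene would produce (“hik” rather than “hike”), and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class AnalyzerChainSketch {
    // Toy suffix list, checked in order; NOT the Snowball/Porter rule set.
    private static final String[] SUFFIXES = {"ing", "ed", "es", "e", "s"};

    /** Strips the first matching suffix, provided at least three characters remain. */
    static String toyStem(String token) {
        for (String suffix : SUFFIXES) {
            if (token.endsWith(suffix) && token.length() - suffix.length() >= 3) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    /** Lowercase, split on anything that is not a letter or digit, then stem each token. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                terms.add(toyStem(token));
            }
        }
        return terms;
    }
}
```

What matters is that the same chain runs on the indexed text and on the search string, so different inflections end up as the same term. Here “Hiking”, “HIKES”, and “hiked” all come out as one stem.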

After adding a few properties to the JPA META-INF/persistence.xml to tell Hibernate Search where to store the Lucene indexes, we can write a totally different search method as follows:
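
For reference, the two settings involved are the directory provider and the index base directory. The sketch below collects them in a map, which is one way to hand them to JPA (the two-argument Persistence.createEntityManagerFactory accepts such a map); the same names and values can go into persistence.xml as property elements instead. The helper class and the directory passed in are assumptions for illustration, not taken from the project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SearchIndexSettingsSketch {

    /** Settings that tell Hibernate Search to keep its Lucene indexes on disk. */
    static Map<String, String> indexSettings(String indexBaseDir) {
        Map<String, String> props = new LinkedHashMap<>();
        // store the index in the file system rather than in memory
        props.put("hibernate.search.default.directory_provider", "filesystem");
        // directory under which the index files are created (caller's choice)
        props.put("hibernate.search.default.indexBase", indexBaseDir);
        return props;
    }
}
```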

public List<Address> fullTextSearch(String stringToMatch) {
  FullTextEntityManager ftem = org.hibernate.search.jpa.Search.getFullTextEntityManager(em);

  // build up a Lucene query
  QueryBuilder qb = ftem.getSearchFactory().buildQueryBuilder()
      .forEntity(Address.class).get();
  org.apache.lucene.search.Query luceneQuery = qb
      .keyword()
      .onFields("name", "address", "phone", "email")
      .matching(stringToMatch.toLowerCase())
      .createQuery();

  // wrap the Lucene query in a JPA query
  Query jpaWrappedQuery = ftem.createFullTextQuery(luceneQuery,
      Address.class);

  return jpaWrappedQuery.getResultList();
}



All of this indexing is done transparently by Hibernate Search as entities are persisted, updated, and removed. The performance is light-years ahead of the linear table scans done by a relational database, because a search becomes a lookup in the Lucene index rather than a pattern check against every row. Not to mention, the complete feature set of Apache Lucene is available. What’s not to like? The Hibernate Search project has very good documentation (indeed, much of this implementation comes from that documentation). A simple implementation is on Andrew Ball’s copy of the SpringGrannyMVC project on GitHub at https://github.com/cortextual/OSIL (please use the “search” branch). Happy searching!



Published at DZone with permission of Andrew C. Oliver. See the original article here.
