
JPA Searching Using Lucene - A Working Example with Spring and DBUnit


Working Example on Github


There's a small, self-contained, mavenised example project over on GitHub to accompany this post - check it out here: https://github.com/adrianmilne/jpa-lucene-spring-demo


Running the Demo


See the README file over on GitHub for details of running the demo. Essentially - it's just running the Unit Tests, with the usual Maven build and test results output to the console. The DBUnit test inserts Book data into the HSQL database, and then uses Lucene to query the data, testing that the expected Books are returned (i.e. only those in the SCI-FI category containing the word 'Space', with any that have 'Space' in the title appearing before those with 'Space' only in the description).



The Book Entity


Our simple example stores Books. The Book entity class below is a standard JPA Entity with a few additional annotations to identify it to Lucene:

@Indexed - this identifies that the class will be added to the Lucene index. You can define a specific index by adding the 'index' attribute to the annotation. We're just choosing the simplest, minimal configuration for this example. 

In addition to this - you also need to specify which properties on the entity are to be indexed, and how they are to be indexed. For our example we are again going for the default option by just adding an @Field annotation with no extra parameters. We are adding one other annotation to the 'title' field - @Boost - which tells Lucene to give more weight to search term matches that appear in this field than to the same term appearing in the description field.

This example is purposefully kept minimal in terms of the ins-and-outs of Lucene (I may cover that in a later post) - we're really just concentrating on the integration with JPA and Spring for now.

package com.cor.demo.jpa.entity;

import javax.persistence.Entity;
import javax.persistence.EnumType;
import javax.persistence.Enumerated;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Lob;

import org.hibernate.search.annotations.Boost;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;

/**
* Book JPA Entity.
*/
@Entity
@Indexed
public class Book {

    @Id
    @GeneratedValue
    private Long id;

    @Field
    @Boost(value = 1.5f)
    private String title;

    @Field
    @Lob
    private String description;

    @Field
    @Enumerated(EnumType.STRING)
    private BookCategory category;

    public Book(){

    }

    public Book(String title, BookCategory category, String description){
        this.title = title;
        this.category = category;
        this.description = description;
    }

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public BookCategory getCategory() {
        return category;
    }

    public void setCategory(BookCategory category) {
        this.category = category;
    }

    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }

    @Override
    public String toString() {
        return "Book [id=" + id + ", title=" + title + ", description=" + description + ", category=" + category + "]";
    }

}
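
To get a feel for what the boost on 'title' does to result ordering, here is a deliberately simplified sketch. It is not Lucene's real scoring formula (which also weighs term frequency and other factors); it only assumes that a title hit counts 1.5 times as much as a description hit, mirroring the @Boost value above. The class and method names are made up for illustration.

```java
import java.util.Locale;

/** Toy illustration of field boosting; NOT Lucene's real scoring formula. */
class BoostSketch {

    // Mirrors @Boost(value = 1.5f) on Book.title in the article.
    static final float TITLE_BOOST = 1.5f;
    // Assumption for this sketch: an un-boosted field counts with weight 1.0.
    static final float DESCRIPTION_BOOST = 1.0f;

    /** True if any word in the text equals the term, ignoring case. */
    static boolean containsTerm(String text, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.equals(needle)) {
                return true;
            }
        }
        return false;
    }

    /** Adds the weight of every field that contains the term. */
    static float score(String title, String description, String term) {
        float score = 0f;
        if (containsTerm(title, term)) {
            score += TITLE_BOOST;
        }
        if (containsTerm(description, term)) {
            score += DESCRIPTION_BOOST;
        }
        return score;
    }

    public static void main(String[] args) {
        float titleHit = score("2001: A Space Oddysey", "A 1968 science fiction film", "Space");
        float descriptionHit = score("Apollo 13", "Part of the American Apollo space program", "Space");
        System.out.println("title hit: " + titleHit + ", description-only hit: " + descriptionHit);
    }
}
```

With these weights a book matching only in the title outranks one matching only in the description, which is the ordering the demo's test relies on.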

The Book Manager


The BookManager class acts as a simple service layer for the Book operations - used for adding books and searching books. As you can see, the JPA database resources are injected by Spring from the application-context.xml. We are just using an in-memory HSQL database in this example.

package com.cor.demo.jpa.manager;

import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.persistence.PersistenceContextType;
import javax.persistence.Query;

import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;
import org.hibernate.search.query.dsl.QueryBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

import com.cor.demo.jpa.entity.Book;
import com.cor.demo.jpa.entity.BookCategory;

/**
 * Manager for persisting and searching on Books. Uses JPA and Lucene.
 */
@Component
@Scope(value = "singleton")
public class BookManager {

    /** Logger. */
    private static Logger LOG = LoggerFactory.getLogger(BookManager.class);

    /** JPA Persistence Unit. */
    @PersistenceContext(type = PersistenceContextType.EXTENDED, name = "booksPU")
    private EntityManager em;

    /** Hibernate Full Text Entity Manager. */
    private FullTextEntityManager ftem;

    /**
     * Method to manually update the Full Text Index. This is not required if inserting entities
     * using this Manager as they will automatically be indexed. Useful though if you need to index
     * data inserted using a different method (e.g. pre-existing data, or test data inserted via
     * scripts or DbUnit).
     */
    public void updateFullTextIndex() throws Exception {
        LOG.info("Updating Index");
        getFullTextEntityManager().createIndexer().startAndWait();
    }

    /**
     * Add a Book to the Database.
     */
    @Transactional
    public Book addBook(Book book) {
        LOG.info("Adding Book : " + book);
        em.persist(book);
        return book;
    }

    /**
     * Delete All Books.
     */
    @SuppressWarnings("unchecked")
    @Transactional
    public void deleteAllBooks() {

        LOG.info("Delete All Books");

        Query allBooks = em.createQuery("select b from Book b");
        List<Book> books = allBooks.getResultList();

        // We need to delete individually (rather than a bulk delete) to ensure they are removed
        // from the Lucene index correctly
        for (Book b : books) {
            em.remove(b);
        }

    }

    @SuppressWarnings("unchecked")
    @Transactional
    public void listAllBooks() {

        LOG.info("List All Books");
        LOG.info("------------------------------------------");

        Query allBooks = em.createQuery("select b from Book b");
        List<Book> books = allBooks.getResultList();

        for (Book b : books) {
            LOG.info(b.toString());
            getFullTextEntityManager().index(b);
        }

    }

    /**
     * Search for a Book.
     */
    @SuppressWarnings("unchecked")
    @Transactional
    public List<Book> search(BookCategory category, String searchString) {

        LOG.info("------------------------------------------");
        LOG.info("Searching Books in category '" + category + "' for phrase '" + searchString + "'");

        // Create a Query Builder
        QueryBuilder qb = getFullTextEntityManager().getSearchFactory().buildQueryBuilder().forEntity(Book.class).get();

        // Create a Lucene Full Text Query
        org.apache.lucene.search.Query luceneQuery = qb.bool()
                .must(qb.keyword().onFields("title", "description").matching(searchString).createQuery())
                .must(qb.keyword().onField("category").matching(category).createQuery()).createQuery();

        Query fullTextQuery = getFullTextEntityManager().createFullTextQuery(luceneQuery, Book.class);

        // Run Query and print out results to console
        List<Book> result = (List<Book>) fullTextQuery.getResultList();

        // Log the Results
        LOG.info("Found Matching Books :" + result.size());
        for (Book b : result) {
            LOG.info(" - " + b);
        }

        return result;
    }

    /**
     * Convenience method to get Full Text Entity Manager. Protected scope to assist mocking in Unit
     * Tests.
     * @return Full Text Entity Manager.
     */
    protected FullTextEntityManager getFullTextEntityManager() {
        if (ftem == null) {
            ftem = Search.getFullTextEntityManager(em);
        }
        return ftem;
    }

    /**
     * Get the JPA Entity Manager (required for the DBUnit Tests).
     * @return Entity manager
     */
    protected EntityManager getEntityManager() {
        return em;
    }

    /**
     * Sets the JPA Entity Manager (required to assist with mocking in Unit Test)
     * @param em EntityManager
     */
    protected void setEntityManager(EntityManager em) {
        this.em = em;
    }

}
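
The two must() clauses in the search method combine as a logical AND: a book is returned only if the search term matches the title or description, and the category matches too. The following is a rough in-memory sketch of that behaviour, with no Lucene involved. SimpleBook and the helper names are hypothetical stand-ins, and result ranking is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** In-memory stand-in for the two must() clauses; illustrative only. */
class MustClauseSketch {

    /** Minimal stand-in for the Book entity (hypothetical, not the article's class). */
    static class SimpleBook {
        final String title;
        final String category;
        final String description;

        SimpleBook(String title, String category, String description) {
            this.title = title;
            this.category = category;
            this.description = description;
        }
    }

    /** True if any word in the text equals the term, ignoring case. */
    static boolean hasWord(String text, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.equals(needle)) {
                return true;
            }
        }
        return false;
    }

    /** Keeps a book only if BOTH conditions hold, mirroring must(...).must(...). */
    static List<String> search(List<SimpleBook> books, String category, String term) {
        List<String> titles = new ArrayList<>();
        for (SimpleBook b : books) {
            boolean textMatch = hasWord(b.title, term) || hasWord(b.description, term);
            boolean categoryMatch = b.category.equals(category);
            if (textMatch && categoryMatch) {
                titles.add(b.title);
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        List<SimpleBook> books = Arrays.asList(
                new SimpleBook("The War of the Worlds", "FANTASY", "War in space"),
                new SimpleBook("Apollo 13", "SCIFI", "Part of the American Apollo space program"),
                new SimpleBook("Dune", "SCIFI", "A 1984 science fiction film"));
        System.out.println(search(books, "SCIFI", "Space"));
    }
}
```

'The War of the Worlds' mentions space but is filed under FANTASY, and 'Dune' is SCIFI but never mentions space, so neither satisfies both clauses.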

application-context.xml


This is the Spring configuration file. In the JPA Entity Manager configuration, the key 'hibernate.search.default.indexBase' is added to the jpaPropertyMap to tell Lucene where to create the index. We have also externalised the database login credentials to a properties file, as you may wish to change these for different environments (for example, by updating the propertyConfigurer to look for and use a different external properties file if it finds one on the file system).

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns:p="http://www.springframework.org/schema/p"
xmlns:camel="http://camel.apache.org/schema/spring" xmlns:tx="http://www.springframework.org/schema/tx"
xmlns:context="http://www.springframework.org/schema/context"
xsi:schemaLocation="
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd">

<!-- Spring Component Package Scan -->
<context:component-scan base-package="com.cor.demo.jpa" />

<!-- Property configuration -->
<bean id="propertyConfigurer"
class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer"
p:ignoreUnresolvablePlaceholders="true" p:ignoreResourceNotFound="true">
<property name="locations">
<list>
<value>classpath:/system.properties</value>
</list>
</property>
</bean>

<!-- JPA Entity Manager Factory -->
<bean id="entityManagerFactory"
class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
<property name="dataSource" ref="dataSource" />
<!-- <property name="packagesToScan" value="com.cor.demo.jpa.entity" /> -->
<property name="jpaVendorAdapter">
<bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter">
<property name="showSql" value="true" />
<property name="generateDdl" value="true" />
</bean>
</property>
<property name="jpaPropertyMap">
<map>
<entry key="hibernate.hbm2ddl.auto" value="update" />
<entry key="hibernate.format_sql" value="true" />
<entry key="hibernate.use_sql_comments" value="false" />
<entry key="hibernate.show_sql" value="false" />
<entry key="hibernate.search.default.indexBase" value="/var/lucene/indexes" />
</map>
</property>
</bean>

<!-- JPA Data Source -->
<bean id="dataSource"
class="org.springframework.jdbc.datasource.DriverManagerDataSource">
<property name="driverClassName" value="${database.driver}" />
<property name="url" value="${database.url}" />
<property name="username" value="${database.username}" />
<property name="password" value="${database.password}" />
</bean>

<!-- Transaction Manager -->
<bean id="txManager" class="org.springframework.orm.jpa.JpaTransactionManager">
<property name="entityManagerFactory" ref="entityManagerFactory" />
</bean>
<tx:annotation-driven transaction-manager="txManager" />

</beans>
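
The hard-coded index location '/var/lucene/indexes' may not be writable on every machine. One possible approach, sketched below, is to build the JPA property map in code and fall back to the temp directory unless an override is supplied. This is not how the demo project does it: the system property name 'demo.lucene.indexBase' and the class are assumptions made up for this example, and the resulting map would still need to be handed to the entity manager factory (e.g. via its jpaPropertyMap).

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

/** Sketch of resolving the Lucene index location at runtime (hypothetical helper). */
class IndexBaseSketch {

    // Hypothetical system property name, used only for this sketch.
    static final String OVERRIDE_PROPERTY = "demo.lucene.indexBase";

    /** Uses the override if present, otherwise a folder under the JVM temp directory. */
    static String resolveIndexBase() {
        String override = System.getProperty(OVERRIDE_PROPERTY);
        if (override != null && !override.isEmpty()) {
            return override;
        }
        return new File(System.getProperty("java.io.tmpdir"), "lucene-indexes").getPath();
    }

    /** Builds a property map with the same indexBase key as the XML above. */
    static Map<String, Object> jpaProperties() {
        Map<String, Object> props = new HashMap<>();
        props.put("hibernate.hbm2ddl.auto", "update");
        props.put("hibernate.search.default.indexBase", resolveIndexBase());
        return props;
    }

    public static void main(String[] args) {
        System.out.println(jpaProperties());
    }
}
```

Running with -Ddemo.lucene.indexBase=/some/path would then redirect the index without touching the configuration file.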

Testing Using DBUnit


The project includes an example of using DBUnit with Spring to test adding to and searching against the database. DBUnit populates the database with test data, the test exercises the Book Manager search operations, and the database is then cleaned down. This is a great way to test database functionality and can be easily integrated into Maven and continuous build environments.

Because DBUnit bypasses the standard JPA insertion calls, the data does not get automatically added to the Lucene index. The BookManager exposes a method, 'updateFullTextIndex()', to update the full text index - calling this causes Lucene to update the index with the current data in the database. This can also be useful when you are adding search to a pre-populated database and need to index the existing content.
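
The reason a manual re-index is needed can be shown with a toy model of a database and a separate search index. This is not Hibernate Search itself, just an assumed simplification: rows saved through the manager are indexed as a side effect, rows inserted directly are not, and a rebuild brings the index back in line. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a database plus a separate search index; not Hibernate Search. */
class StaleIndexSketch {

    private final List<String> databaseRows = new ArrayList<>();
    private final Set<String> searchIndex = new HashSet<>();

    /** Like persisting through JPA: the row is stored and indexed in one step. */
    void persistViaManager(String title) {
        databaseRows.add(title);
        searchIndex.add(title);
    }

    /** Like a DBUnit or SQL script insert: the row is stored but the index is not told. */
    void insertDirectly(String title) {
        databaseRows.add(title);
    }

    /** Like updateFullTextIndex(): rebuild the index from whatever is in the database. */
    void rebuildIndex() {
        searchIndex.clear();
        searchIndex.addAll(databaseRows);
    }

    boolean isSearchable(String title) {
        return searchIndex.contains(title);
    }

    public static void main(String[] args) {
        StaleIndexSketch store = new StaleIndexSketch();
        store.insertDirectly("Dune");
        System.out.println("before rebuild: " + store.isSearchable("Dune"));
        store.rebuildIndex();
        System.out.println("after rebuild: " + store.isSearchable("Dune"));
    }
}
```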

package com.cor.demo.jpa.manager;

import java.io.InputStream;
import java.util.List;

import org.dbunit.DBTestCase;
import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.dbunit.operation.DatabaseOperation;
import org.hibernate.impl.SessionImpl;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

import com.cor.demo.jpa.entity.Book;
import com.cor.demo.jpa.entity.BookCategory;

/**
 * DBUnit Test - loads data defined in 'test-data-set.xml' into the database to run tests against the
 * BookManager. More thorough (and ultimately easier in this context) than using mocks.
 */
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = { "classpath:/application-context.xml" })
public class BookManagerDBUnitTest extends DBTestCase {

    /** Logger. */
    private static Logger LOG = LoggerFactory.getLogger(BookManagerDBUnitTest.class);

    /** Book Manager Under Test. */
    @Autowired
    private BookManager bookManager;

    @Before
    public void setup() throws Exception {
        DatabaseOperation.CLEAN_INSERT.execute(getDatabaseConnection(), getDataSet());
    }

    @After
    public void tearDown() {
        deleteBooks();
    }

    @Override
    protected IDataSet getDataSet() throws Exception {
        InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("test-data-set.xml");
        FlatXmlDataSetBuilder builder = new FlatXmlDataSetBuilder();
        return builder.build(inputStream);
    }

    /**
     * Get the underlying database connection from the JPA Entity Manager (DBUnit needs this connection).
     * @return Database Connection
     * @throws Exception
     */
    private IDatabaseConnection getDatabaseConnection() throws Exception {
        return new DatabaseConnection(((SessionImpl) (bookManager.getEntityManager().getDelegate())).connection());
    }

    /**
     * Tests the expected results for searching for 'Space' in SCI-FI books.
     */
    @Test
    public void testSciFiBookSearch() throws Exception {

        bookManager.listAllBooks();
        bookManager.updateFullTextIndex();
        List<Book> results = bookManager.search(BookCategory.SCIFI, "Space");

        assertEquals("Expected 2 results for SCI FI search for 'Space'", 2, results.size());
        assertEquals("Expected 1st result to be '2001: A Space Oddysey'", "2001: A Space Oddysey", results.get(0).getTitle());
        assertEquals("Expected 2nd result to be 'Apollo 13'", "Apollo 13", results.get(1).getTitle());
    }

    private void deleteBooks() {
        LOG.info("Deleting Books...-");
        bookManager.deleteAllBooks();
    }

}

The source data for the test is defined in an XML file (test-data-set.xml).

<?xml version='1.0' encoding='UTF-8'?>
<!-- Test Dataset - mix of FANTASY and SCI-FI to support the BookManagerDBUnitTest -->
<dataset>
  <book id="1" title="The Lord of the Rings"
description="the Lord of the Rings is an epic high fantasy novel written by English philologist and University of Oxford professor J. R. R. Tolkien"
category="FANTASY" />
<book id="2" title="The War of the Worlds" description="War in space"
category="FANTASY" />
<book id="3" title="Apollo 13"
description="Apollo 13 was the seventh manned mission in the American Apollo space program and the third intended to land on the Moon"
category="SCIFI" />
<book id="4" title="2001: A Space Oddysey"
description="2001: A Space Odyssey is a 1968 British-American science fiction film produced and directed by Stanley Kubrick"
category="SCIFI" />
<book id="5" title="Dune"
description="Dune is a 1984 science fiction film written and directed by David Lynch, based on the 1965 Frank Herbert novel of the same name."
category="SCIFI" />
</dataset>





Published at DZone with permission of Adrian Milne, DZone MVB. See the original article here.
