How to Keep Elasticsearch in Sync with Relational Databases?

Hibernate Search is a library that keeps your local Lucene indexes or Elasticsearch cluster in sync with your database.

By Hüseyin Akdoğan · Dec. 08, 20 · Tutorial

This article was published in the Java Advent Calendar on December 6, 2020.

Many businesses want to take advantage of Elasticsearch’s powerful search capabilities by using it alongside their existing relational databases. In this context, it is not unusual to use Elasticsearch as a caching layer. This raises a basic but important need: keeping Elasticsearch synchronized with the database.

Roughly, the steps below are followed for synchronization:

  • A column holding the insertion or last-update time is added to the table that will be kept in sync with Elasticsearch
  • A boolean column marking a record as deleted is added to the same table
  • A scheduler periodically runs a query that uses both columns to fetch only the records that have been inserted, modified, or deleted since its last execution
  • If there are such records, the business logic is invoked to apply the corresponding operations to Elasticsearch, and to the database as well when records have been marked as deleted
  • The scheduler’s run time is stored for use in the next execution
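The steps above can be sketched in plain Java. This is a minimal in-memory illustration, not a production implementation: the `PollingSyncSketch` class, its `Row` type, and the two maps standing in for the database table and the Elasticsearch index are hypothetical names introduced here. A real scheduler would call `syncOnce` periodically and run an actual SQL query.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PollingSyncSketch {

    /** One table row with the two extra columns the pattern requires. */
    static final class Row {
        final int id;
        final String title;
        final Instant modifiedAt; // insertion or last-update time
        final boolean deleted;    // soft-delete marker

        Row(int id, String title, Instant modifiedAt, boolean deleted) {
            this.id = id;
            this.title = title;
            this.modifiedAt = modifiedAt;
            this.deleted = deleted;
        }
    }

    final Map<Integer, Row> table = new HashMap<>();    // stand-in for the database table
    final Map<Integer, String> index = new HashMap<>(); // stand-in for the Elasticsearch index
    Instant lastRun = Instant.EPOCH;                    // stored between scheduler executions

    /** One scheduler execution; returns how many rows were processed. */
    int syncOnce(Instant now) {
        // Stand-in for a query such as: SELECT * FROM t WHERE modified_at > :lastRun
        List<Row> changed = new ArrayList<>();
        for (Row row : table.values()) {
            if (row.modifiedAt.isAfter(lastRun)) {
                changed.add(row);
            }
        }
        for (Row row : changed) {
            if (row.deleted) {
                index.remove(row.id); // drop the document from the index
                table.remove(row.id); // then physically delete the marked row
            } else {
                index.put(row.id, row.title); // insert or update the document
            }
        }
        lastRun = now; // remember the run time for the next execution
        return changed.size();
    }

    public static void main(String[] args) {
        PollingSyncSketch sync = new PollingSyncSketch();
        Instant t0 = Instant.parse("2020-12-06T10:00:00Z");
        sync.table.put(1, new Row(1, "Java Day", t0.plusSeconds(10), false));
        sync.table.put(2, new Row(2, "Old meetup", t0.plusSeconds(20), true));
        System.out.println(sync.syncOnce(t0.plusSeconds(60))); // prints 2
        System.out.println(sync.index);                        // prints {1=Java Day}
    }
}
```

Any change made between two calls to `syncOnce` stays invisible in the index until the next call, which is where the stale-data window of this pattern comes from.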

This pattern rests on some assumptions and has disadvantages. First, it relies on an estimate of how often the database is updated and schedules the job accordingly. If the database is updated more frequently than assumed, users are likely to see stale data. If it is updated less frequently, the scheduler wastes resources on queries that find nothing, which works against one of the main purposes of a cache layer: reducing I/O operations on the database.

A further overhead is that any query that reads directly from the database rather than from the cache must be written to exclude records marked as “deleted”.

Hibernate Search

Hibernate Search is a library that keeps your local Apache Lucene indexes or Elasticsearch cluster in sync with the data it extracts from Hibernate ORM, based on your domain model. You can add this capability to your application with a few settings and annotations.

Base Components

Hibernate Search is based on two key components. Since these key components are directly related to the efficient use of the library, let’s take a closer look at them now.

Mapper

The mapper component maps your entities to an index and provides APIs to perform indexing and searching. The mapper is configured both through annotations on the entities and through key-value configuration properties.

Backend

The backend is the abstraction over the full-text engines. It implements generic indexing and searching interfaces for use by the mapper and delegates to the engine you chose for your application, for instance the Lucene library or a remote Elasticsearch cluster. The backend is configured partly by the mapper, which tells it which indexes must exist and what fields they must have, and partly through configuration properties.

The mapper and backend work together to provide the following main features:

  • Mass indexing to import data from a database
  • Automatic indexing to keep indexes in sync with a database
  • Searching to query an index

Dependencies

In order to use Hibernate Search, you will need at least two direct dependencies. One of these dependencies is related to the mapper component.

XML

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-mapper-orm</artifactId>
   <version>6.0.0.CR2</version>
</dependency>

The other is related to the backend component and depends on whether your application runs on a single node or on multiple nodes. For Lucene:

XML

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-backend-lucene</artifactId>
   <version>6.0.0.CR2</version>
</dependency>

The Lucene backend indexes the entities on a single node and stores the indexes on the local filesystem. The indexes are accessed through direct calls to the Lucene library, without going through the network. Hence, the Lucene backend is only relevant to single-node applications, and it is a reasonable choice when your application runs on a single node.

For Elasticsearch:

XML

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-backend-elasticsearch</artifactId>
   <version>6.0.0.CR2</version>
</dependency>

The Elasticsearch backend indexes the entities on multiple nodes and stores the indexes on a remote Elasticsearch cluster. These indexes are not tied to the application and are therefore accessed through calls to REST APIs.

Note that you can use both Lucene and Elasticsearch backends at the same time.

Configuration

Hibernate Search sources its configuration properties from Hibernate ORM, so they can be added to any file from which Hibernate ORM takes its configuration.

These files can be:

  • hibernate.properties
  • hibernate.cfg.xml
  • persistence.xml

In addition to these files, when you use a Java runtime such as Quarkus or Spring, its application properties file can also be used for configuration.

Hibernate Search provides sensible defaults for all configuration properties, but there are a few basic configuration parameters that you cannot avoid setting explicitly in some cases.

  • hibernate.search.backend.directory.root This setting defines where indexes are stored in the file system. It applies when you use the Lucene backend. By default, indexes are stored in the current working directory

  • hibernate.search.backend.hosts This setting defines the Elasticsearch host URL, so it applies when you use the Elasticsearch backend. By default, the backend will attempt to connect to localhost:9200
  • hibernate.search.backend.protocol This setting defines the protocol. You set it explicitly when you need https, because its default value is http
  • hibernate.search.backend.username and hibernate.search.backend.password These settings define the username and password for basic HTTP authentication
  • hibernate.search.backend.analysis.configurer This setting defines a bean reference pointing to the analysis configurer implementation. You use it when you need custom analysis
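As a rough illustration of how the Elasticsearch settings above fit together, the sketch below collects them in a map. The `SearchSettingsSketch` class, its method, and the host name and credentials are placeholders made up for this example. The printed lines have the key=value form used in hibernate.properties, and a map like this could also be handed to JPA’s `Persistence.createEntityManagerFactory(String, Map)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SearchSettingsSketch {

    /** Collects the Elasticsearch backend settings, leaving defaults untouched. */
    static Map<String, String> elasticsearchSettings(String hosts, boolean https,
                                                     String username, String password) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("hibernate.search.backend.hosts", hosts);
        if (https) {
            // Only needed for https, because http is the default.
            settings.put("hibernate.search.backend.protocol", "https");
        }
        if (username != null) {
            // Basic HTTP authentication.
            settings.put("hibernate.search.backend.username", username);
            settings.put("hibernate.search.backend.password", password);
        }
        return settings;
    }

    public static void main(String[] args) {
        // Host name and credentials are placeholders, not real values.
        Map<String, String> settings =
                elasticsearchSettings("es.example.com:9243", true, "search_user", "changeit");
        settings.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```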

Coding Time

Let’s assume that JUG Istanbul uses a meetup app for the meetings it organizes and that the data is stored in a relational database. Its domain model contains an event entity and a host entity.

Adding a few settings to the application and a few annotations to the entities is sufficient to take advantage of Elasticsearch’s powerful search capabilities via Hibernate Search. The entities look as follows.

Note: As the reader is assumed to be familiar with the basic concepts of Elasticsearch, these concepts will not be explained in detail.

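Java

import java.util.List;

import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToMany;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.GenericField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.IndexedEmbedded;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.KeywordField;

@Entity
@Indexed //(1)
public class Host
{
    @Id
    @GeneratedValue
    @GenericField //(2)
    private int id;

    @KeywordField //(3)
    private String firstname;

    @KeywordField
    private String lastname;

    @FullTextField(analyzer = "english") //(4)
    private String title;

    @OneToMany(mappedBy = "host", cascade = CascadeType.ALL, orphanRemoval = true, fetch = FetchType.LAZY)
    @IndexedEmbedded //(5)
    private List<Event> events;

    // Getters and setters omitted.
}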

1) The @Indexed annotation registers the Host entity for indexing by the full-text search engine, i.e., Elasticsearch.

2) The @GenericField annotation maps the id field to an index field.

3) The @KeywordField annotation maps the firstname and lastname fields to non-analyzed index fields, which means that their values are not tokenized.

4) The @FullTextField annotation maps the title field to an index field intended specifically for full-text search. In addition, it selects an analyzer named “english”, which tokenizes and filters the string so that matches are made on words (“tokens”) instead of the full string, and so that a search for “consultation” also returns documents containing “consultant”.

5) The @IndexedEmbedded annotation includes the associated Event entities in the Host index. The main benefit of this annotation is that Host is automatically re-indexed when one of its events is updated, thanks to the bidirectional relation.
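To make the difference between notes 3 and 4 more concrete, here is a deliberately simplified toy analysis chain in plain Java. It is not how Lucene or Elasticsearch implement the “english” analyzer: the `ToyAnalyzerSketch` class and its hand-picked suffix list are assumptions made for illustration only. It merely shows the idea of tokenizing, lowercasing, and reducing words to a common stem before matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ToyAnalyzerSketch {

    // A tiny, hand-picked suffix list; real stemmers use far more elaborate rules.
    private static final String[] SUFFIXES = {"ation", "ant", "ing", "s"};

    /** Tokenizes on non-letters, lowercases, and strips at most one known suffix per token. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields an empty first element when text starts with a separator
            }
            tokens.add(stem(raw.toLowerCase(Locale.ROOT)));
        }
        return tokens;
    }

    private static String stem(String token) {
        for (String suffix : SUFFIXES) {
            // Keep at least three characters so that short words stay intact.
            if (token.endsWith(suffix) && token.length() - suffix.length() >= 3) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    /** Full-text style match: true if any analyzed query token equals a document token. */
    static boolean fullTextMatch(String document, String query) {
        List<String> documentTokens = analyze(document);
        for (String queryToken : analyze(query)) {
            if (documentTokens.contains(queryToken)) {
                return true;
            }
        }
        return false;
    }

    /** Keyword style match: the whole string must be equal, with no analysis at all. */
    static boolean keywordMatch(String document, String query) {
        return document.equals(query);
    }

    public static void main(String[] args) {
        String title = "Senior Java Consultant";
        System.out.println(analyze(title));                       // prints [senior, java, consult]
        System.out.println(fullTextMatch(title, "consultation")); // prints true
        System.out.println(keywordMatch(title, "consultation"));  // prints false
    }
}
```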

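Java

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;

import com.fasterxml.jackson.annotation.JsonIgnore;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.GenericField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;

@Entity
@Indexed
public class Event
{
    @Id
    @GeneratedValue
    @GenericField
    private int id;

    @FullTextField(analyzer = "english")
    private String name;

    @ManyToOne
    @JsonIgnore
    private Host host;
}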

These are our entities, so let’s look at how to perform search and CRUD operations on these entities.

Java

    // Excerpt from a Spring REST controller. searchSession, entityManager, and logger
    // are declared elsewhere in the enclosing class.

    // Queries the index and returns at most 20 hosts.
    @GetMapping(path = "/search/hosts", produces = "application/json")
    public List<Host> allHosts(){
        SearchResult<Host> result = searchSession.search(Host.class)
                .where(f -> f.matchAll())
                .fetch(20);

        logger.info("Hit count is {}", result.total().hitCount());
        return result.hits();
    }

    // The methods below are plain JPA calls; Hibernate Search updates the index automatically.
    @Transactional
    @PostMapping(path = "/event/add", consumes = "application/json", produces = "application/json")
    public Event addEvent(@RequestBody Event event){
        entityManager.persist(event);
        return event;
    }

    @Transactional
    @PostMapping(path = "/event/update", consumes = "application/json", produces = "application/json")
    public Event updateEvent(@RequestBody Event event){
        entityManager.merge(event);
        return event;
    }

    @Transactional
    @PostMapping(path = "/host/add", consumes = "application/json", produces = "application/json")
    public Host addHost(@RequestBody Host host){
        entityManager.persist(host);
        return host;
    }

    @Transactional
    @PostMapping(path = "/host/update", consumes = "application/json", produces = "application/json")
    public Host updateHost(@RequestBody Host host){
        entityManager.merge(host);
        return host;
    }

    @Transactional
    @DeleteMapping(path = "/event/delete/{id}", produces = "text/plain")
    public String deleteEventById(@PathVariable("id") int id){
        Event event = entityManager.find(Event.class, id);
        entityManager.remove(event);
        return String.join(" : ", "Removed", event.toString());
    }

When you look at the addEvent, updateEvent, and deleteEventById methods, you won’t notice any difference from standard JPA usage. The difference shows up in search methods such as allHosts above, or the searchEventsByName and searchHostsByName methods, which are not shown here. In these methods, the indexes are queried through the Hibernate Search session, which is obtained from the injected entity manager, and the search criteria are set in the “where” clause.

You can find other details of this example, such as configuration, mass indexing, and custom analyzer usage, in this GitHub repository, which shows how to use Hibernate Search with the Spring and Quarkus Java runtimes.

Conclusion

Today, full-text search engines such as Elasticsearch are widely used as a cache. In that case, it is essential to keep Elasticsearch synchronized with the database. Hibernate Search meets this requirement elegantly. It indexes your domain model with the help of a few annotations and keeps your local Apache Lucene indexes or Elasticsearch cluster in sync with the data it extracts from Hibernate ORM. While providing these facilities, it does not pull the developer away from the familiar syntax.

References

Hibernate Search Reference


Published at DZone with permission of Hüseyin Akdoğan. See the original article here.
