DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • Modify JSON Data in Postgres and Hibernate 6
  • Postgres JSON Functions With Hibernate 6
  • Postgres JSON Functions With Hibernate 5
  • Introduction to Elasticsearch

Trending

  • Mastering Fluent Bit: Installing and Configuring Fluent Bit on Kubernetes (Part 3)
  • Build Your First AI Model in Python: A Beginner's Guide (1 of 3)
  • A Deep Dive Into Firmware Over the Air for IoT Devices
  • MySQL to PostgreSQL Database Migration: A Practical Case Study
  1. DZone
  2. Data Engineering
  3. Databases
  4. Postgres Full-Text Search With Hibernate 6

Postgres Full-Text Search With Hibernate 6

In this article, explore the posjsonhelper library and learn about adding support for the full-text search for your project that uses Hibernate 6 and Postgres.

By 
Szymon Tarnowski user avatar
Szymon Tarnowski
DZone Core CORE ·
Jan. 09, 24 · Tutorial
Likes (11)
Comment
Save
Tweet
Share
10.1K Views

Join the DZone community and get the full member experience.

Join For Free

Hibernate

Hibernate by itself does not have full-text search support. It has to rely on database engine support or third-party solutions.

An extension called Hibernate Search integrates with Apache Lucene or Elasticsearch (there is also integration with OpenSearch).

Postgres

Postgres has had full-text search functionality since version 7.3. Although it can not compete with search engines like Elasticsearch or Lucene, it still provides a flexible and robust solution that might be enough to meet application users' expectations—features like stemming, ranking, and indexing.

We will briefly explain how we can do a full-text search in Postgres. For more, please visit Postgres documentation. As for essential text matching, the most crucial part is the math operator @@.

It returns true if the document (object of type tsvector) matches the query (object of type tsquery).

The order is not crucial for the operator. So, it does not matter if we put the document on the left side of the operator and the query on the right side or in a different order.

For better demonstration, we use a database table called the tweet.

SQL
 
create table tweet (
        id bigint not null,
        short_content varchar(255),
        title varchar(255),
        primary key (id)
    )


With such data:

SQL
 
INSERT INTO tweet (id, title, short_content) VALUES (1, 'Cats', 'Cats rules the world');
INSERT INTO tweet (id, title, short_content) VALUES (2, 'Rats', 'Rats rules in the sewers');
INSERT INTO tweet (id, title, short_content) VALUES (3, 'Rats vs Cats', 'Rats and Cats hates each other');

INSERT INTO tweet (id, title, short_content) VALUES (4, 'Feature', 'This project is design to wrap already existed functions of Postgres');
INSERT INTO tweet (id, title, short_content) VALUES (5, 'Postgres database', 'Postgres is one of the widly used database on the market');
INSERT INTO tweet (id, title, short_content) VALUES (6, 'Database', 'On the market there is a lot of database that have similar features like Oracle');


Now let's see what the tsvector object looks like for the short_content column for each of the records.

SQL
 
SELECT id, to_tsvector('english', short_content) FROM tweet;


Output:

Output: to_tsvector

The output shows how to_tsvcector converts the text column to a tsvector object for the 'english' text search configuration.

Text Search Configuration

The first parameter for the to_tsvector function passed in the above example was the name of the text search configuration. In that case, it was the "english". According to Postgres documentation, the text search configuration is as follows:

... full text search functionality includes the ability to do many more things: skip indexing certain words (stop words), process synonyms, and use sophisticated parsing, e.g., parse based on more than just white space. This functionality is controlled by text search configurations.

So, configuration is a crucial part of the process and vital to our full-text search results. For different configurations, the Postgres engine can return different results. This does not have to be the case among dictionaries for different languages. For example, you can have two configurations for the same language, but one ignores names containing digits (for example, some serial numbers). If we pass in our query the specific serial number we are looking for, which is mandatory, we won't find any record for configuration that ignores words with numbers. Even if we have such records in the database, please check the configuration documentation for more information.

Text Query

Text query supports such operators as & (AND), | (OR), ! (NOT), and <-> (FOLLOWED BY). The first three operators do not require a deeper explanation. The <-> operator checks if words exist and if they are placed in a specific order. So, for example, for the query "rat <-> cat", we expect that the "cat" word is going to exist, followed by the "rat."

Examples

  • Content that contains the rat and cat:
SQL
 
SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'Rat & cat');


Output: Rats and Cats hate each other

  • Content that contains database and market, and the market is the third word after database:
SQL
 
SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'database <3> market');


Output: Postgres is one of the widely used databases on the market

  • Content that contains database but not Postgres:
SQL
 
SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'database & !Postgres');


Output: On the market there is a lot of database that have similar features like Oracle

  • Content that contains Postgres or Oracle:
SQL
 
SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'Postgres | Oracle');


Output: character varying

Wrapper Functions

One of the wrapper functions that creates text queries was already mentioned in this article, which is the to_tsquery. There are more such functions like:

  • plainto_tsquery
  • phraseto_tsquery
  • websearch_to_tsquery

plainto_tsquery

The plainto_tsquery converts all passed words to query where all words are combined with the & (AND) operator. For example, the equivalent of the plainto_tsquery('english', 'Rat cat') is to_tsquery('english', 'Rat & cat').

For the following usage:

SQL
 
SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ plainto_tsquery('english', 'Rat cat');


We get the result below:

Output from plainto_tsquery

phraseto_tsquery

The phraseto_tsquery converts all passed words to query where all words are combined with <-> (FOLLOW BY) operator. For example, the equivalent of the phraseto_tsquery('english', 'cat rule') is to_tsquery('english', 'cat <-> rule').

For the following usage:

SQL
 
SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ phraseto_tsquery('english', 'cat rule');


We get the result below:

Output: Cats rule the world

websearch_to_tsquery

The websearch_to_tsquery uses alternative syntax to create a valid text query. 

  • Unquoted text: Converts part of syntax in the same way as plainto_tsquery
  • Quoted text: Converts part of syntax in the same way as phraseto_tsquery
  • OR: Converts to "|" (OR) operator
  • "-": Same as "!" (NOT) operator

For example, the equivalent of the websearch_to_tsquery('english', '"cat rule" or database -Postgres') is to_tsquery('english', 'cat <-> rule | database & !Postgres').

For the following usage:

SQL
 
SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ websearch_to_tsquery('english', '"cat rule" or database -Postgres');


We get the result below:

Output: short_content, character varying

Postgres and Hibernate Native Support

As mentioned in the article, Hibernate alone does not have full-text search support. It has to rely on database engine support. This means that we are allowed to execute native SQL queries as shown in the examples below:

  • plainto_tsquery
Java
 
public List<Tweet> findBySinglePlainQueryInDescriptionForConfigurationWithNativeSQL(String textQuery, String configuration) {
        return entityManager.createNativeQuery(String.format("select * from tweet t1_0 where to_tsvector('%1$s', t1_0.short_content) @@ plainto_tsquery('%1$s', :textQuery)", configuration), Tweet.class).setParameter("textQuery", textQuery).getResultList();
    }


  • websearch_to_tsquery
Java
 
public List<Tweet> findCorrectTweetsByWebSearchToTSQueryInDescriptionWithNativeSQL(String textQuery, String configuration) {
        return entityManager.createNativeQuery(String.format("select * from tweet t1_0 where to_tsvector('%1$s', t1_0.short_content) @@ websearch_to_tsquery('%1$s', :textQuery)", configuration), Tweet.class).setParameter("textQuery", textQuery).getResultList();
    }


Hibernate With posjsonhelper Library

The posjsonhelper library is an open-source project that adds support for Hibernate queries for PostgreSQL JSON functions and full-text search.

For the Maven project, we need to add the dependencies below:

XML
 
<dependency>
    <groupId>com.github.starnowski.posjsonhelper.text</groupId>
    <artifactId>hibernate6-text</artifactId>
    <version>0.3.0</version>
</dependency>
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-core</artifactId>
    <version>6.4.0.Final</version>
</dependency>


To use components that exist in the posjsonhelper library, we need to register them in the Hibernate context.

This means that there must be a specified org.hibernate.boot.model.FunctionContributor implementation. The library has an implementation of this interface, that is com.github.starnowski.posjsonhelper.hibernate6.PosjsonhelperFunctionContributor.

A file with the name "org.hibernate.boot.model.FunctionContributor" under the "resources/META-INF/services" directory is required to use this implementation.

There is another way to register posjsonhelper's component, which can be done through programmability. To see how to do that, check this link.

Now, we can use full-text search operators in Hibernate queries.

PlainToTSQueryFunction

This is a component that wraps the plainto_tsquery function.

Java
 
public List<Tweet> findBySinglePlainQueryInDescriptionForConfiguration(String textQuery, String configuration) {
        CriteriaBuilder cb = entityManager.getCriteriaBuilder();
        CriteriaQuery<Tweet> query = cb.createQuery(Tweet.class);
        Root<Tweet> root = query.from(Tweet.class);
        query.select(root);
        query.where(new TextOperatorFunction((NodeBuilder) cb, new TSVectorFunction(root.get("shortContent"), configuration, (NodeBuilder) cb), new PlainToTSQueryFunction((NodeBuilder) cb, configuration, textQuery), hibernateContext));
        return entityManager.createQuery(query).getResultList();
    }


For a configuration with the value 'english', the code is going to generate the statement below:

Java
 
select
        t1_0.id,
        t1_0.short_content,
        t1_0.title 
    from
        tweet t1_0 
    where
        to_tsvector('english', t1_0.short_content) @@ plainto_tsquery('english', ?);


PhraseToTSQueryFunction 

This component wraps the phraseto_tsquery function.

Java
 
public List<Tweet> findBySinglePhraseInDescriptionForConfiguration(String textQuery, String configuration) {
        CriteriaBuilder cb = entityManager.getCriteriaBuilder();
        CriteriaQuery<Tweet> query = cb.createQuery(Tweet.class);
        Root<Tweet> root = query.from(Tweet.class);
        query.select(root);
        query.where(new TextOperatorFunction((NodeBuilder) cb, new TSVectorFunction(root.get("shortContent"), configuration, (NodeBuilder) cb), new PhraseToTSQueryFunction((NodeBuilder) cb, configuration, textQuery), hibernateContext));
        return entityManager.createQuery(query).getResultList();
        }


For configuration with the value 'english', the code is going to generate the statement below:

SQL
 
select
        t1_0.id,
        t1_0.short_content,
        t1_0.title 
    from
        tweet t1_0 
    where
        to_tsvector('english', t1_0.short_content) @@ phraseto_tsquery('english', ?)


WebsearchToTSQueryFunction 

This component wraps the websearch_to_tsquery function.

Java
 
public List<Tweet> findCorrectTweetsByWebSearchToTSQueryInDescription(String phrase, String configuration) {
        CriteriaBuilder cb = entityManager.getCriteriaBuilder();
        CriteriaQuery<Tweet> query = cb.createQuery(Tweet.class);
        Root<Tweet> root = query.from(Tweet.class);
        query.select(root);
        query.where(new TextOperatorFunction((NodeBuilder) cb, new TSVectorFunction(root.get("shortContent"), configuration, (NodeBuilder) cb), new WebsearchToTSQueryFunction((NodeBuilder) cb, configuration, phrase), hibernateContext));
        return entityManager.createQuery(query).getResultList();
    }


For configuration with the value 'english', the code is going to generate the statement below:

SQL
 
select
        t1_0.id,
        t1_0.short_content,
        t1_0.title 
    from
        tweet t1_0 
    where
        to_tsvector('english', t1_0.short_content) @@ websearch_to_tsquery('english', ?)


HQL Queries

All mentioned components can be used in HQL queries. To check how it can be done, please click this link.

Why Use the posjsonhelper Library When We Can Use the Native Approach With Hibernate?

Although dynamically concatenating a string that is supposed to be an HQL or SQL query might be easy, implementing predicates would be better practice, especially when you have to handle search criteria based on dynamic attributes from your API.

Conclusion

As mentioned in the previous article, Postgres full-text search support can be a good alternative for substantial search engines like Elasticsearch or Lucene, in some cases. This could save us from the decision to add third-party solutions to our technology stack, which could also add more complexity and additional costs.

Apache Lucene Elasticsearch Hibernate PostgreSQL

Opinions expressed by DZone contributors are their own.

Related

  • Modify JSON Data in Postgres and Hibernate 6
  • Postgres JSON Functions With Hibernate 6
  • Postgres JSON Functions With Hibernate 5
  • Introduction to Elasticsearch

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!