Postgres Full-Text Search With Hibernate 6
In this article, explore the posjsonhelper library and learn about adding support for the full-text search for your project that uses Hibernate 6 and Postgres.
Join the DZone community and get the full member experience.
Join For FreeHibernate
Hibernate by itself does not have full-text search support. It has to rely on database engine support or third-party solutions.
An extension called Hibernate Search integrates with Apache Lucene or Elasticsearch (there is also integration with OpenSearch).
Postgres
Postgres has had full-text search functionality since version 7.3. Although it can not compete with search engines like Elasticsearch or Lucene, it still provides a flexible and robust solution that might be enough to meet application users' expectations—features like stemming, ranking, and indexing.
We will briefly explain how we can do a full-text search in Postgres. For more, please visit Postgres documentation. As for essential text matching, the most crucial part is the math operator @@
.
It returns true
if the document (object of type tsvector
) matches the query (object of type tsquery
).
The order is not crucial for the operator. So, it does not matter if we put the document on the left side of the operator and the query on the right side or in a different order.
For better demonstration, we use a database table called the tweet
.
create table tweet (
id bigint not null,
short_content varchar(255),
title varchar(255),
primary key (id)
)
With such data:
INSERT INTO tweet (id, title, short_content) VALUES (1, 'Cats', 'Cats rules the world');
INSERT INTO tweet (id, title, short_content) VALUES (2, 'Rats', 'Rats rules in the sewers');
INSERT INTO tweet (id, title, short_content) VALUES (3, 'Rats vs Cats', 'Rats and Cats hates each other');
INSERT INTO tweet (id, title, short_content) VALUES (4, 'Feature', 'This project is design to wrap already existed functions of Postgres');
INSERT INTO tweet (id, title, short_content) VALUES (5, 'Postgres database', 'Postgres is one of the widly used database on the market');
INSERT INTO tweet (id, title, short_content) VALUES (6, 'Database', 'On the market there is a lot of database that have similar features like Oracle');
Now let's see what the tsvector
object looks like for the short_content
column for each of the records.
SELECT id, to_tsvector('english', short_content) FROM tweet;
Output:
The output shows how to_tsvcector
converts the text column to a tsvector
object for the 'english
' text search configuration.
Text Search Configuration
The first parameter for the to_tsvector
function passed in the above example was the name of the text search configuration. In that case, it was the "english
". According to Postgres documentation, the text search configuration is as follows:
... full text search functionality includes the ability to do many more things: skip indexing certain words (stop words), process synonyms, and use sophisticated parsing, e.g., parse based on more than just white space. This functionality is controlled by text search configurations.
So, configuration is a crucial part of the process and vital to our full-text search results. For different configurations, the Postgres engine can return different results. This does not have to be the case among dictionaries for different languages. For example, you can have two configurations for the same language, but one ignores names containing digits (for example, some serial numbers). If we pass in our query the specific serial number we are looking for, which is mandatory, we won't find any record for configuration that ignores words with numbers. Even if we have such records in the database, please check the configuration documentation for more information.
Text Query
Text query supports such operators as &
(AND), |
(OR), !
(NOT), and <->
(FOLLOWED BY). The first three operators do not require a deeper explanation. The <->
operator checks if words exist and if they are placed in a specific order. So, for example, for the query "rat <-> cat
", we expect that the "cat" word is going to exist, followed by the "rat."
Examples
- Content that contains the rat and cat:
SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'Rat & cat');
- Content that contains database and market, and the market is the third word after database:
SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'database <3> market');
- Content that contains database but not Postgres:
SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'database & !Postgres');
- Content that contains Postgres or Oracle:
SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'Postgres | Oracle');
Wrapper Functions
One of the wrapper functions that creates text queries was already mentioned in this article, which is the to_tsquery
. There are more such functions like:
plainto_tsquery
phraseto_tsquery
websearch_to_tsquery
plainto_tsquery
The plainto_tsquery
converts all passed words to query where all words are combined with the &
(AND) operator. For example, the equivalent of the plainto_tsquery('english', 'Rat cat')
is to_tsquery('english', 'Rat & cat')
.
For the following usage:
SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ plainto_tsquery('english', 'Rat cat');
We get the result below:
phraseto_tsquery
The phraseto_tsquery
converts all passed words to query where all words are combined with <->
(FOLLOW BY) operator. For example, the equivalent of the phraseto_tsquery('english', 'cat rule')
is to_tsquery('english', 'cat <-> rule')
.
For the following usage:
SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ phraseto_tsquery('english', 'cat rule');
We get the result below:
websearch_to_tsquery
The websearch_to_tsquery uses alternative syntax to create a valid text query.
- Unquoted text: Converts part of syntax in the same way as
plainto_tsquery
- Quoted text: Converts part of syntax in the same way as
phraseto_tsquery
- OR: Converts to "
|
" (OR) operator - "
-
": Same as "!
" (NOT) operator
For example, the equivalent of the websearch_to_tsquery('english', '"cat rule" or database -Postgres')
is to_tsquery('english', 'cat <-> rule | database & !Postgres')
.
For the following usage:
SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ websearch_to_tsquery('english', '"cat rule" or database -Postgres');
We get the result below:
Postgres and Hibernate Native Support
As mentioned in the article, Hibernate alone does not have full-text search support. It has to rely on database engine support. This means that we are allowed to execute native SQL queries as shown in the examples below:
plainto_tsquery
public List<Tweet> findBySinglePlainQueryInDescriptionForConfigurationWithNativeSQL(String textQuery, String configuration) {
return entityManager.createNativeQuery(String.format("select * from tweet t1_0 where to_tsvector('%1$s', t1_0.short_content) @@ plainto_tsquery('%1$s', :textQuery)", configuration), Tweet.class).setParameter("textQuery", textQuery).getResultList();
}
websearch_to_tsquery
public List<Tweet> findCorrectTweetsByWebSearchToTSQueryInDescriptionWithNativeSQL(String textQuery, String configuration) {
return entityManager.createNativeQuery(String.format("select * from tweet t1_0 where to_tsvector('%1$s', t1_0.short_content) @@ websearch_to_tsquery('%1$s', :textQuery)", configuration), Tweet.class).setParameter("textQuery", textQuery).getResultList();
}
Hibernate With posjsonhelper Library
The posjsonhelper library is an open-source project that adds support for Hibernate queries for PostgreSQL JSON functions and full-text search.
For the Maven project, we need to add the dependencies below:
<dependency>
<groupId>com.github.starnowski.posjsonhelper.text</groupId>
<artifactId>hibernate6-text</artifactId>
<version>0.3.0</version>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-core</artifactId>
<version>6.4.0.Final</version>
</dependency>
To use components that exist in the posjsonhelper
library, we need to register them in the Hibernate context.
This means that there must be a specified org.hibernate.boot.model.FunctionContributor
implementation. The library has an implementation of this interface, that is com.github.starnowski.posjsonhelper.hibernate6.PosjsonhelperFunctionContributor
.
A file with the name "org.hibernate.boot.model.FunctionContributor" under the "resources/META-INF/services" directory is required to use this implementation.
There is another way to register posjsonhelper
's component, which can be done through programmability. To see how to do that, check this link.
Now, we can use full-text search operators in Hibernate queries.
PlainToTSQueryFunction
This is a component that wraps the plainto_tsquery function.
public List<Tweet> findBySinglePlainQueryInDescriptionForConfiguration(String textQuery, String configuration) {
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Tweet> query = cb.createQuery(Tweet.class);
Root<Tweet> root = query.from(Tweet.class);
query.select(root);
query.where(new TextOperatorFunction((NodeBuilder) cb, new TSVectorFunction(root.get("shortContent"), configuration, (NodeBuilder) cb), new PlainToTSQueryFunction((NodeBuilder) cb, configuration, textQuery), hibernateContext));
return entityManager.createQuery(query).getResultList();
}
For a configuration with the value 'english'
, the code is going to generate the statement below:
select
t1_0.id,
t1_0.short_content,
t1_0.title
from
tweet t1_0
where
to_tsvector('english', t1_0.short_content) @@ plainto_tsquery('english', ?);
PhraseToTSQueryFunction
This component wraps the phraseto_tsquery function.
public List<Tweet> findBySinglePhraseInDescriptionForConfiguration(String textQuery, String configuration) {
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Tweet> query = cb.createQuery(Tweet.class);
Root<Tweet> root = query.from(Tweet.class);
query.select(root);
query.where(new TextOperatorFunction((NodeBuilder) cb, new TSVectorFunction(root.get("shortContent"), configuration, (NodeBuilder) cb), new PhraseToTSQueryFunction((NodeBuilder) cb, configuration, textQuery), hibernateContext));
return entityManager.createQuery(query).getResultList();
}
For configuration with the value 'english'
, the code is going to generate the statement below:
select
t1_0.id,
t1_0.short_content,
t1_0.title
from
tweet t1_0
where
to_tsvector('english', t1_0.short_content) @@ phraseto_tsquery('english', ?)
WebsearchToTSQueryFunction
This component wraps the websearch_to_tsquery function.
public List<Tweet> findCorrectTweetsByWebSearchToTSQueryInDescription(String phrase, String configuration) {
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Tweet> query = cb.createQuery(Tweet.class);
Root<Tweet> root = query.from(Tweet.class);
query.select(root);
query.where(new TextOperatorFunction((NodeBuilder) cb, new TSVectorFunction(root.get("shortContent"), configuration, (NodeBuilder) cb), new WebsearchToTSQueryFunction((NodeBuilder) cb, configuration, phrase), hibernateContext));
return entityManager.createQuery(query).getResultList();
}
For configuration with the value 'english'
, the code is going to generate the statement below:
select
t1_0.id,
t1_0.short_content,
t1_0.title
from
tweet t1_0
where
to_tsvector('english', t1_0.short_content) @@ websearch_to_tsquery('english', ?)
HQL Queries
All mentioned components can be used in HQL queries. To check how it can be done, please click this link.
Why Use the posjsonhelper Library When We Can Use the Native Approach With Hibernate?
Although dynamically concatenating a string that is supposed to be an HQL or SQL query might be easy, implementing predicates would be better practice, especially when you have to handle search criteria based on dynamic attributes from your API.
Conclusion
As mentioned in the previous article, Postgres full-text search support can be a good alternative for substantial search engines like Elasticsearch or Lucene, in some cases. This could save us from the decision to add third-party solutions to our technology stack, which could also add more complexity and additional costs.
Opinions expressed by DZone contributors are their own.
Comments