DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • Database Query Service With OpenAI and PostgreSQL in .NET
  • PostgreSQL 12 End of Life: What to Know and How to Prepare
  • Formulating a Robust Strategy for Storage in Amazon Relational Database Service PostgreSQL Deployments
  • Point-In-Time Recovery (PITR) in PostgreSQL

Trending

  • Beyond Code Coverage: A Risk-Driven Revolution in Software Testing With Machine Learning
  • FIPS 140-3: The Security Standard That Protects Our Federal Data
  • How To Develop a Truly Performant Mobile Application in 2025: A Case for Android
  • How Large Tech Companies Architect Resilient Systems for Millions of Users
  1. DZone
  2. Data Engineering
  3. Databases
  4. Postgres Full-Text Search

Postgres Full-Text Search

Let's take a look at Postgres full-text search function and see how to build a query document, create a tsquery, and matching.

By 
Sagar Sharma user avatar
Sagar Sharma
·
Updated Aug. 01, 18 · Tutorial
Likes (5)
Comment
Save
Tweet
Share
19.3K Views

Join the DZone community and get the full member experience.

Join For Free

Search is one of the most important features while building an application. It leads to an increase in the popularity of various search engines like Elastic Search, Solr, etc. However, all of them require a different server to be set up and data sync processes to be implemented.

Would it not be better to have a solution that could minimize overheads and allow us to focus on the main objective, which is Search? Yes, and that's where the Postgres Full-Text Search comes into the picture. It searches for the data right where it is stored in your tables. There is no need to set up a different server.

Postgres FTS (Full-Text Search) supports the following features:

1) Fuzzy Match

2) Ranking

3) Phrase Search

4) Support for multiple languages

Querying Document

Data residing in Postgres tables can be queried using three steps:

  • Build Query Document i.e create a tsvector from the description of a company.
  • Create a tsquery from your search keywords.
  • Get the matching results using match operator @@.

Let's delve deep into each of the above to understand more in detail.

Building Query Document

Our description of the department is in text format. We need to convert it to tsvector, which is basically a list of unique lexemes along with their position in the description. Lexemes are a set of tokens that represent a set of different variants of the same work to look alike. For example:

"In the English language, run, runs, ran, and running are forms of the same lexeme, conventionally written as run."

The to_tsvector function is used to convert plain text description into a list of lexemes.

select * from 
to_tsvector('english', 'This department delivers the actual product to customers');

to_tsvector
----------------
'actual':5 'custom':8 'deliv':3 'depart':2 'product':6

In the above example, we parsed a simple text to create a list of tokens. The output contains the unique lexemes along with their position in which they appear in the above text.

Creating a tsquery

We now need to create a tsquery from our search keywords. For that, we will use a Postgres function to_tsquery to get a list of keywords. Let's see an example:

select * from to_tsquery('english', 'product | customer');

to_tsquery
----------------
'product' | 'custom'

In the above example, we passed two arguments to to_tsquery function.

1) arg1: It specifies the name of the dictionary that we want to Postgres to use to understand our text
2) arg2: It is text of keywords separated by boolean operators.

The output of tsquery contains the list of tokens and their logical relationship.

Postgres supports various logical operators like OR(|), AND(&) , NOT(!) and parentheses as well.

Matching Document

Now that we know how to create a tsvector and tsquery, we will see how to search for a word in a text. For this, we use @@ operator to perform a match between a tsvector and tsquery.

select to_tsvector('english', 'This department delivers the actual product to customers') 
@@ to_tsquery('english', 'product | customer');

Output
----------
true

select to_tsvector('english', 'This department delivers the actual product to customers') 
@@ to_tsquery('english', 'project');

Output
---------
false

As you can see, if there is a match between the text description and keywords, it returns true else false.

Ranking Results

The ts_rank function is used to get the relevancy score of the match. In simple words, the higher the rank the more relevant the match is. Let's see with an example:

1) 
select ts_rank(
  to_tsvector('english', 'This department delivers the actual product to customers') , 
  to_tsquery('english', 'product')
);

ts_rank
-----------
0.0607927106

2)
select ts_rank(
  to_tsvector('english', 'This department delivers the actual product to customers') , 
  to_tsquery('english', 'product & customers')
);

ts_rank
-----------
0.0985008553

3)
select ts_rank(
  to_tsvector('english', 'This department delivers the actual product to customers') , 
  to_tsquery('english', 'project')
);

ts_rank
-----------
0

We have mentioned three different SQL queries in the above snippet. Relevant scores across three cases differ depending on the number of words matched with the text description. In case there is no match, rank is calculated as 0.

The value of rank varies from 0 to 1. Higher rank indicates a stronger match.

Phrase Search

In Postgres 9.6, the capability to do a phrase search is also introduced. A special operator <-> can be used to specify the distance between two words in the text. Prior to this version, it was not possible, and Postgres did not consider the position of words during a match.

1)
select 
to_tsvector('english', 'This department delivers the actual product to customers') 
  @@ 
to_tsquery('english', 'product & customer');

Output
--------
true

2)
select 
to_tsvector('english', 'This department delivers the actual product to customers') 
  @@ 
to_tsquery('english', 'product <-> customer');

Output
---------
false

In case 1, the match operator will simply check the existence of both words in the text irrespective of the position.

In case 2, the match operator will check the existence of both words in the text and also considers that both these words should appear adjacent to each other.

Optimising Search Results

1) Performance of the Postgres full-text search can be improved by pre-generating the tsvector. It will save a considerable time during match and will result in faster performance. Secondly, a GIN index can be created on the tsvector column to make search more efficient.

2) Quality of search results can be ensured by using the same dictionary for creating tsquery and tsvector. A list of all available dictionaries in Postgres can be obtained by the following command:

select cfgname from pg_catalog.pg_ts_config;

Output
------------
simple
danish
dutch
english
finnish
french
german
hungarian
italian
norwegian
portuguese
romanian
russian
spanish
swedish
turkish

If a different dictionary is used to create tsvector and tsquery, then it may lead to some discrepancies in search results.

Database PostgreSQL

Opinions expressed by DZone contributors are their own.

Related

  • Database Query Service With OpenAI and PostgreSQL in .NET
  • PostgreSQL 12 End of Life: What to Know and How to Prepare
  • Formulating a Robust Strategy for Storage in Amazon Relational Database Service PostgreSQL Deployments
  • Point-In-Time Recovery (PITR) in PostgreSQL

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: