DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
View Events Video Library
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Integrating PostgreSQL Databases with ANF: Join this workshop to learn how to create a PostgreSQL server using Instaclustr’s managed service

[DZone Research] Observability + Performance: We want to hear your experience and insights. Join us for our annual survey (enter to win $$).

Monitoring and Observability for LLMs: Datadog and Google Cloud discuss how to achieve optimal AI model performance.

Automated Testing: The latest on architecture, TDD, and the benefits of AI and low-code tools.

Related

  • Understanding RDS Costs
  • Getting Started With Postgres: Three Free and Easy Ways
  • SQL Server to Postgres Database Migration
  • Important Takeaways for PostgreSQL Indexes

Trending

  • Embracing Reactive Programming With Spring WebFlux
  • Programming With AI
  • Automate Migration Assessment With XML Linter
  • Product Backlog Management: Tips, Tricks, and Ruinous Anti-Patterns
  1. DZone
  2. Data Engineering
  3. Databases
  4. Postgres Full-Text Search

Postgres Full-Text Search

Let's take a look at Postgres full-text search function and see how to build a query document, create a tsquery, and matching.

Sagar Sharma user avatar by
Sagar Sharma
·
Updated Aug. 01, 18 · Tutorial
Like (4)
Save
Tweet
Share
18.53K Views

Join the DZone community and get the full member experience.

Join For Free

Search is one of the most important features while building an application. It leads to an increase in the popularity of various search engines like Elastic Search, Solr, etc. However, all of them require a different server to be set up and data sync processes to be implemented.

Would it not be better to have a solution that could minimize overheads and allow us to focus on the main objective, which is Search? Yes, and that's where the Postgres Full-Text Search comes into the picture. It searches for the data right where it is stored in your tables. There is no need to set up a different server.

Postgres FTS (Full-Text Search) supports the following features:

1) Fuzzy Match

2) Ranking

3) Phrase Search

4) Support for multiple languages

Querying Document

Data residing in Postgres tables can be queried using three steps:

  • Build Query Document i.e create a tsvector from the description of a company.
  • Create a tsquery from your search keywords.
  • Get the matching results using match operator @@.

Let's delve deep into each of the above to understand more in detail.

Building Query Document

Our description of the department is in text format. We need to convert it to tsvector, which is basically a list of unique lexemes along with their position in the description. Lexemes are a set of tokens that represent a set of different variants of the same work to look alike. For example:

"In the English language, run, runs, ran, and running are forms of the same lexeme, conventionally written as run."

The to_tsvector function is used to convert plain text description into a list of lexemes.

select * from 
to_tsvector('english', 'This department delivers the actual product to customers');

to_tsvector
----------------
'actual':5 'custom':8 'deliv':3 'depart':2 'product':6

In the above example, we parsed a simple text to create a list of tokens. The output contains the unique lexemes along with their position in which they appear in the above text.

Creating a tsquery

We now need to create a tsquery from our search keywords. For that, we will use a Postgres function to_tsquery to get a list of keywords. Let's see an example:

select * from to_tsquery('english', 'product | customer');

to_tsquery
----------------
'product' | 'custom'

In the above example, we passed two arguments to to_tsquery function.

1) arg1: It specifies the name of the dictionary that we want to Postgres to use to understand our text
2) arg2: It is text of keywords separated by boolean operators.

The output of tsquery contains the list of tokens and their logical relationship.

Postgres supports various logical operators like OR(|), AND(&) , NOT(!) and parentheses as well.

Matching Document

Now that we know how to create a tsvector and tsquery, we will see how to search for a word in a text. For this, we use @@ operator to perform a match between a tsvector and tsquery.

select to_tsvector('english', 'This department delivers the actual product to customers') 
@@ to_tsquery('english', 'product | customer');

Output
----------
true

select to_tsvector('english', 'This department delivers the actual product to customers') 
@@ to_tsquery('english', 'project');

Output
---------
false

As you can see, if there is a match between the text description and keywords, it returns true else false.

Ranking Results

The ts_rank function is used to get the relevancy score of the match. In simple words, the higher the rank the more relevant the match is. Let's see with an example:

1) 
select ts_rank(
  to_tsvector('english', 'This department delivers the actual product to customers') , 
  to_tsquery('english', 'product')
);

ts_rank
-----------
0.0607927106

2)
select ts_rank(
  to_tsvector('english', 'This department delivers the actual product to customers') , 
  to_tsquery('english', 'product & customers')
);

ts_rank
-----------
0.0985008553

3)
select ts_rank(
  to_tsvector('english', 'This department delivers the actual product to customers') , 
  to_tsquery('english', 'project')
);

ts_rank
-----------
0

We have mentioned three different SQL queries in the above snippet. Relevant scores across three cases differ depending on the number of words matched with the text description. In case there is no match, rank is calculated as 0.

The value of rank varies from 0 to 1. Higher rank indicates a stronger match.

Phrase Search

In Postgres 9.6, the capability to do a phrase search is also introduced. A special operator <-> can be used to specify the distance between two words in the text. Prior to this version, it was not possible, and Postgres did not consider the position of words during a match.

1)
select 
to_tsvector('english', 'This department delivers the actual product to customers') 
  @@ 
to_tsquery('english', 'product & customer');

Output
--------
true

2)
select 
to_tsvector('english', 'This department delivers the actual product to customers') 
  @@ 
to_tsquery('english', 'product <-> customer');

Output
---------
false

In case 1, the match operator will simply check the existence of both words in the text irrespective of the position.

In case 2, the match operator will check the existence of both words in the text and also considers that both these words should appear adjacent to each other.

Optimising Search Results

1) Performance of the Postgres full-text search can be improved by pre-generating the tsvector. It will save a considerable time during match and will result in faster performance. Secondly, a GIN index can be created on the tsvector column to make search more efficient.

2) Quality of search results can be ensured by using the same dictionary for creating tsquery and tsvector. A list of all available dictionaries in Postgres can be obtained by the following command:

select cfgname from pg_catalog.pg_ts_config;

Output
------------
simple
danish
dutch
english
finnish
french
german
hungarian
italian
norwegian
portuguese
romanian
russian
spanish
swedish
turkish

If a different dictionary is used to create tsvector and tsquery, then it may lead to some discrepancies in search results.

Database PostgreSQL

Opinions expressed by DZone contributors are their own.

Related

  • Understanding RDS Costs
  • Getting Started With Postgres: Three Free and Easy Ways
  • SQL Server to Postgres Database Migration
  • Important Takeaways for PostgreSQL Indexes

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: