DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
Securing Your Software Supply Chain with JFrog and Azure
Register Today
  1. DZone
  2. Data Engineering
  3. Databases
  4. First Class SQL for Full-Text Search
Content provided by Couchbase logo

First Class SQL for Full-Text Search

Overview of N1QL (SQL for JSON) support for full text search in Couchbase.

Keshav Murthy user avatar by
Keshav Murthy
CORE ·
Jul. 06, 20 · Tutorial
Like (7)
Save
Tweet
Share
7.29K Views

(With SQL) Your app is sitting on a Ferrari-style compute engine.  — Lukas Eder

Over time, the database industry has realized text search and SQL are two sides of the same coin. Text search needs further query processing, query processing needs text search to efficiently filter for text patterns. The SQL databases have added text search within them, albeit for a single node SMP systems.

  • SQL Server supports CONTAINS() for text search
  • Oracle supports CONTAINS() for text search
  • MySQL added full text support 
  • PostgreSQL has supported text search for a long time.

Couchbase Full-Text Search (FTS) is created with three main motivations:

  • Transparently search across multiple fields within a document
  • Go beyond the exact matching of values by providing language based stemming, fuzzy matching, etc.
  • Provide the search results based on relevance

FTS achieves this on an inverted index and a rich set of query predicates: from simple word search to pattern matching to complex range predicates. In addition to the search, it supports aggregation via search facets.

In the NoSQL world, Lucene is a popular search index and so are the search servers based on Lucene: Solr and Elasticsearch.  Following their RDBMS cousins, Elasticsearch, Opendistro for Elasticsearch all have added SQL for their search. Couchbase introduced the full text service, FTS and has followed up with support for search within N1QL.

  • FTS with N1QL
  • Elasticsearch with SQL
  • Opendistro for Elasticsearch with SQL
  • And the recent entrant to search, MongoDB has added search to MQL using Lucene in its Atlas offering.

The SQL Implementations of Elasticsearch with SQL and MongoDB’s MQL comes with a long list of limitations.

Elasticsearch with SQL has listed its limitations here:

  • Full list: https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-limitations.html
    • In addition, the language is understandably limited due to its nascent implementation.
    • No support for set operations joins, etc, etc.
    • No window functions.

MongoDB’s MQL’s search integration comes with a long list of limitations.

  • Available only on the Atlas search service, not on the on-prem product.
  • Search can only be the FIRST operation within the aggregate() pipeline.
  • Available only within the aggregation pipeline (aggregate()) and not in find(), insert(), update(), remove() other other operations.

The integration with its aggregate() API, comes with some limitations: It can only be the first operation in the pipeline unavailable on its on-prem database. The features we discuss in this article are in Couchbase 6.5 and above.

Here’s an example from N1QL:


SQL
 




xxxxxxxxxx
1
10
9


 
1
SELECT country, 
2
               city, 
3
               name, 
4
               ROW_NUMBER() OVER(ORDER BY country DESC, city DESC) rownum
5
FROM `travel-sample` AS t1
6
WHERE t1.type = "hotel" AND SEARCH(t1.description, "garden") 
7
AND ANY r in reviews satisfies r.ratings.Service > 3 END;



This includes the following in addition to SEARCH():

  • Projection of fields from the documents: country, city, name
  • Row number generation via the window function ROW_NUMBER()
  • Additional scalar predicate  t1.type = “hotel”
  • Array predicate on reviews (ANY)

You get the FULL benefit of first-class query processing in addition to efficient search.  That’s not all — there’s even more with N1QL. The benefits and effectiveness of SQL are well known.  N1QL is SQL for JSON. The goal of N1QL is to give developers and enterprises an expressive, powerful, and complete language for querying, transforming, and manipulating JSON data.

The benefits of using N1QL with the search are the following:

  1. Predicates:
    • FTS is great with searching based on relevance. SQL is great with additional complex query processing: complex predicates, array predicates, additional scalar 
  2. Operators and functions:
    1. Predicate processing (filter processing)
      1. Additional scalar and array predicates
      2. Scalar and array functions too be used within the predicates
      3. Subqueries
        1. Correlated subqueries
        2. Uncorrelated subqueries 
    2. Aggregates
    3. Window functions
  3. JOIN processing
    1. N1QL can do INNER JOIN, LEFT OUTER JOIN, (limited) RIGHT OUTER JOIN, NEST, UNNEST
    2. JOINS between buckets, collections and results of subqueries.
  4. SET operations
    1. UNION
    2. UNION ALL
    3. EXCEPT
    4. EXCEPT ALL
    5. INTERSECT
    6. INTERSECT ALL
  5. CTE (Common Table Expression) and LET clause for improved query writing
  6. More than SEARCH()
    1. In addition to SELECT, you can use SEARCH() predicate in the WHERE clauses of INSERT, UPDATE, DELETE, MERGE statements.
    2. You can PREPARE these statements and EXECUTE them repeatedly and efficiently.
    3. You get the usual security via the RBAC roles via GRANT & REVOKE.
  7. Developer productivity: Write the query in SQL, the language they already know.

Let’s look at how the N1QL engine executes this. Abhinav Dangeti from the Couchbase FTS engineering has already written a great blog detailing the decision making and examples. This article is to explain this visually with additional examples in the categories mentioned above.

1. Architecture for Query Execution

We’ve added three important steps to query execution the query uses SEARCH() :

  1. The planner considers the FTS search index one of the valid access paths if search() predicate exists in the query.
    • If the search index is selected, then it creates the plan by pushing down the search predicate to the FTS index.
  2. When the search index is selected, executor issues the search request to one of the FTS nodes (instead of the scan request to the index service)
  3. Before the results from the search are finalized, the query service re-verifies the search qualification of the document to the data.

Query execution with FTS

Inside a query service


2. Predicate Processing

In the following query, the SEARCH() predicate (predicate-2) is pushed to the FTS search request.  Every other predicate is processed by the query engine post search in the “Filter” phase  — as shown in the “Inside a Query Service” figure above.  This is one exception to this.  When the FTS index has created an index with JSON type field (doc_config.type_field in the index definition document) is defined (in this case type = “hotel”) to create the index on the subset of the document,  both index selection and search pushdown exploits this predicate.  Even in this case, the predicate is re-applied after the document fetch.

SQL
 




x


 
1

          
2
SELECT country, 
3
               city, 
4
               name, 
5
               ROW_NUMBER() OVER(ORDER BY country DESC, city DESC) rownum
6
FROM `travel-sample` AS t1
7
WHERE 
8
t1.type = "hotel"   /* predicate-1 */
9
AND SEARCH(t1.description, "garden") /* predicate-2 */
10
AND ANY r in reviews satisfies r.ratings.Service > 3 END; /* predicate-2 */



3. OPERATORS and FUNCTIONS

Here’s an example of a query exploiting the operators and functions.

SQL
 




xxxxxxxxxx
1
10


 
1
 
2
SELECT LOWER(country),   /* scalar function */
3
       city, 
4
       lastname || " " || firstname AS fullname   /* string operator */
5
       ROW_NUMBER() OVER(ORDER BY country DESC, city DESC) rownum   /* window function */
6
FROM `travel-sample` AS t1
7
WHERE 
8
LOWER(t1.type) = "hotel"   /* scalar function */
9
AND SEARCH(t1.description, "garden") 
10
AND ARRAY_CONTAINS(public_likes, "Joe Black")  /* Array function */



Here’s the query plan for this query. IndexSearch does the FTS search request and this is layered into the query execution pipeline.  Hence the query gets the benefit of all the other capabilities of N1QL.  This reflects the pipeline stages in the figure above.

Pipeline stages

4. JOIN processing

.The SEARCH() can also be used as part of the join processing. In this case, the FTS is used to find all the cities which have hotels with gardens and then join with airports.

MariaDB SQL
 




xxxxxxxxxx
1
10
9


 
1
 SELECT hotel.name hname,
2
       airport.city
3
FROM `travel-sample` hotel 
4
LEFT OUTER JOIN `travel-sample` airport ON hotel.city = airport.city
5
WHERE hotel.type = 'hotel'
6
    AND SEARCH(hotel.description, "garden")
7
    AND airport.type = 'airport' ;


Join processing

5. Common Table Expressions (CTEs).

N1QL in query service supports non-recursive CTEs. You can use SEARCH() within each expression. The derived table from that expression (hotel and airport) are used as keyspaces within the query.

SQL
 




xxxxxxxxxx
1
19


 
1
WITH hotel AS (
2
    SELECT name,
3
           city
4
    FROM `travel-sample`
5
    WHERE type = 'hotel'
6
        AND search(description, "garden")),
7
airport AS (
8
    SELECT name,
9
           city
10
    FROM `travel-sample`
11
    WHERE type = 'airport'
12
        AND SEARCH(city, "angeles"))
13
SELECT hotel.name hname,
14
       airport.city
15
FROM hotel
16
INNER JOIN airport ON hotel.city = airport.city
17
ORDER BY airport.city,
18
         hotel.name;
19

          


CTEs


5. Use in UPDATEs

SEARCH() can be used anywhere a predicate is allowed within other DML statements.

SQL
 




xxxxxxxxxx
1
13


 
1
/* INSERT INTO... SELECT statement. */
2
INSERT INTO mybucket (KEY id, VALUES v)
3
          SELECT meta().id  id, v 
4
          FROM `travel-sample` v
5
          WHERE SEARCH(v, "+type:hotel  +description:clean");
6
 
7
/* DELETE statement */
8
DELETE FROM `travel-sample` WHERE SEARCH(v, "+type:hotel +description:clean");
9
 
10
/* UPDATE statement */
11
UPDATE `travel-sample` SET new_field = "search n update!" WHERE SEARCH(v, "+type:hotel +description:clean"); 
12
 
13
 



The examples can flow a long time.  I’ve shown common sample examples. You use this in various SQL statements (DMLs)

Conclusion:

Couchbase FTS provides a scalable, distributed text search engine.  We’ve seamlessly layered it into N1QL in Couchbase Query service so you get the full power of queries with the full power of search.   There’s more innovation on this in the pipeline. Stay tuned!


Comments

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com

Let's be friends: