Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

N1QL and SEARCH: How to Leverage Couchbase Full-Text Index (FTS) in N1QL

DZone 's Guide to

N1QL and SEARCH: How to Leverage Couchbase Full-Text Index (FTS) in N1QL

See how to leverage Couchbase Full-Text Index in N1QL.

· Database Zone ·
Free Resource

The one constant challenge for many application developers with relational databases is the query performance. It is by far the most elusive feature because the means to resolve the query performance are often limited to what the relational databases can offer, which is either getting a larger database server and/or better indexes.

With Couchbase, N1QL query performance also relies on similar components. But unlike relational RDBMS, Couchbase architecture of services isolation means both the Query and Index services can be scaled out independently. With appropriate sizing and capacity planning, the Couchbase data platform can deliver a blistering fast performance as reported in this Altoros NoSQL Benchmark report.

But while customers can achieve milliseconds response time for queries with appropriate indexes, there are times when the query predicates, the basis for the Couchbase GSI indexes, are not always known ahead of time. The ideal solution is to have an indexing system that can work with any combination of the available query predicates. Couchbase Adaptive Index can address many of these use cases. Couchbase Full-Text Search is also another candidate that can address the irregular pattern use cases. i.e. offering text and fuzzy search capability on any field in the document.

With Couchbase v6.5, Full-Text Search is now integrated into the Couchbase N1QL query construct. Customers can now leverage FTS index directly with N1QL, providing application developers a single API to combine both N1QL exact predicate matching and FTS powerful search capability.

Let’s consider the activity management document below:

  1. An activity always belong to a customer (account)

  2. An activity can also have multiple contacts from the customer’s organization and are represented by an array of contacts

  3. An activity can have multiples participants, represented by an array of users.

  4. An activity can be of type appointment or a task, both of which have their specific corresponding attributes, such as title, start date, due date, etc..

  5. The activity of type Task has an array of a ToDo list

Image title1. The Use Case

John, a service representative for a call center at Acme Ltd needs to retrieve all the customer activities while he is on the phone with a customer. The customer may provide one or many of the values below for John to query the application:

  1. Activity title: The query should return all activities that have this text anywhere in the activity title.

  2. Customer name: The entered customer name may be incomplete, thus the query needs to use a wildcard to match with the customer name.

  3. Contact name, email, or phone contact point: The customer may also provide a contact person details. These may also be incomplete.

  4. A participant name: The customer may also provide the name of the account manager, an employee of Acme whom the customer has interacted with and has been part of the activity.

  5. Activity date: The customer may provide a range of date and time for the activities.

1.1 The Requirements

  1. The service rep may receive one or more of the above information. The pattern is not fixed.

  2. The query response time needs to be <1 sec.

  3. Data volume is 3millions per year and a retention period of 3 years.

1.2 What Are the Challenges to Retrieve This Information?

  1. There could be up to eight fields that the customer can provide, and none of them are mandatory.

  2. This would pose a challenge for an efficient index because for a more efficient covering index design, a  leading key needs to be present for the index selection. As a result, indexes cannot cover all cases.

  3. Wildcard matching: The provided activity title, customer and contact name, email, or phone can be incomplete, so an exact N1QL predicate matching technique may not work.

  4. Both contacts and participants are child objects for activities. In the JSON data model, contacts and participants are represented as two separate arrays. If we need a coverage index, it needs to include one or more elements from both arrays.

2. The Solutions

2.1 The Simplest Approach Is to Use N1QL Predicates

SELECT meta(a).id, a.title, a.startDate, a.account.name, a.contacts, a.participants
FROM crm a 
WHERE 
  a.type='activity' 
AND a.activityType='Appointment'
AND
(
    LOWER(a.title) LIKE '%artificial intelligence%'
AND LOWER(a.account.name) LIKE '%collins%' 
AND ANY c in a.contacts SATISFIES LOWER(c.name) LIKE '%rogers%' END
AND ANY c in a.contacts SATISFIES LOWER(c.email) = 'elliottpamela@gmail.com' END 
AND ANY c in a.contacts SATISFIES c.phone LIKE '%6816%' END
AND ANY p in a.participants SATISFIES LOWER(p.name)LIKE '%james%' END 
AND a.startDate between '2016-08-29' AND '2016-08-30'
)

The following GSI indexes would also be required:

CREATE INDEX crm_activity_appt_startDate_accid_dur ON `crm`(`activityType`,`startDate`,`accid`,`duration`) WHERE `type` = 'activity'
CREATE INDEX act_account_name_type ON `crm`(lower((`account`).`name`)) WHERE `type` = 'activity'
CREATE INDEX act_contacts_name_type ON `crm`(DISTINCT ARRAY lower(`c`.`name`) FOR c in contacts END) WHERE `type` = 'activity'
CREATE INDEX act_contacts_email_type ON `crm`(DISTINCT ARRAY lower(`c`.`email`) FOR c in contacts END) WHERE `type` = 'activity'
CREATE INDEX act_contacts_phone_type ON `crm`(DISTINCT ARRAY c.phone FOR c in contacts END) WHERE `type` = 'activity'
CREATE INDEX act_participants_name_type ON `crm`(DISTINCT ARRAY lower(`p`.`name`) FOR p in participants END) WHERE `type` = 'activity'

Note that the above query may use one or all of the available indexes to improve query performance. However, there could still be performance issues because of the need for the query plan to use IntersectScan operation.

2.2 Leverage FTS Index

Couchbase Full-Text Search could help with this use case because of its non-exact search capability as well as the ability to search the fields in any order. Here is an FTS index that can cover the search criteria.

Image title

The FTS index targets only the document of type activity. It only includes the fields that are used in the search. In this example, the entire sub-document for accounts, contacts, and participants are included in the index. But you could further optimize the storage by including specific fields. 

You can read more about Couchbase Full-Text Search from https://docs.couchbase.com/server/6.0/fts/full-text-intro.html.

2.2.1 With N1QL/FTS Integration Using SEARCH_QUERY

SELECT meta(a).id, a.title,a.startDate, a.account.name, a.contacts, a.participants
FROM SEARCH_QUERY("all_acts",{"explain":false,"fields": ["*"],"highlight": {}, 
    "query": {"conjuncts":[ {"field":"title", "match": "artificial intelligence"} 
                          , {"field":"contacts.name", "match":"rogers"} 
                          ,  {"field":"contacts.email", "match":"eliottpamela@gmail.com"} 
                          ,  {"field":"contacts.phone", "wildcard":"*6816*"} 
                          ,  {"field": "participants.name", "match":"james"} 
                          ,  {"field": "account.name", "match":"collins"}
                          ,  {"field": "startDate","start": "2016-08-29", "end":"2016-08-30", "inclusive_start": true, 
                                   "inclusive_end": true}
                          ] }}
              ) as result
          UNNEST result.hits h
          INNER JOIN crm a ON (h.id = meta(a).id) AND a.type='activity'

This is a simple way to use N1QL/FTS without having the need to provide all the parameters for the CURL approach. It also allows for load balancing across all FTS nodes.

 2.2.3 With N1QL/FTS Integration Using N1QL SEARCH Predicate 

SELECT meta(a).id, a.title,a.startDate, a.account.name, a.contacts, a.participants
FROM crm a
WHERE a.type='activity'
AND a.activityType='Appointment'
AND SEARCH(a,
   {"conjuncts":
     [  {"field":"title", "match": "artificial intelligence"},
        {"field":"contacts.name", "match":"rogers"},
        {"field":"contacts.email", "match":"eliottpamela@gmail.com"},
        {"field":"contacts.phone", "wildcard":"*6816*"},
        {"field": "participants.name", "match":"james"},
        {"field": "account.name", "match":"collins"},
        {"field": "startDate","start": "2016-08-29", "end":"2016-08-30", "inclusive_start": true, "inclusive_end": true}
     ]
    }, 
    {"index":"all_acts"}) /* good practice to provide the specific index name */

This approach has all the benefits of the SEARCH_QUERY method but with the additional capability to combine other N1QL predicates. For more details on the FTS query syntax, go here: https://docs.couchbase.com/server/6.0/fts/full-text-intro.html

Notes:

  1. The above example leverages FTS compound query with the conjunct construct to combine all predicates into a single SEARCH(). Refer to Couchbase FTS documentation for more details on FTS Query type.

  2. The above statement could be programmatically constructed to include only the required/provided predicates.

  3. The FTS index design must include the fields that are used in the SEARCH()

  4. The N1QL predicate a.type=’activity’ must be present in the query for the FTS index selection

Summary

  1. N1QL/FTS integration allows the query to use FTS search construct directly as search predicates

  2. The use of the FTS index in the N1QL query alleviates the need to have an exact index for each query pattern.

  3. N1QL/FTS provides an additional tool for developers to explore when dealing with query performance issues

  4. FTS index is well-suited where you need to search on multiple fields in any order

  5. FTS index is well suited for cases where you need to search for fields in multiple arrays

N1QL SEARCH_QUERY and SEARCH predicate is part of the N1QL/FTS Integration feature available in Couchbase v6.5.

Topics:
database ,n1ql ,sql++ ,full-text search ,tutorial

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}