DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Because the DevOps movement has redefined engineering responsibilities, SREs now have to become stewards of observability strategy.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Related

  • How To Select the Right Vector Database for Your Enterprise GENERATIVE-AI Stack
  • How To Generate Scripts of Database Objects in SQL Server
  • JQueue: A Library to Implement the Outbox Pattern
  • How to Store Text in PostgreSQL: Tips, Tricks, and Traps

Trending

  • GitHub Copilot's New AI Coding Agent Saves Developers Time – And Requires Their Oversight
  • Scaling Microservices With Docker and Kubernetes on Production
  • Rust, WASM, and Edge: Next-Level Performance
  • Enforcing Architecture With ArchUnit in Java
  1. DZone
  2. Data Engineering
  3. Databases
  4. Couchbase: Improving Performance When Querying Multiple Arrays With FTS and N1QL

Couchbase: Improving Performance When Querying Multiple Arrays With FTS and N1QL

In this article, see how to improve performance when querying multiple arrays with FTS and N1QL.

By 
Brian Kane user avatar
Brian Kane
·
Mar. 31, 20 · Tutorial
Likes (6)
Comment
Save
Tweet
Share
9.9K Views

Join the DZone community and get the full member experience.

Join For Free

Introduction

Couchbase Full Text Search (FTS) is a great fit for indexing multiple arrays and executing queries with multiple filter predicates in arrays.  In this article, I’ll demonstrate the advantages of using FTS over GSI (Global Secondary Index) for array indexing while working through an example use case that requires querying multiple arrays.  We’ll be creating an FTS multi-array index and querying the index with N1QL using the new SEARCH() function introduced in Couchbase Server 6.5.  

Travel Sample Bucket

In this article, we’ll be referencing the Travel Sample dataset available to install in any Couchbase Server instance. The travel-sample bucket has several distinct document types: airline, route, airport, landmark, and hotel.  The document model for each kind of document contains:

  • A key that acts as a primary key

  • An id field that identifies the document

  • A type field that identifies the kind of document

The examples in this article will be using the hotel documents. The sample document below gives you an idea of the structure of a hotel document: 

Figure 1 - Sample Hotel Document

The Problem

Our example is a use case where a user can search for hotels that have been reviewed or liked by a person with a particular name. This requires querying hotel documents on both public likes and reviews, which are arrays within the hotel document model: 

Figure 2 - The “public_likes” and “reviews” arrays in the sample hotel document


First let’s look at implementing this use case with N1QL and GSI (Global Secondary Index). To find hotels that anyone named Ozella has either liked or reviewed, the query could look like this: 

SQL
 




xxxxxxxxxx
1


 
1
SELECT name, address, city, country,
2
       phone, public_likes, reviews
3
FROM `travel-sample`
4
WHERE type="hotel"
5
    AND (ANY l IN public_likes SATISFIES l LIKE "%Ozella%" END
6
        OR ANY r IN reviews SATISFIES r.author LIKE "%Ozella%" END);



We need to create an appropriate index for this query.  Maybe something like this that indexes both arrays of interest for hotel documents: 

SQL
 




xxxxxxxxxx
1


 
1
CREATE INDEX idx_hotel_public_likes_review_author ON `travel-sample`
2
(DISTINCT ARRAY `l` FOR l IN `public_likes` END, 
3
DISTINCT ARRAY `r`.`author` FOR r IN `reviews` END)
4
WHERE `type` = 'hotel';



This doesn’t work, and we get the error shown in Figure 3:

Figure 3 - Error creating index with multiple arrays


As Keshav Murthy wrote in his blog post Search and Rescue: 7 Reasons for N1QL (SQL) developers to use Search (problem #6), with N1QL in Couchbase, “to get the best performance while searching inside arrays, you need to create indexes with array keys. The array index comes with a limitation: each array index can only have one array key per index. So, when you have a customer object with multiple array fields, you can’t search all of them using a single index...causing expensive queries.” As Keshav notes in that article, this is a limitation with b-tree indexes in databases generally. 

So now let’s try two separate array indexes.  The indexes to support this query could look like these, which were created using the Couchbase N1QL Index Advisor, a new (DP) feature in Couchbase 6.5:

SQL
 




xxxxxxxxxx
1


 
1
CREATE INDEX adv_DISTINCT_public_likes_type ON `travel-sample`(DISTINCT ARRAY `l` FOR l in `public_likes` END) WHERE `type` = 'hotel';
2
 
          
3
CREATE INDEX adv_DISTINCT_reviews_author_type ON `travel-sample`(DISTINCT ARRAY `r`.`author` FOR r in `reviews` END) WHERE `type` = 'hotel';



With those 2 indexes in place, our query runs successfully with 5 results (hotel_26020, hotel_10025, hotel_5081, hotel_20425, hotel_25327) and the following execution plan: 

Figure 4 - Execution plan using multiple indexes (GSI)


Same plan in JSON:

JSON
 




xxxxxxxxxx
1
252


 
1
{
2
  "#operator": "Sequence",
3
  "#stats": {
4
    "#phaseSwitches": 1,
5
    "execTime": "1.321µs"
6
  },
7
  "~children": [
8
    {
9
      "#operator": "Authorize",
10
      "#stats": {
11
        "#phaseSwitches": 3,
12
        "execTime": "3.034µs",
13
        "servTime": "1.037859ms"
14
      },
15
      "privileges": {
16
        "List": [
17
          {
18
            "Target": "default:travel-sample",
19
            "Priv": 7
20
          }
21
        ]
22
      },
23
      "~child": {
24
        "#operator": "Sequence",
25
        "#stats": {
26
          "#phaseSwitches": 1,
27
          "execTime": "2.235µs"
28
        },
29
        "~children": [
30
          {
31
            "#operator": "UnionScan",
32
            "#stats": {
33
              "#itemsIn": 1646,
34
              "#itemsOut": 904,
35
              "#phaseSwitches": 5107,
36
              "execTime": "1.32474ms",
37
              "kernTime": "113.495553ms"
38
            },
39
            "scans": [
40
              {
41
                "#operator": "DistinctScan",
42
                "#stats": {
43
                  "#itemsIn": 4004,
44
                  "#itemsOut": 813,
45
                  "#phaseSwitches": 9641,
46
                  "execTime": "1.381997ms",
47
                  "kernTime": "69.065425ms"
48
                },
49
                "scan": {
50
                  "#operator": "IndexScan3",
51
                  "#stats": {
52
                    "#itemsOut": 4004,
53
                    "#phaseSwitches": 16021,
54
                    "execTime": "19.678094ms",
55
                    "kernTime": "30.973177ms",
56
                    "servTime": "17.461885ms"
57
                  },
58
                  "index": "adv_DISTINCT_public_likes_type",
59
                  "index_id": "288083a758973630",
60
                  "index_projection": {
61
                    "primary_key": true
62
                  },
63
                  "keyspace": "travel-sample",
64
                  "namespace": "default",
65
                  "spans": [
66
                    {
67
                      "range": [
68
                        {
69
                          "high": "[]",
70
                          "inclusion": 1,
71
                          "low": "\"\""
72
                        }
73
                      ]
74
                    }
75
                  ],
76
                  "using": "gsi",
77
                  "#time_normal": "00:00.037",
78
                  "#time_absolute": 0.037139979000000004
79
                },
80
                "#time_normal": "00:00.001",
81
                "#time_absolute": 0.0013819969999999998
82
              },
83
              {
84
                "#operator": "DistinctScan",
85
                "#stats": {
86
                  "#itemsIn": 4104,
87
                  "#itemsOut": 833,
88
                  "#phaseSwitches": 9881,
89
                  "execTime": "2.475034ms",
90
                  "kernTime": "80.914158ms"
91
                },
92
                "scan": {
93
                  "#operator": "IndexScan3",
94
                  "#stats": {
95
                    "#itemsOut": 4104,
96
                    "#phaseSwitches": 16421,
97
                    "execTime": "8.610445ms",
98
                    "kernTime": "52.02497ms",
99
                    "servTime": "22.586149ms"
100
                  },
101
                  "index": "adv_DISTINCT_reviews_author_type",
102
                  "index_id": "cca7f912cab1a4c6",
103
                  "index_projection": {
104
                    "primary_key": true
105
                  },
106
                  "keyspace": "travel-sample",
107
                  "namespace": "default",
108
                  "spans": [
109
                    {
110
                      "range": [
111
                        {
112
                          "high": "[]",
113
                          "inclusion": 1,
114
                          "low": "\"\""
115
                        }
116
                      ]
117
                    }
118
                  ],
119
                  "using": "gsi",
120
                  "#time_normal": "00:00.031",
121
                  "#time_absolute": 0.031196594
122
                },
123
                "#time_normal": "00:00.002",
124
                "#time_absolute": 0.002475034
125
              }
126
            ],
127
            "#time_normal": "00:00.001",
128
            "#time_absolute": 0.00132474
129
          },
130
          {
131
            "#operator": "Fetch",
132
            "#stats": {
133
              "#itemsIn": 904,
134
              "#itemsOut": 904,
135
              "#phaseSwitches": 3733,
136
              "execTime": "2.887995ms",
137
              "kernTime": "8.826606ms",
138
              "servTime": "170.010321ms"
139
            },
140
            "keyspace": "travel-sample",
141
            "namespace": "default",
142
            "#time_normal": "00:00.172",
143
            "#time_absolute": 0.172898316
144
          },
145
          {
146
            "#operator": "Parallel",
147
            "#stats": {
148
              "#phaseSwitches": 1,
149
              "execTime": "6.134µs"
150
            },
151
            "copies": 2,
152
            "~child": {
153
              "#operator": "Sequence",
154
              "#stats": {
155
                "#phaseSwitches": 2,
156
                "execTime": "3.621µs"
157
              },
158
              "~children": [
159
                {
160
                  "#operator": "Filter",
161
                  "#stats": {
162
                    "#itemsIn": 904,
163
                    "#itemsOut": 5,
164
                    "#phaseSwitches": 1824,
165
                    "execTime": "279.461548ms",
166
                    "kernTime": "85.245883ms"
167
                  },
168
                  "condition": "(((`travel-sample`.`type`) = \"hotel\") and (any `l` in (`travel-sample`.`public_likes`) satisfies (`l` like \"%Ozella%\") end or any `r` in (`travel-sample`.`reviews`) satisfies ((`r`.`author`) like \"%Ozella%\") end))",
169
                  "#time_normal": "00:00.279",
170
                  "#time_absolute": 0.279461548
171
                },
172
                {
173
                  "#operator": "InitialProject",
174
                  "#stats": {
175
                    "#itemsIn": 5,
176
                    "#itemsOut": 5,
177
                    "#phaseSwitches": 25,
178
                    "execTime": "7.156613ms",
179
                    "kernTime": "357.453351ms"
180
                  },
181
                  "result_terms": [
182
                    {
183
                      "expr": "(`travel-sample`.`name`)"
184
                    },
185
                    {
186
                      "expr": "(`travel-sample`.`address`)"
187
                    },
188
                    {
189
                      "expr": "(`travel-sample`.`city`)"
190
                    },
191
                    {
192
                      "expr": "(`travel-sample`.`country`)"
193
                    },
194
                    {
195
                      "expr": "(`travel-sample`.`phone`)"
196
                    },
197
                    {
198
                      "expr": "(`travel-sample`.`public_likes`)"
199
                    },
200
                    {
201
                      "expr": "(`travel-sample`.`reviews`)"
202
                    }
203
                  ],
204
                  "#time_normal": "00:00.007",
205
                  "#time_absolute": 0.007156613
206
                },
207
                {
208
                  "#operator": "FinalProject",
209
                  "#stats": {
210
                    "#itemsIn": 5,
211
                    "#itemsOut": 5,
212
                    "#phaseSwitches": 17,
213
                    "execTime": "12.167µs",
214
                    "kernTime": "98.849µs"
215
                  },
216
                  "#time_normal": "00:00.000",
217
                  "#time_absolute": 0.000012167
218
                }
219
              ],
220
              "#time_normal": "00:00.000",
221
              "#time_absolute": 0.000003621
222
            },
223
            "#time_normal": "00:00.000",
224
            "#time_absolute": 0.000006134
225
          }
226
        ],
227
        "#time_normal": "00:00.000",
228
        "#time_absolute": 0.0000022349999999999998
229
      },
230
      "#time_normal": "00:00.001",
231
      "#time_absolute": 0.0010408930000000002
232
    },
233
    {
234
      "#operator": "Stream",
235
      "#stats": {
236
        "#itemsIn": 5,
237
        "#itemsOut": 5,
238
        "#phaseSwitches": 13,
239
        "execTime": "939.145µs",
240
        "kernTime": "182.523171ms"
241
      },
242
      "#time_normal": "00:00.000",
243
      "#time_absolute": 0.000939145
244
    }
245
  ],
246
  "~versions": [
247
    "2.0.0-N1QL",
248
    "6.5.0-4960-enterprise"
249
  ],
250
  "#time_normal": "00:00.000",
251
  "#time_absolute": 0.000001321
252
}



In the single-node cluster being used for these examples, the query elapsed time is around 190-200 milliseconds to return the 5 resulting documents.  As you can see in the plan, there are two IndexScan3 operators which use each of the two array indexes we created, followed by a DistinctScan for the results of each index scan, and then a UnionScan.  The UnionScan shows an #itemsIn value of 1646 documents and an #itemsOut value of 904 documents, the Fetch operator also gets 904 documents, and finally, with the Filter operator we get an #ItemsOut value of 5.  The fetch of 904 documents is a waste considering that we ended up with 5 documents returned by the query, and in fact, about 170 milliseconds of the overall elapsed time is spent fetching the 905 documents when only 5 are needed.  

The Solution

By contrast, an FTS inverted index can easily be created for multiple arrays and is well-suited for cases where you need to search for fields in multiple arrays. We’ll create a FTS index on hotel documents for both the public_likes array and the author field within the reviews array. 

Index creation steps: 

  1. On the Full Text Search UI, click “Add Index”.

  2. Specify an index name, e.g. “hotel_mult_arrays”, and select the travel-sample bucket. 

  3. Since each document in the travel-sample bucket has a “type” field indicating the type of document, leave “JSON type field” set to “type”.

  4. Under type mappings:  

    1. Click “+ Add Type Mapping”, and specify “hotel” as the type name, since the requirement is to search all hotel documents.  

    2. A list of available analyzers can be accessed by means of the pull-down menu to the right of the type name field.  For this use case, leave “inherit” selected so that the type mapping inherits the default analyzer from the index.

    3. Since the requirement is to search the hotel public likes and review author fields, check “only index specified fields”.  With this checked, only user-specified fields from the document are included in the index for the hotel type mapping (the mapping will not be dynamic, meaning that all fields are considered available for indexing). 

    4. Click OK.  

    5. Mouse over the row with the hotel type mapping, click the + button, and then click “insert child field”.  This will allow the public_likes array to be individually included in the index.  Specify the following: 

      1. field: Enter the name of the field to be indexed, “public_likes”.

      2. type: Leave this set to text for the public_likes array.

      3. searchable as: Leave this the same as the field name for the current use case.  It can be used to indicate an alternate field name. 

      4. analyzer: As was done for the type mapping, for this use case, leave “inherit” selected so that the type mapping inherits the default analyzer.

      5. index checkbox: Leave this checked, so that the field is included in the index.  Unchecking the box would explicitly remove the field from the index.

      6. store checkbox: Check this setting to include the field content in the search results which permits highlighting of matched expressions in the results.  This is useful for testing the index, but not recommended in Prod if highlighting isn’t required since it increases index size.

      7. “include in _all field” checkbox: Leave this checked since the use case requirement is to search multiple fields. 

      8. “include term vectors” checkbox: Leave this checked, too, during development and testing of our index to allow highlighting of results. 

      9. docvalues checkbox: Uncheck this setting.  This setting stores the field values in the index which provides support for Search Facets, and for the sorting of search results based on field values, neither of which we need in this use case. 

      10. Click OK.

    6. Mouse over the row with the hotel type mapping, click the + button, and then click “insert child mapping”.  This will allow the array of review sub-documents to be included in the index.  Enter the property name “reviews”, leave “inherit” selected in the analyzer drop-down, check “only index specified fields”, and click OK. 

    7. Mouse over the row with the reviews child mapping, click the + button, and then click “insert child field”.  This will allow the author field from the array of review sub-documents to be included in the index.  Specify the following: 

      1. field: Enter the name of the field to be indexed, “author”.

      2. type: Leave this set to text for the author field.

      3. searchable as: Leave this the same as the field name for the current use case.  It can be used to indicate an alternate field name. 

      4. analyzer: As was done for the type mapping, for this use case, leave “inherit” selected so that the type mapping inherits the default analyzer.

      5. index checkbox: Leave this checked, so that the field is included in the index.  Unchecking the box would explicitly remove the field from the index.

      6. store checkbox: Check this setting to include the field content in the search results which permits highlighting of matched expressions in the results.  This is useful for testing the index, but not recommended in Prod if highlighting isn’t required since it increases index size.

      7. “include in _all field” checkbox: Leave this checked since the use case requirement is to search multiple fields. 

      8. “include term vectors” checkbox: Leave this checked, too, during development and testing of our index to allow highlighting of results.  

      9. docvalues checkbox: Uncheck this setting.  This setting stores the field values in the index which provides support for Search Facets, and for the sorting of search results based on field values, neither of which we need in this use case. 

      10. Click OK.

    8. Finally, uncheck the checkbox next to the “default” type mapping.  If the default mapping is left enabled, all documents in the bucket are included in the index, regardless of whether the user actively specifies type mappings. Only the hotel documents are required, and they are included by the hotel type mapping added previously. 

  5. The default values suffice for the remaining collapsed panels (Analyzers, Custom Filters, Date/Time Parsers, and Advanced). 

  6. Index Replicas can be set to 1, 2 or 3, provided that the cluster is running the Search service on n+1 nodes. With a single node development environment, maintain the default value of 0. 

  7. For Index Type, the default value of “Version 6.0 (Scorch)” is appropriate for any newly created indexes. Scorch reduces the size of the index on disk, and provides enhanced performance for indexing and mutation-handling.

  8. Index Partitions can be left to the default value of 6. 

  9. At this point, the create index page should look like the last frame captured in Figure 5.  Click “Create Index” to complete the process. 

Figure 5 - Creating FTS index with multiple arrays


Note: See Appendix for the JSON payload used to create this index through the REST API.

Testing queries against the index: 

  1. On the Full Text Search UI, wait for indexing progress to show 100%, then click on the index name “hotel_mult_arrays”. 

  2. To search for any hotels with likes or reviews by someone named “Ozella”, in the “search this index…” text box, enter “Ozella” and click Search.  Field-scoping of the search is not required because both indexed fields are included in the default field “_all”.

  3. The results are shown (similar to Figure 6) with the key of each matching document and highlighted matching fields.  The document IDs returned are the same as those from our earlier N1QL query.  

Figure 6 - Index “hotel_mult_arrays” search results for “Ozella”


This is a single index on 2 array keys, which, as mentioned earlier, is something you could never do in a b-tree based index.  So now let’s take advantage of this FTS index in a N1QL query by using the SEARCH() function.  Our query could look like this: 

SQL
 




xxxxxxxxxx
1


 
1
SELECT name, address, city, country,
2
       phone, public_likes, reviews
3
FROM `travel-sample` AS t USE INDEX(hotel_mult_arrays USING FTS)
4
WHERE t.type="hotel"
5
AND SEARCH(t, {"query": {"match":"Ozella"}}, {"index":"hotel_mult_arrays"});



A few things to note about the query: 

  • The USE INDEX...USING FTS clause specifies that the FTS index should be used rather than a GSI index, so this query doesn’t use the index service. (Documentation)

  • Because our FTS index uses a custom type mapping, the query needs to have the matching type specified in the WHERE clause (t.type="hotel").  

  • The FTS index name is specified in the “index” field in the SEARCH() function as a hint, but that is optional since the USE INDEX clause takes precedence over a hint provided in the “index” field.  (Documentation)  

Using the FTS index we created, our N1QL query runs successfully and returns 5 results (hotel_5081, hotel_26020, hotel_10025, hotel_20425, hotel_25327) and the following execution plan: 

Figure 7 - Execution plan using multiple indexes (FTS)


Same plan in JSON:

JSON
 




xxxxxxxxxx
1
177


 
1
{
2
  "#operator": "Sequence",
3
  "#stats": {
4
    "#phaseSwitches": 1,
5
    "execTime": "18.8µs"
6
  },
7
  "~children": [
8
    {
9
      "#operator": "Authorize",
10
      "#stats": {
11
        "#phaseSwitches": 3,
12
        "execTime": "32.1µs",
13
        "servTime": "3.421ms"
14
      },
15
      "privileges": {
16
        "List": [
17
          {
18
            "Target": "default:travel-sample",
19
            "Priv": 7
20
          }
21
        ]
22
      },
23
      "~child": {
24
        "#operator": "Sequence",
25
        "#stats": {
26
          "#phaseSwitches": 1,
27
          "execTime": "122.8µs"
28
        },
29
        "~children": [
30
          {
31
            "#operator": "IndexFtsSearch",
32
            "#stats": {
33
              "#itemsOut": 5,
34
              "#phaseSwitches": 23,
35
              "execTime": "239.3µs",
36
              "kernTime": "84.5µs",
37
              "servTime": "3.9146ms"
38
            },
39
            "as": "t",
40
            "index": "hotel_mult_arrays",
41
            "index_id": "7a28a8346fad6118",
42
            "keyspace": "travel-sample",
43
            "namespace": "default",
44
            "search_info": {
45
              "field": "\"\"",
46
              "options": "{\"index\": \"hotel_mult_arrays\"}",
47
              "outname": "out",
48
              "query": "{\"query\": {\"match\": \"Ozella\"}}"
49
            },
50
            "using": "fts",
51
            "#time_normal": "00:00.004",
52
            "#time_absolute": 0.0041539
53
          },
54
          {
55
            "#operator": "Fetch",
56
            "#stats": {
57
              "#itemsIn": 5,
58
              "#itemsOut": 5,
59
              "#phaseSwitches": 25,
60
              "execTime": "334.8µs",
61
              "kernTime": "4.4328ms",
62
              "servTime": "1.5272ms"
63
            },
64
            "as": "t",
65
            "keyspace": "travel-sample",
66
            "namespace": "default",
67
            "#time_normal": "00:00.001",
68
            "#time_absolute": 0.001862
69
          },
70
          {
71
            "#operator": "Parallel",
72
            "#stats": {
73
              "#phaseSwitches": 1,
74
              "execTime": "21.1µs"
75
            },
76
            "copies": 2,
77
            "~child": {
78
              "#operator": "Sequence",
79
              "#stats": {
80
                "#phaseSwitches": 2,
81
                "execTime": "49.1µs"
82
              },
83
              "~children": [
84
                {
85
                  "#operator": "Filter",
86
                  "#stats": {
87
                    "#itemsIn": 5,
88
                    "#itemsOut": 5,
89
                    "#phaseSwitches": 26,
90
                    "execTime": "6.8953ms",
91
                    "kernTime": "14.8149ms"
92
                  },
93
                  "condition": "(((`t`.`type`) = \"hotel\") and search(`t`, {\"query\": {\"match\": \"Ozella\"}}, {\"index\": \"hotel_mult_arrays\"}))",
94
                  "#time_normal": "00:00.006",
95
                  "#time_absolute": 0.0068953
96
                },
97
                {
98
                  "#operator": "InitialProject",
99
                  "#stats": {
100
                    "#itemsIn": 5,
101
                    "#itemsOut": 5,
102
                    "#phaseSwitches": 25,
103
                    "execTime": "2.3597ms",
104
                    "kernTime": "20.7458ms"
105
                  },
106
                  "result_terms": [
107
                    {
108
                      "expr": "(`t`.`name`)"
109
                    },
110
                    {
111
                      "expr": "(`t`.`address`)"
112
                    },
113
                    {
114
                      "expr": "(`t`.`city`)"
115
                    },
116
                    {
117
                      "expr": "(`t`.`country`)"
118
                    },
119
                    {
120
                      "expr": "(`t`.`phone`)"
121
                    },
122
                    {
123
                      "expr": "(`t`.`public_likes`)"
124
                    },
125
                    {
126
                      "expr": "(`t`.`reviews`)"
127
                    }
128
                  ],
129
                  "#time_normal": "00:00.002",
130
                  "#time_absolute": 0.0023597
131
                },
132
                {
133
                  "#operator": "FinalProject",
134
                  "#stats": {
135
                    "#itemsIn": 5,
136
                    "#itemsOut": 5,
137
                    "#phaseSwitches": 17,
138
                    "execTime": "300µs",
139
                    "kernTime": "375.9µs"
140
                  },
141
                  "#time_normal": "00:00",
142
                  "#time_absolute": 0
143
                }
144
              ],
145
              "#time_normal": "00:00.000",
146
              "#time_absolute": 0.0000491
147
            },
148
            "#time_normal": "00:00.000",
149
            "#time_absolute": 0.0000211
150
          }
151
        ],
152
        "#time_normal": "00:00.000",
153
        "#time_absolute": 0.0001228
154
      },
155
      "#time_normal": "00:00.003",
156
      "#time_absolute": 0.0034531
157
    },
158
    {
159
      "#operator": "Stream",
160
      "#stats": {
161
        "#itemsIn": 5,
162
        "#itemsOut": 5,
163
        "#phaseSwitches": 13,
164
        "execTime": "1.3409ms",
165
        "kernTime": "14.8586ms"
166
      },
167
      "#time_normal": "00:00.001",
168
      "#time_absolute": 0.0013409
169
    }
170
  ],
171
  "~versions": [
172
    "2.0.0-N1QL",
173
    "6.5.0-4960-enterprise"
174
  ],
175
  "#time_normal": "00:00.000",
176
  "#time_absolute": 0.0000188
177
}



In the single-node cluster being used for these examples, the query elapsed time is around 20 milliseconds to return the same 5 documents.  As you can see in the plan, there is an IndexFtsSearch operator but there are no IndexScan3, DistinctScan, UnionScan, or IntersectScan operators.  The overall query is much more efficient without these expensive GSI operators.  The IndexFtsSearch operator sends the 5 matching documents from the FTS index to the Fetch operator which gets only those 5 documents.  The fetch is much more efficient here than in the previous query since it’s only fetching 5 vs 904 documents, and this can also be observed in the comparison of overall elapsed times (and the servTime for the fetch operators: 170ms in query 1 and 1.5ms in query 2) between the queries.  

Conclusion

With GSI you can mix and match multiple array indexes in a single query, but with FTS you can mix and match multiple arrays in a single FTS index (and with FTS there is no leading key problem as in GSI regarding the order of the fields in the index).  As we’ve shown in this simple example of querying 2 arrays in the hotel documents, utilizing the new SEARCH() function in N1QL can result in simpler and more performant array queries. The same concept could be applied to queries utilizing several arrays, which would have even more favorable results over N1QL queries utilizing multiple GSI array indexes.  This approach uses fewer system resources and provides higher throughput, which results in an increase in overall system efficiency.  

This is just one example of the benefits of the integration between N1QL and FTS, and other benefits are documented in the blog posts in the references section below. 

References

  • Couchbase Search Resources: https://www.couchbase.com/products/full-text-search

  • Couchbase FTS Documentation: https://docs.couchbase.com/server/current/fts/full-text-intro.html

  • Couchbase N1QL Search Documentation: https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/searchfun.html

  • Couchbase FTS Blog Posts: https://blog.couchbase.com/category/full-text-search/

  • Couchbase FTS Online Training: https://learn.couchbase.com/store/509465-cb121-intro-to-couchbase-full-text-search-fts

Appendix

Index Definition JSON: hotel_mult_arrays


JSON
 




xxxxxxxxxx
1
84


 
1
{
2
  "type": "fulltext-index",
3
  "name": "hotel_mult_arrays",
4
  "uuid": "5fc5d43dfebc4a60",
5
  "sourceType": "couchbase",
6
  "sourceName": "travel-sample",
7
  "sourceUUID": "a1dd9dbb6aa27a47fac317dabfe74f61",
8
  "planParams": {
9
    "maxPartitionsPerPIndex": 171,
10
    "indexPartitions": 6
11
  },
12
  "params": {
13
    "doc_config": {
14
      "docid_prefix_delim": "",
15
      "docid_regexp": "",
16
      "mode": "type_field",
17
      "type_field": "type"
18
    },
19
    "mapping": {
20
      "analysis": {},
21
      "default_analyzer": "standard",
22
      "default_datetime_parser": "dateTimeOptional",
23
      "default_field": "_all",
24
      "default_mapping": {
25
        "dynamic": true,
26
        "enabled": false
27
      },
28
      "default_type": "_default",
29
      "docvalues_dynamic": true,
30
      "index_dynamic": true,
31
      "store_dynamic": false,
32
      "type_field": "_type",
33
      "types": {
34
        "hotel": {
35
          "dynamic": false,
36
          "enabled": true,
37
          "properties": {
38
            "public_likes": {
39
              "dynamic": false,
40
              "enabled": true,
41
              "fields": [
42
                {
43
                  "docvalues": true,
44
                  "include_in_all": true,
45
                  "include_term_vectors": true,
46
                  "index": true,
47
                  "name": "public_likes",
48
                  "store": true,
49
                  "type": "text"
50
                }
51
              ]
52
            },
53
            "reviews": {
54
              "dynamic": false,
55
              "enabled": true,
56
              "properties": {
57
                "author": {
58
                  "dynamic": false,
59
                  "enabled": true,
60
                  "fields": [
61
                    {
62
                      "docvalues": true,
63
                      "include_in_all": true,
64
                      "include_term_vectors": true,
65
                      "index": true,
66
                      "name": "author",
67
                      "store": true,
68
                      "type": "text"
69
                    }
70
                  ]
71
                }
72
              }
73
            }
74
          }
75
        }
76
      }
77
    },
78
    "store": {
79
      "indexType": "scorch"
80
    }
81
  },
82
  "sourceParams": {}
83
}
84
 
          



Database Document Data structure Use case Operator (extension) sql

Opinions expressed by DZone contributors are their own.

Related

  • How To Select the Right Vector Database for Your Enterprise GENERATIVE-AI Stack
  • How To Generate Scripts of Database Objects in SQL Server
  • JQueue: A Library to Implement the Outbox Pattern
  • How to Store Text in PostgreSQL: Tips, Tricks, and Traps

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!