The Latest Performance Tricks in N1QL
This article highlights some interesting performance enhancements that N1QL incorporated in Couchbase Server 4.5.1 and the Couchbase Server 4.6 Developer preview release.
Boost N1QL Queries With Couchbase 4.5.1
In this article, I will share a few N1QL performance enhancements available in Couchbase Server 4.5.1. While some of these help all N1QL queries generically, a few are specific in nature and significantly boost the performance of certain kinds of queries. For each improvement, I briefly explain the feature and show some numbers.
My earlier article, Apps in N1QL Land: Magic With Pretty, Suffix, Update, and More, describes the salient N1QL enhancements in the 4.5.1 release whose performance characteristics are explored here. More N1QL features are described in the blog post n1ql-functionality-enhancements-in-couchbase-server-4.5.1.
Concurrent Bulk Fetch Processing
As with any mature query processing engine, N1QL processes queries in multiple phases, as shown in Figure 1. N1QL also parallelizes many phases whenever possible (indicated by the four boxes/arrows in Figure 1). In this query processing, one of the expensive phases is Fetch, in which the query engine needs to retrieve all the relevant and matching documents from data service nodes for further processing. This occurs when the query does not have a satisfying covering index.
For instance, consider the following query, which needs to process all airport documents. It uses the travel-sample bucket shipped with Couchbase Server.
CREATE INDEX def_type ON `travel-sample`(type);
SELECT * FROM `travel-sample` WHERE type = "airport";
While processing this query, N1QL can use the index def_type (pre-created when the sample bucket is installed) to get all the document keys matching the predicate. Note that N1QL pushes the predicate down to the index to retrieve all airport document keys.
Then, in the Fetch phase, N1QL fetches the full documents corresponding to the keys obtained in the previous step.
Prior to Couchbase Server 4.5.1, N1QL did use multiple threads to process the fetch. However, when fetching a large number of documents, especially with many queries hitting N1QL at once, thread management becomes very important for performance. To get a sense of the scale, consider a moderate load of 200 queries per second, with each query fetching 1,000 documents. It is a non-trivial task to efficiently create and manage threads for those 200K fetch requests per second while staying optimally aligned with the available CPUs and cores.
In Couchbase Server 4.5.1, N1QL enhanced the management of fetch-phase threads to use a common pool of worker threads to process all the fetch requests (see all threads in a box, in Figure 2 below). The number of worker threads is optimally chosen according to the available CPUs so that all fetches are concurrently processed minimizing thread management overhead. This approach uses CPU and memory resources more efficiently, reuses threads, and prevents unnecessary creation and recreation of temporary threads.
Note that as with earlier releases, N1QL continues to use parallel processing (dictated by configuration parameter max_parallelism) to retrieve documents with multiple parallel fetch requests to data service nodes. In 4.5.1, documents retrieved in each fetch request are concurrently processed using a worker-thread-pool. The thread pool is automatically sized for the available CPUs and cores.
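As a rough illustration of the worker-pool idea (not N1QL's actual implementation, which is written in Go), here is a Python sketch: fetch requests are batched and dispatched onto a shared, CPU-sized thread pool instead of spawning fresh threads per request. The DATA_SERVICE dictionary is a made-up stand-in for the data service nodes.

```python
from concurrent.futures import ThreadPoolExecutor
import os

# Hypothetical in-memory "data service": document keys -> documents.
DATA_SERVICE = {f"airport_{i}": {"type": "airport", "id": i} for i in range(1000)}

def fetch_batch(keys):
    """Fetch one batch of documents, as a single data-service request would."""
    return [DATA_SERVICE[k] for k in keys]

def fetch_all(keys, batch_size=100):
    """Fetch documents for all keys via a shared, CPU-sized worker pool,
    rather than creating and destroying a thread per request."""
    batches = [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
        results = list(pool.map(fetch_batch, batches))
    return [doc for batch in results for doc in batch]

docs = fetch_all(list(DATA_SERVICE))
```

Reusing a fixed pool avoids the creation/teardown overhead described above while still keeping all cores busy.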
This is not just a fancy way of processing queries; it shows real results. In internal throughput tests, I have noticed a significant performance boost for range queries and some point queries. The charts above show that the gain is consistent, and in some cases tests achieved up to eight times the throughput (y-axis). Naturally, the gains are larger when the query fetches more documents. The x-axis is just the test case number, with different tests varying parameters such as point/range queries, index storage type (MOI, GSI), scan consistency, etc.
In summary, it is an interesting journey to discover the performance hidden in the thread management overheads. Kudos to the smart engineers! Note that the performance numbers vary depending on various factors such as:
- Type of index storage, i.e., ForestDB vs. Memory Optimized Indexes.
- Query scan consistency, i.e., not_bounded or request_plus.
- Query predicates, which dictate the number of document keys returned by the index and hence the number of documents fetched and processed.
- Operating system, i.e., Linux, Windows Server, etc.
Efficient LIKE ‘%foo%’ Queries
Pattern matching is one of the more expensive operations in query processing. LIKE 'foo%' queries can be implemented efficiently with a standard index because the first few characters are fixed. However, efficient techniques for processing LIKE '%foo%', where the search string has a leading wildcard, are hard to come by.
In Couchbase Server 4.5.1, N1QL introduced a novel technique to solve this problem efficiently: creating an array index on all possible suffix substrings of the index key. The new N1QL function SUFFIXES() produces an array of all possible suffixes. For example, the query SELECT name FROM default WHERE name LIKE "%foo%" is handled by creating the suffix array index and replacing the LIKE "%foo%" predicate with LIKE "foo%" (without the leading wildcard). This brings an order-of-magnitude performance boost to LIKE queries. The technique is explained in detail in my earlier blog, Apps in N1QL Land: Magic With Pretty, Suffix, Update, and More.
Let's look at an example showing that the performance gain is real and tangible. In my laptop testing, I saw an impressive ~6.5x improvement in latency. This uses the travel-sample bucket and documents shipped with Couchbase Server. First, using the normal index def_type, the query is run in the Couchbase CBQ shell:
cbq> SELECT * FROM `travel-sample` USE INDEX(def_type)
> WHERE type = "landmark" AND title LIKE "%land%";
...
"metrics": {
"elapsedTime": "120.822583ms",
"executionTime": "120.790099ms",
"resultCount": 227,
"resultSize": 279707
}
Now, using the suffixes-based array index and the equivalent rewritten query:
cbq> CREATE INDEX idx_title_suffix
ON `travel-sample`(DISTINCT ARRAY s FOR s IN SUFFIXES(title) END)
WHERE type = "landmark";
cbq> SELECT * FROM `travel-sample` USE INDEX(idx_title_suffix)
> WHERE type = "landmark" AND
> ANY s IN SUFFIXES(title) SATISFIES s LIKE "land%" END;
...
"metrics": {
"elapsedTime": "18.817681ms",
"executionTime": "18.795954ms",
"resultCount": 227,
"resultSize": 279707
}
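To see why this rewrite works, here is a small Python sketch of the idea behind SUFFIXES(): a substring match anywhere in a string is exactly a prefix match against one of its suffixes, and prefix matches are what a standard index serves efficiently. The helper names are illustrative, not part of N1QL.

```python
def suffixes(s):
    """All suffix substrings of s, mirroring what N1QL's SUFFIXES() produces."""
    return [s[i:] for i in range(len(s))]

def matches_substring(title, pattern):
    """LIKE '%pattern%' answered as a prefix match ('pattern%') over the suffixes.
    With the suffixes pre-indexed, this avoids scanning every string."""
    return any(suf.startswith(pattern) for suf in suffixes(title))

titles = ["Royal Opera House", "Lakeland Motor Museum", "City Hall"]
hits = [t for t in titles if matches_substring(t, "land")]
```

In the real array index, the suffixes are computed once at index-build time, so at query time only the prefix lookup remains.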
Query Setting pretty Saves ~30%
It is an interesting fact that, on average, the whitespace overhead in a pretty-formatted JSON document is around 30%. You can check the overhead in your own JSON documents as follows (on Linux):
- If you have a formatted JSON document, you can trim all the whitespace characters:
varakurprasad$ cat pretty-json-doc.json | tr -d " \t\n\r" > not-pretty-json-doc.json
- If you have an unformatted JSON document, you can format it using Python 2.6 or a later version:
varakurprasad$ cat not-pretty-json-doc.json | python -m json.tool > pretty-json-doc.json
Of course, if your documents are already in Couchbase, you can use the new N1QL parameter pretty to check the overhead.
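To see where a figure like ~30% comes from, here is a minimal Python sketch (with a made-up sample document) comparing a pretty-printed JSON encoding against a compact one; the exact overhead varies with document shape:

```python
import json

# Hypothetical sample document, loosely shaped like a travel-sample airport doc.
doc = {"type": "airport", "airportname": "Heathrow",
       "geo": {"lat": 51.4706, "lon": -0.4619},
       "schedule": [{"day": d, "flights": d * 3} for d in range(7)]}

pretty = json.dumps(doc, indent=4)               # formatted, like pretty=true
compact = json.dumps(doc, separators=(",", ":")) # whitespace stripped, like pretty=false
overhead = 1 - len(compact) / len(pretty)        # fraction of bytes spent on formatting
```

Both encodings parse to the identical document; only the transfer size differs.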
This formatting overhead can be a significant component of your query performance, especially when queries produce large result sets. More interestingly, note that pretty formatting means little to the application program running the queries: applications typically process query results with a JSON parser, which discards all the whitespace formatting anyway.
pretty is a new N1QL parameter introduced in the 4.5.1 release that enables or disables pretty formatting of query results. The parameter can be:
- Set to true or false in the CBQ shell: cbq> \SET -pretty false;
- Passed to the N1QL service process cbq-engine as a command-line parameter.
- Passed as a query parameter with REST calls.
Let's look at the numbers showing the performance gain with pretty. Consider the `travel-sample` bucket and the query SELECT * FROM `travel-sample`:
- With pretty = true, this query took ~6.16 sec on my MacBook Pro.
- With pretty = false, the same query ran in 2.2 sec.
That's nearly 3 times better performance! Find a detailed example in my earlier blog, Apps in N1QL Land: Magic With Pretty, Suffix, Update, and More.
Streaming Through DISTINCT Operator
Typically, the DISTINCT operator used in queries is blocking in nature. That is, it first waits for all the input records to be available and only then processes them to find the distinct values of a given field.
In 4.5.1, N1QL enhanced the DISTINCT operator to stream-process its input: the operator scans the input and emits distinct values as it progresses. With this efficient streaming algorithm, DISTINCT now consumes much less memory and fewer CPU cycles. This significantly improves query performance, especially if you have a large document set and use DISTINCT on an attribute with low cardinality (such as state or country).
Note that this also preserves the order of the input, and consequently retains and leverages the index order of the documents. Hence, queries with an ORDER BY clause benefit further by avoiding sort processing.
Altogether, this significantly improves the response times seen by applications, as they start receiving query results as soon as the first ones are available, while N1QL is still producing subsequent results. This optimization applies to DISTINCT in various scenarios, such as query projections, UNION queries, and as part of INTERSECT, INTERSECT ALL, EXCEPT, and EXCEPT ALL.
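The streaming behavior can be sketched in a few lines of Python: a generator that emits each value the first time it appears, preserving input order, rather than buffering the whole input first. This is an illustration of the idea, not N1QL's actual code.

```python
def streaming_distinct(rows):
    """Yield each value the first time it is seen, preserving input order,
    instead of blocking until all rows have been collected."""
    seen = set()
    for value in rows:
        if value not in seen:
            seen.add(value)
            yield value

cities = ["Dublin", "Paris", "Dublin", "Rome", "Paris", "Dublin"]
result = list(streaming_distinct(cities))
```

Because values are emitted in first-seen order, input that arrives in index order stays in index order, which is why a downstream ORDER BY can be skipped.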
Efficient Memory Usage for Processing Projections
Processing projections is a memory-intensive operation, especially when * is used as the projection list. With this enhancement, N1QL optimized its memory allocation and usage to reduce over-allocation and buffering. Depending on the query, we have seen an order-of-magnitude reduction in memory usage (for example, from a few GB to a couple hundred MB) and overall faster query processing.
Optimization for Processing IN With Parameters
This improvement optimizes the processing of queries that use the IN clause with parameters. Specifically, it makes better use of available secondary indexes by pushing the IN-clause predicate down to the index more precisely. For example, the following query takes the parameter $city, which is used in the IN predicate:
cbq> \set -$city "Dublin" ;
cbq> explain select city from `travel-sample` where city IN ["Santa Monica", $city];
Let's see how the query executes. Without this optimization, a full scan of the index is performed (when parameters are involved), which makes the query much more expensive: a full scan is a range scan over all entries in the index and returns every document key in the index.
cbq> \set -$city "Dublin" ;
cbq> explain select meta().id, city from `travel-sample` where city IN ["Santa Monica", $city];
{
"requestID": "82251ffa-7a9b-4c79-92ef-f549b2e2ee9f",
"signature": "json",
"results": [
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan",
"covers": [
"cover ((`travel-sample`.`city`))",
"cover ((meta(`travel-sample`).`id`))"
],
"index": "def_city",
"index_id": "64a2b5261a86839d",
"keyspace": "travel-sample",
"namespace": "default",
"spans": [
{
"Range": {
"Inclusion": 0,
"Low": [
"null"
]
}
}
],
"using": "gsi"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Filter",
"condition": "(cover ((`travel-sample`.`city`)) in [\"Santa Monica\", $city])"
},
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "cover ((meta(`travel-sample`).`id`))"
},
{
"expr": "cover ((`travel-sample`.`city`))"
}
]
},
{
"#operator": "FinalProject"
}
]
}
}
]
},
"text": "select meta().id, city from `travel-sample` where city IN [\"Santa Monica\", $city]"
}
],
"metrics": {
"elapsedTime": "65.649232ms",
"executionTime": "65.630462ms",
"resultCount": 55,
"resultSize": 2524
}
Note that the following span is generated for the index lookup, which represents a full scan of the index.
"spans": [
{
"Range": {
"Inclusion": 0,
"Low": [
"null"
]
}
}
],
Further, note that if the query is not covered by the index scan (for example, with * in the projection), it subsequently results in a fetch of all the documents from the data service. However, most of these fetched documents may be discarded by N1QL when the IN predicate is applied later, in the filtering phase (refer to the N1QL execution phases at the beginning of this article).
In Couchbase Server 4.5.1, N1QL implemented an optimization for such queries. Now, N1QL generates a precise lookup span to the index for each value in the IN clause and avoids the expensive full scan. Note the spans[] array in the following EXPLAIN output, which shows two index lookup spans, one for each of the IN values (including the parameter $city).
cbq> \set -$city "Dublin" ;
cbq> explain select city from `travel-sample` where city IN ["Santa Monica", $city];
{
"requestID": "aef94329-a1cb-4880-add0-c2efee7d1d8a",
"signature": "json",
"results": [
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "DistinctScan",
"scan": {
"#operator": "IndexScan",
"covers": [
"cover ((`travel-sample`.`city`))",
"cover ((meta(`travel-sample`).`id`))"
],
"index": "def_city",
"index_id": "ad4c900681b6abe9",
"keyspace": "travel-sample",
"namespace": "default",
"spans": [
{
"Range": {
"High": [
"\"Santa Monica\""
],
"Inclusion": 3,
"Low": [
"\"Santa Monica\""
]
}
},
{
"Range": {
"High": [
"$city"
],
"Inclusion": 3,
"Low": [
"$city"
]
}
}
],
"using": "gsi"
}
},
"metrics": {
"elapsedTime": "14.564755ms",
"executionTime": "14.53648ms",
"resultCount": 55,
"resultSize": 2524
}
Finally, note that this query ran almost 4.5 times faster (the y-axis in the chart is latency in ms). The performance gain depends entirely on the selectivity of the IN predicate: if the predicate matches only a few documents, the gain from this optimization is much more pronounced than when it matches many or all documents (in which case it approaches a full scan anyway).
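The difference between the two plans can be sketched in Python: with a sorted index, each IN value becomes a narrow equality span (low == high, both inclusive, matching the Inclusion: 3 in the EXPLAIN output) instead of one range over the whole index. The index contents here are made up for illustration.

```python
import bisect

# Hypothetical sorted secondary index on `city`: (city, document key) pairs.
index = sorted([("Dublin", "doc1"), ("Paris", "doc2"), ("Santa Monica", "doc3"),
                ("Dublin", "doc4"), ("Rome", "doc5")])
keys_only = [city for city, _ in index]

def lookup_span(value):
    """Equality span for one IN value: low == high == value, both inclusive."""
    lo = bisect.bisect_left(keys_only, value)
    hi = bisect.bisect_right(keys_only, value)
    return index[lo:hi]

def in_scan(values):
    """One precise lookup span per IN value, instead of a full index scan."""
    return [entry for v in values for entry in lookup_span(v)]

hits = in_scan(["Santa Monica", "Dublin"])
```

Each span touches only the matching index entries, so the cost scales with the selectivity of the IN predicate rather than the size of the index.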
Summary
N1QL incorporated many performance and feature enhancements in Couchbase Server 4.5.1, as well as in the more recent Couchbase Server 4.6 Developer preview release. This article highlighted some of the most interesting performance enhancements. My earlier article, Apps in N1QL Land: Magic With Pretty, Suffix, Update, and More, covers the salient N1QL feature enhancements in the 4.5.1 release.