Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Apps in N1QL Land: Magic With Pretty, Suffix, Update, and More

DZone's Guide to

Apps in N1QL Land: Magic With Pretty, Suffix, Update, and More

With the most recent release of Couchbase, N1QL got a few new tricks up its sleeve. See what it has to offer, and how the new offerings impact efficiency, ease of use, and performance.

· Database Zone
Free Resource

Find out how Database DevOps helps your team deliver value quicker while keeping your data safe and your organization compliant. Align DevOps for your applications with DevOps for your SQL Server databases to discover the advantages of true Database DevOps, brought to you in partnership with Redgate

Couchbase/N1QL 

Couchbase is all about enabling more and more enterprise applications to leverage and adopt NoSQL/JSON data model. And, N1QL is the wonderland for applications which need all the great benefits of the JSON/Document databases, and don't want to lose the power of query language like SQL. In fact, N1QL is SQL++ and is built with tons of features to simplify the transition from traditional relational databases, and achieve best of both worlds.

In the latest release, Couchbase Server 4.5.1 brings multiple functionality, usability, and performance improvements in N1QL. While some of the new improvements enhance existing functionality, others, such as SUFFIXES() and pretty, enrich N1QL querying with magnitude performance improvement. 

In this blog, I will show glimpses of the wonder land, and explain some of this magic.  See what’s new and release notes for full list of N1QL enhancements. Kudos N1QL team!!

Indexing With SUFFIXES() for Efficient LIKE Queries 

Pattern matching is a widely used functionality in SQL queries, and is typically achieved using the LIKE operator. Especially, efficient wildcard matching is very important. LIKE ‘foo%’ can be implemented efficiently with a standard index, as it has fixed prefix substring. But LIKE ‘%foo%’ is generally expensive. Such pattern matching with leading wildcard is vital for every application that has a search box to match partial words or to smart-suggest matching text. For example:

  • A travel booking site that wants to pop-up matching airports as the user starts to enter a few letters of the airport name.
  • A user finding all e-mails with a specific word or partial word in the subject.
  • Finding all topics of a forum or blog posts with specific keywords in the title.

In Couchbase Server 4.5.1, N1QL addresses this problem by adding a new string function —SUFFIXES(), and combining that with the Array Indexing functionality introduced in Couchbase Server 4.5. Together, it brings a magnitude of difference to the performance of LIKE queries with leading wildcards such as LIKE "%foo%". The core functionality of SUFFIXES() is very simple. Basically, it produces an array of all possible suffix substrings of a given string. For example, 

SUFFIXES("N1QL") = [ "N1QL", "1QL", "QL", "L" ]

The following picture depicts a unique technique to combine the SUFFIXES() function with Array Indexing to boost LIKE query performance.

Using SUFFIXES() Index in N1QL

 

  1. Step 1 (in blue) shows the array of suffix substrings generated by SUFFIXES() for doc1.title
  2. Step 2 (in yellow) shows the Array Index created with the suffix substrings generated in step 1. Note that the index-entry for"wood" points to doc1 and doc4, as that is one of the suffix substrings of titles of both the documents. Similarly, "ood" points to doc1doc4, and doc8.
  3. Step 3 (in green) runs a query equivalent to SELECT title FROM bucket WHERE title LIKE "%wood%". The LIKE predicate is transformed to use the Array Index using the ANY construct. See the documentation for more details on using array indexing.
    • Note that the leading wildcard is removed in the new LIKE "wood%" predicate.
    • This is accurate transformation because the array index lookup for "wood" points to all documents whose title has trailing substring "wood"
  4. In Step 4, N1QL looks up in the Array Index to find all documents matching "wood%". That returns {doc1, doc3, doc4}, because:
    • The index lookup produces a span, which gets documents from "wood" to "wooe"
    • doc1 and doc4 are matched because of index entry "wood" that is generated by the SUFFIXES() when creating the array index.
    • doc3 is matched because of its corresponding index-entry for "woodland"
  5. Finally, in step 5, N1QL returns the query results.

Let's see a working example with the  travel-sample  documents/bucket, which is shipped with the Couchbase Server product. This showed a 10-12x boost in performance for the example query.

  1. Assume a document with a string field whose value is few words of text or a phrase. For example, title of a landmark, address of a place, name of restaurant,  full name of a person/place etc., For this explanation, we consider title of landmarkdocuments in travel-sample.
  2. Create secondary index on title field using SUFFIXES() as:
           CREATE INDEX idx_title_suffix 
           ON `travel-sample`(DISTINCT ARRAY s FOR s IN SUFFIXES(title) END)
           WHERE type = "landmark";
    SUFFIXES(title) generates all possible suffix substrings of title, and the index will have entries for each of those substrings, all referencing to corresponding documents.
  3. Now consider following query, which finds all docs with substring "land" in title. This query produces following plan and runs in roughly 120ms on my laptop. You can clearly see, it fetches all landmark documents, and then applies the LIKE "%land%"predicate to find all matching documents.
    EXPLAIN SELECT * FROM `travel-sample` USE INDEX(def_type) WHERE type = "landmark" AND title LIKE "%land%";
    [
      {
        "plan": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "index": "def_type",
              "index_id": "e23b6a21e21f6f2",
              "keyspace": "travel-sample",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "\"landmark\""
                    ],
                    "Inclusion": 3,
                    "Low": [
                      "\"landmark\""
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Fetch",
              "keyspace": "travel-sample",
              "namespace": "default"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Filter",
                    "condition": "(((`travel-sample`.`type`) = \"landmark\") and ((`travel-sample`.`title`) like \"%land%\"))"
                  }
                ]
            }
  4. In Couchbase 4.5.1, this query can be rewritten to leverage the array index idx_title_suffix created in (2) above.
    EXPLAIN SELECT title FROM `travel-sample` USE INDEX(idx_title_suffix) WHERE type = "landmark" AND  
    ANY s IN SUFFIXES(title) SATISFIES s LIKE "land%"  END;
    [
      {
        "plan": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "DistinctScan",
              "scan": {
                "#operator": "IndexScan",
                "index": "idx_title_suffix",
                "index_id": "75b20d4b253214d1",
                "keyspace": "travel-sample",
                "namespace": "default",
                "spans": [
                  {
                    "Range": {
                      "High": [
                        "\"lane\""
                      ],
                      "Inclusion": 1,
                      "Low": [
                        "\"land\""
                      ]
                    }
                  }
                ],
                "using": "gsi"
              }
            },
            {
              "#operator": "Fetch",
              "keyspace": "travel-sample",
              "namespace": "default"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Filter",
                    "condition": "(((`travel-sample`.`type`) = \"landmark\") and any `s` in suffixes((`travel-sample`.`title`)) satisfies (`s` like \"land%\") end)"
                  },

Note that:

  • The new query in (4) uses LIKE “land%”, instead of LIKE “%land%”. The former predicate, with no leading wildcard '%', produces a much more efficient index lookup than the latter one, which can’t push down the predicate to index.
  • The array index idx_title_suffix is created with all possible suffix substrings of title, and hence lookup for any suffix substring of title can find successful match.
  • In the above 4.5.1 query plan in (4), N1QL pushes down the LIKE predicate to the index lookup, and avoids additional pattern-matching string processing. This query ran in 18ms.
  • In fact, with following covering Array Index, the query ran in 10ms, which is 12x faster.
       CREATE INDEX idx_title_suffix_cover
       ON `travel-sample`(DISTINCT ARRAY s FOR s IN SUFFIXES(title) END, title)
       WHERE type = "landmark";

See this blog for details on a real-world application of this feature.

UPDATE Nested Arrays

Enterprise applications often have complex data and need to model JSON documents with multiple levels of nested objects and arrays. N1QL supports complex expressions and language constructs to navigate and query such documents with nested arrays. N1QL also supports Array Indexing, with which secondary indexes can be created on array elements, and subsequently queried.

In Couchbase Server 4.5.1, the UPDATE statement syntax is improved to navigate nested arrays in documents and update specific fields in nested array elements. The FOR-clause of the UPDATE statement is enhanced to evaluate functions and expressions, and the new syntax supports multiple nested  FOR expressions to access and update fields in nested arrays.
Consider the following document with a nested array like:

{ 
  items: [
    {
      subitems: [
        {
          name: "N1QL"
        },
        {
          name: "GSI"
        }
      ] 
    }
  ],
  docType: "couchbase"   
}

The new UPDATE syntax in 4.5.1 can be used in different ways to access and update nested arrays:

  • UPDATE default SET s.newField = 'newValue' 
    FOR s IN ARRAY_FLATTEN(items[*].subitems, 1) END; 
  • UPDATE default 
    SET s.newField = 'newValue' 
    FOR s IN ARRAY_FLATTEN(ARRAY i.subitems FOR i IN items END, 1) END; 
  • UPDATE default 
    SET i.subitems = ( ARRAY OBJECT_ADD(s, 'newField', 'newValue') 
                       FOR s IN i.subitems END ) FOR i IN items END; 

Note that:

  • The SET-clause evaluates functions such as OBJECT_ADD() and ARRAY_FLATTEN()
  • FOR constructs can be used in a nested fashion with expressions to process array elements at different nest-levels. 

For a working example, consider the sample bucket travel-sample, shipped with 4.5.1.

  1. First, let’s add a nested array of special flights to the array schedule in the travel-sample bucket, for some documents.
    UPDATE `travel-sample` 
    SET schedule[0] = {"day" : 7, "special_flights" : 
                        [ {"flight" : "AI444", "utc" : "4:44:44"}, 
                          {"flight" : "AI333", "utc" : "3:33:33"} 
                        ] } 
    WHERE type = "route" AND destinationairport = "CDG" AND sourceairport = "TLV";
  2. The following UPDATE statement  adds a third field to each special flight:
    UPDATE `travel-sample`
    SET i.special_flights = ( ARRAY OBJECT_ADD(s, 'newField', 'newValue' ) 
                              FOR s IN i.special_flights END ) 
                                           FOR i IN schedule END
    WHERE type = "route" AND destinationairport = "CDG" AND sourceairport = "TLV";
    
    
    SELECT schedule[0] from `travel-sample` 
    WHERE type = "route" AND destinationairport = "CDG" AND sourceairport = "TLV" LIMIT 1;
    [
      {
        "$1": {
          "day": 7,
          "special_flights": [
            {
              "flight": "AI444",
              "newField": "newValue",
              "utc": "4:44:44"
            },
            {
              "flight": "AI333",
              "newField": "newValue",
              "utc": "3:33:33"
            }
          ]
        }
      }
    ]

Performance with new Query setting  pretty

Pretty!! Pretty!! pretty!!  Yes, pretty is a super impressive new query setting in N1QL that can enable or disable pretty formatting of query results. You might ask, what’s so great about it? And why anybody may want to disable the beautiful pretty formatting of the JSON output. Answer is  P E R F O R M A N C E!!

As the old adage goes, there are no free lunches. Pretty formatting query results comes with its own expense:

  1. First, a quick fact is that the white space (tabs, spaces, newlines) characters in a pretty-formatted JSON document consume almost a third of its size.
    • So, simply cutting down on the beautification will save all the raw bytes flowing over the network.
    • Moreover, consider the corresponding saving on the memory and processing resources of N1QL service.
    • Altogether, savings are pretty significant.
  2. The pretty output format is good for human readable scenarios, with human-manageable result sizes.
    • However, real world applications and computer programs run queries much more often than humans — and process much bigger query results.
    • For these, what matters is performance and efficiency; not pretty formatting. In fact, such formatting is an overhead to the JSON parser and application and is usually discarded. Typically, applications have their own presentation layer to format the data appropriately for respective users.

The new query parameter pretty in 4.5.1 allows to enable/disable formatting a query result.  The parameter can be:

  • Set to true or falsein the cbq shell
    •   cbq> \SET -pretty false; 
    varakurprasad$ ./cbq -u=Administrator -p=password
    Connected to : http://localhost:8091/. Type Ctrl-D or \QUIT to exit.
    
    cbq> \SET -pretty false;
    cbq> select id, name from `travel-sample` limit 1;
    {
    "requestID": "e9d885bc-2378-4345-bf44-a336fbabce77",
    "signature": {"id":"json","name":"json"},
    "results": [
    {"id":10,"name":"40-Mile Air"}
    ],
    "status": "success",
    "metrics": {"elapsedTime": "9.829246ms","executionTime": "9.805407ms","resultCount": 1,"resultSize": 30}
    }
  • Passed to the N1QL service process cbq-engine  as command line parameters.
  • Passed as Query parameter with REST calls. See example below.

By default, it is set to true. When set to false, the whitespace characters are stripped from the query results. The performance benefits are significantly visible when queries produce large results, and it, of course, depends on the percentage of white-space overhead in your documents.

For example, following query which selects all documents from travel-sample, run almost 3x faster when pretty = false. Also, note the size of the result set, which is one-third of the pretty formatted result.

With pretty = true

varakurprasad$ time curl -v http://localhost:8093/query/service -d "pretty=true&statement=SELECT * from \`travel-sample\`" | tail -15
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1...
* Connected to localhost (::1) port 8093 (#0)
* Server auth using Basic with user 'travel-sample'
> POST /query/service HTTP/1.1
> Authorization: Basic dHJhdmVsLXNhbXBsZTpoZWxsbw==
> User-Agent: curl/7.37.1
> Host: localhost:8093
> Accept: */*
> Content-Length: 51
> Content-Type: application/x-www-form-urlencoded
> 
} [data not shown]
* upload completely sent off: 51 out of 51 bytes
< HTTP/1.1 200 OK
< Content-Type: application/json; version=1.6.0
< Date: Sat, 15 Oct 2016 02:04:09 GMT
< Transfer-Encoding: chunked
< 
{ [data not shown]
100  103M    0  103M  100    51  16.6M      8  0:00:06  0:00:06 --:--:-- 17.2M
* Connection #0 to host localhost left intact
                ],
                "sourceairport": "TLV",
                "stops": 0,
                "type": "route"
            }
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "6.165034483s",
        "executionTime": "6.164993497s",
        "resultCount": 31591,
        "resultSize": 107830610
    }
}

real 0m6.208s
user 0m5.704s
sys 0m0.373s

With pretty = false

Note that the total resultSize now is only 36.7MB, and the query ran in 2.2sec. This is 3X better performance compared to pretty=true , which took 6.2sec and returned 107.8MB.

varakurprasad$ time curl -v http://localhost:8093/query/service -d "pretty=false&statement=SELECT * from \`travel-sample\`" | tail -5
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1...
* Connected to localhost (::1) port 8093 (#0)
* Server auth using Basic with user 'travel-sample'
> POST /query/service HTTP/1.1
> Authorization: Basic dHJhdmVsLXNhbXBsZTpoZWxsbw==
> User-Agent: curl/7.37.1
> Host: localhost:8093
> Accept: */*
> Content-Length: 52
> Content-Type: application/x-www-form-urlencoded
> 
} [data not shown]
* upload completely sent off: 52 out of 52 bytes
< HTTP/1.1 200 OK
< Content-Type: application/json; version=1.6.0
< Date: Sat, 15 Oct 2016 02:03:29 GMT
< Transfer-Encoding: chunked
< 
{ [data not shown]
100 35.1M    0 35.1M  100    52  15.9M     23  0:00:02  0:00:02 --:--:-- 15.9M

"status": "success",
"metrics": {"elapsedTime": "2.148354775s","executionTime": "2.148323137s","resultCount": 31591,"resultSize": 36754457}
}

real 0m2.223s
user 0m1.977s
sys 0m0.141s


Enhancements to Dynamic Object Construction

N1QL already supports construction of JSON objects dynamically in queries. This immensely helps in creating specifically constructed result objects in query projection-lists. Couchbase server 4.5.1 extends the power of expressions and enriches the dynamic object creation and object processing in N1QL queries. 

  1. In Couchbase Server 4.5.1, N1QL allows both names and values of object fields to be arbitrary expressions. In earlier versions, the names of fields are required to be static strings. If a name does not evaluate to a string, the result of the object construction is NULL. For example:
    SELECT { UPPER(callsign) || "_key" : callsign || ":" || country || ":" || name } AS myobj 
    FROM `travel-sample`
    WHERE type = 'airline' limit 1;
    [
      {
        "myobj": {
          "MILE-AIR_key": "MILE-AIR:United States:40-Mile Air"
        }
      }
    ]
  2. When constructing objects in a N1QL query, the names of fields in name-value pairs is made optional in 4.5.1. For example, the following query implicitly assigns names "type" and "name" for respective values: 
    SELECT {type, name} AS myobj FROM `travel-sample` LIMIT 1;
    [
      {
        "myobj": {
          "type": "airport"
          "name": "airport_123"
        }
      }
    ]

New Array Function: ARRAY_INTERSECT() 

ARRAY_INTERSECT() function takes two or more arrays as parameters and returns the intersection of the input arrays as the result, i.e the array containing values that are present in all the input arrays. It returns an empty array if there are no common array elements. For more information, see documentation. For example, following query finds the hotels that are liked by Brian or Lilian, in the travel-sample bucket shipped with Couchbase Server 4.5.1.

SELECT meta().id, ARRAY_INTERSECT(public_likes, ["Brian Kilback", "Lilian McLaughlin"]) AS likes
FROM `travel-sample` WHERE type = 'hotel' 
ORDER BY likes DESC
LIMIT 4;
[
  {
    "id": "hotel_10025",
    "likes": [
      "Lilian McLaughlin",
      "Brian Kilback"
    ]
  },
  {
    "id": "hotel_10026",
    "likes": []
  },
  {
    "id": "hotel_10064",
    "likes": []
  },
  {
    "id": "hotel_10063",
    "likes": []
  }
]

  

Download Couchbase Server 4.5.1 and give it a try. Learn more at N1QL documentation. Let me know any questions/comments, or just how awesome it is! Cheers!

Align DevOps for your applications with DevOps for your SQL Server databases to increase speed of delivery and keep data safe. Discover true Database DevOps, brought to you in partnership with Redgate

Topics:
couchbase ,n1ql ,pattern matching ,database

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}