Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Index Advisor for Couchbase N1QL Query Statement

DZone 's Guide to

Index Advisor for Couchbase N1QL Query Statement

Explore index advisor for a Couchbase N1QL query statement.

· Database Zone ·
Free Resource

Woman performing Couchbase N1QL query statement

Couchbase N1QL query statement

Overview

Index advisor is introduced in Couchbase server 6.5 as a developer preview feature. It targets at providing secondary index recommendation to help DBAs and developers optimize Couchbase N1QL query performance. This version is rule-based, and the index candidates will be generated following the design rules specified here.

  1. Leading array index key for unnest
  2. Equality predicates
  3. IN predicates
  4. Not less than/between/not greater than predicates
  5. Less than/greater than predicates
  6. Array predicates
  7. Derived join filter as leading key for left-hand-side keyspace
  8. IS NOT NULL/MISSING/VALUED predicates
  9. Functional predicates
  10. Partial index condition

Index advisor is designed to work in two ways:

  • N1QL Statement to advise indexes for a single query
  • Advisor function to advise on query workload and support session handling

This article will focus on the first function and share some insights on how it works internally.

You may also like: Create the Right Index, Get the Right Performance

Details of ADVISE Statement

N1QL Statement Syntax:

ADVISE [INDEX ] <Query>

It supports query types of SELECT/DELETE/UPDATE/ANSI MERGE and provides index recommendations for operations on a single keyspace, ANSI JOIN/NEST, INDEX JOIN/NEST, UNNEST, and subquery in the FROM clause.

The primary guidelines that Index Advisor follows are:

  1. Generate secondary index candidates based on indexable predicate, projection, GROUP BY/ORDER BY expressions with correct syntax.
  2. Sort the index keys following the rules w.r.t the priority order of the predicate types with adjustment on leading array index key for UNNEST and derived ISNOTNULL filter for ANSI JOIN.
  3. Make sure the recommended index is superior to all the current existing/deferred indexes and be selected by the current rule-based query optimizer in the query execution.

Let’s first take a look at the output of a simple example:

Query 1:

ADVISE SELECT name FROM `travel-sample` WHERE id < 100 AND type = “route”
[
  {
    "#operator": "Advise",
    "advice": {
      "#operator": "IndexAdvice",
      "adviseinfo": [
        {
          "current_indexes": [
            {
              "index_statement": "CREATE INDEX def_type ON `travel-sample`(`type`)"
            }
          ],
          "recommended_indexes": {
            "covering_indexes": [
              {
                "index_statement": "CREATE INDEX adv_id_type_name ON `travel-sample`(`id`,`name`) WHERE `type` = 'route'"
              }
            ],
            "indexes": [
              {
                "index_statement": "CREATE INDEX adv_id_type ON `travel-sample`(`id`) WHERE `type` = 'route'",
                "recommending_rule": "Index keys follow order of predicate types: 5. less than/between/greater than, 10. flavor for partial index."
              }
            ]
          }
        }
      ]
    },
    "query": "select name from `travel-sample` where id < 100 and type = \"route\";"
  }
]

There are two parts in the advice output:

Current Indexes:

This session provides information on the currently used indexes by the query optimizer to run the input query. It will also include the information on the status of the index if it is identical to the recommended index or if it is an optimal covering index. In both situations, no index recommendation will be provided.

Recommended Indexes:

  • Indexes:

This session lists the recommended indexes based on predicates in WHERE/ON clause, along with the corresponding recommendation rule that each index follows.

  • Covering Indexes:

This session lists the covering indexes applicable to the input query.

Partial Indexes

The Couchbase data model is JSON with flexible schema, and one bucket can be heterogeneous with multiple types of documents inside. Typically, data entries include a “type” field for categorization. When a query includes filters on a particular type of document, a partial index with the “type” field in the WHERE clause will be given. It can help minimize the index size, shorten the index access path, and execute the query more efficiently.

In the query example above, the recommended index is a partial index with “type = ‘route'” added in the WHERE clause, which will only create an index on the documents of type “route”:

CREATE INDEX adv_id_type_name ON `travel-sample`(`id`,`name`) WHERE `type` = 'route'

Internally, the index advisor uses the “flavor” statistics from the INFER statement to match the fields in the predicate and adds it to the partial index condition.

To get more ideas, let’s run a query with ANSI JOIN on multiple types of documents in which partial index with field “type” as index condition will be advised for each keyspace:

Query 2:

ADVISE SELECT DISTINCT airline.name,
                       airport.name AS airport,
                       route.distance
FROM `travel-sample` airport
INNER JOIN `travel-sample` route ON airport.faa = route.sourceairport
    AND route.type = "route"
INNER JOIN `travel-sample` airline ON route.airline = airline.iata
    AND airline.type = "airline"
WHERE airport.type = "airport"
    AND airport.city = "San Jose";
[
    {
        "#operator": "Advise",
        "advice": {
            "#operator": "IndexAdvice",
            "adviseinfo": [
                {
                    "recommended_indexes": {
                        "covering_indexes": [
                            {
                                "index_statement": "CREATE INDEX adv_iata_type_name ON `travel-sample`(`iata`,`name`) WHERE `type` = 'airline'"
                            },
                            {
                                "index_statement": "CREATE INDEX adv_city_type_faa_name ON `travel-sample`(`city`,`faa`,`name`) WHERE `type` = 'airport'"
                            },
                            {
                                "index_statement": "CREATE INDEX adv_sourceairport_type_airline_distance ON `travel-sample`(`sourceairport`,`airline`,`distance`) WHERE `type` = 'route'"
                            }
                        ],
                        "indexes": [
                            {
                                "index_statement": "CREATE INDEX adv_city_type ON `travel-sample`(`city`) WHERE `type` = 'airport'",
                                "recommending_rule": "Index keys follow order of predicate types: 2. equality, 10. flavor for partial index."
                            },
                            {
                                "index_statement": "CREATE INDEX adv_sourceairport_type ON `travel-sample`(`sourceairport`) WHERE `type` = 'route'",
                                "recommending_rule": "Index keys follow order of predicate types: 2. equality, 10. flavor for partial index."
                            },
                            {
                                "index_statement": "CREATE INDEX adv_iata_type ON `travel-sample`(`iata`) WHERE `type` = 'airline'",
                                "recommending_rule": "Index keys follow order of predicate types: 2. equality, 10. flavor for partial index."
                            }
                        ]
                    }
                }
            ]
        },
        "query": "SELECT DISTINCT airline.name, airport.name AS airport, route.distance FROM `travel-sample` airport INNER JOIN `travel-sample` route ON airport.faa = route.sourceairport AND route.type = \"route\" INNER JOIN `travel-sample` airline ON route.airline = airline.iata AND airline.type = \"airline\" WHERE airport.type = \"airport\" AND airport.city = \"San Jose\";"
    }
    ]

Array Index for Array Predicate

JSON is a hierarchical structure comprised of nested scalar fields, objects, and arrays. Array index key can go deep into the nested array and clearly reference the elements or the fields of the object needed to be indexed. Index advisor will recommend an array index by recursively traversing the bindings and expressions in the ANY range predicate.

Query 3:

ADVISE SELECT id
FROM `travel-sample`
WHERE type = "route"
    AND ANY x IN schedule,
             z IN schedule SATISFIES (ANY y IN x.day, w IN z.flight SATISFIES [y, w] <[1, "AF547"] END) END;
[
  {
    "#operator": "Advise",
    "advice": {
      "#operator": "IndexAdvice",
      "adviseinfo": [
        {
          "recommended_indexes": {
            "covering_indexes": [
              {
                "index_statement": "CREATE INDEX adv_DISTINCT_schedule_schedule_day_flight_type_id ON `travel-sample`(DISTINCT ARRAY (DISTINCT ARRAY [y, w] FOR y in x.day, w in z.flight END) FOR x in schedule, z in schedule END,`id`) WHERE `type` = 'route'"
              }
            ],
            "indexes": [
              {
                "index_statement": "CREATE INDEX adv_DISTINCT_schedule_schedule_day_flight_type ON `travel-sample`(DISTINCT ARRAY (DISTINCT ARRAY [y, w] FOR y in x.day, w in z.flight END) FOR x in schedule, z in schedule END) WHERE `type` = 'route'",
                "recommending_rule": "Index keys follow order of predicate types: 6. array predicate, 10. flavor for partial index."
              }
            ]
          }
        }
      ]
    },
    "query": "SELECT id FROM `travel-sample` WHERE type = \"route\" AND ANY x IN schedule, z IN schedule SATISFIES (ANY y IN x.day, w IN z.flight SATISFIES [y, w] <[1, \"AF547\"] END) END;"
  }
]

Since users have the flexibility to apply any additional logic and processing on the elements and fields of the object in the array to generate an ANY expression, the index advisor will also make efforts to support those complex functions, array expressions, boolean conditions and etc. as shown in the query below:

Query 4:

ADVISE SELECT *
FROM products AS c
WHERE ANY p IN OBJECT_PAIRS(c.productIds) 
            SATISFIES [p.name, ARRAY_COUNT(p.val) > 0, c.metadata.configurations.[p.name].enabled, ARRAY_CONTAINS(IFMISSINGORNULL(c.deletedFor, []),p.name)] = [\"US\", TRUE, FALSE, FALSE] END
[
  {
    "#operator": "Advise",
    "advice": {
      "#operator": "IndexAdvice",
      "adviseinfo": [
        {
          "recommended_indexes": {
            "indexes": [
              {
                "index_statement": "CREATE INDEX adv_DISTINCT_object_pairs_productIds_name_array_count_val_metada4251025607 ON `products`(DISTINCT ARRAY [p.name, 0 < array_count((`p`.`val`)), metadata.configurations.[p.name].enabled, array_contains(ifmissingornull((`deletedFor`), []), (`p`.`name`))] FOR p in object_pairs((`productIds`)) END)",
                "recommending_rule": "Index keys follow order of predicate types: 6. array predicate."
              }
            ]
          }
        }
      ]
    },
    "query": "SELECT * FROM products AS c WHERE ANY p IN OBJECT_PAIRS(c.productIds) SATISFIES [p.name, ARRAY_COUNT(p.val) > 0, c.metadata.configurations.[p.name].enabled, ARRAY_CONTAINS(IFMISSINGORNULL(c.deletedFor, []),p.name)] = [\"US\", TRUE, FALSE, FALSE] END;"
  }
]

Array Index for UNNEST Operation

UNNEST is used to perform a join operation of the nested array with its parent object. When the predicate in this operation applies to the individual elements in the nested array, an array index will help to optimize the execution of the query.

The N1QL query optimizer supports the unnest scan and unnest covering scan when there is an appropriate leading array key with ALL array elements indexed in the index definition. Index advisor follows this rule to collect unnest expressions and generate an array index reversely for recommendation. Let’s take a look at the example below:

Query 5:

ADVISE SELECT COUNT(*)
FROM `travel-sample` t
UNNEST schedule AS x
UNNEST x.special_flights AS y
WHERE y.flight IS NOT NULL
    AND t.type = "route"
[
  {
    "#operator": "Advise",
    "advice": {
      "#operator": "IndexAdvice",
      "adviseinfo": [
        {
          "recommended_indexes": {
            "covering_indexes": [
              {
                "index_statement": "CREATE INDEX adv_ALL_schedule_special_flights_flight_type ON `travel-sample`(ALL ARRAY (ALL ARRAY y.flight FOR y IN x.special_flights END) FOR x IN schedule END,`type`)"
              }
            ],
            "indexes": [
              {
                "index_statement": "CREATE INDEX adv_ALL_schedule_special_flights_flight_type ON `travel-sample`(ALL ARRAY (ALL ARRAY y.flight FOR y IN x.special_flights END) FOR x IN schedule END,`type`)",
                "recommending_rule": "Index keys follow order of predicate types: 1. leading array index for unnest, 2. equality."
              }
            ]
          }
        }
      ]
    },
    "query": "SELECT COUNT(*)\nFROM `travel-sample` t\nUNNEST schedule AS x\nUNNEST x.special_flights AS y\nWHERE y.flight IS NOT NULL\n    AND t.type = \"route\""
  }
]

Covering Indexes

Covering indexes can provide all data required for a particular query and avoid the overhead in fetching documents from a data service. Index advisor will advise this efficient covering index for the input query if applicable in the following steps:

  • First, collect index keys from all the predicates in WHERE/ON clause.
  • Then, append the remaining projection and GROUP BY/ORDER BY expressions to the index definition and generate a tentative covering index candidate.
  • Last but not least, check if the index expressions can provide all required data to cover the query using the same approach as the query optimizer.

The previous query examples have shown that covering indexes and covering array indexes are provided for single keyspace query, JOIN operation, ANY expressions, and UNNEST predicates.

Index Naming Convention

In the Index advisor, the index name is designed to meet three requirements:

  1. Reflect the index construction by connecting all the index keys with an underscore
  2. Differentiate from user-defined indexes by adding “adv-” prefix
  3. Reasonable length by truncating and replacing with hash code

The formats for regular index and array index are listed below:

  • adv_field1_field2_field3…
  • adv_[DISTINCT/ALL]_level1_level2_level3…

Duplicate names are not supported in Couchbase indexing, and the index advisor doesn’t guarantee the uniqueness of the index name in the recommendation. Users need to change the index name upon encountering this error.

Summary

The index advisor (ADVISE statement) provides index recommendations for a single query. It advises regular index, partial index, array index, and covering index and provides information on the corresponding recommendation rule that each index key follows. Moreover, it evaluates the currently used indexes by one query and avoids recommending unnecessarily.

Released as a DP feature, the index advisor is still in its initial stages. The functionality and usability will be further enhanced to better meet customer requirements and contribute to query optimization.

Further Reading

Indexing Best Practices

MongoDB: Investigate Queries With explain() and Index Usage (Part 2)

Topics:
database ,index advisor ,couchbase n1ql ,query statement ,tutorial

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}