Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

N1QL Index Advisor: Improve Query Performance and Productivity

DZone 's Guide to

N1QL Index Advisor: Improve Query Performance and Productivity

A guide.

· Performance Zone ·
Free Resource

As application developers, your main focus is to design the most efficient queries for your application. You want to ensure the integrity of the data that your application pushes to the database, and construct the most efficient queries to retrieve the data needed for the application. In many cases, the application performance is deferred to a later stage of the development cycle. Often it is left to the database performance specialists who can provide the recommendations for the index design, which may also include query rewrite, in order to meet the performance requirement. But understanding the query’s performance can greatly improve productivity and help developers to seek a more performant solution early in the development cycle.

Database vendors understand this challenge, and enterprise-class RDBMSs include Performance Tuning capability as part of their database toolsets.

Couchbase N1QL Index Advisor, a new feature in Couchbase 6.5 (DP), now provides recommendations for GSI indexes for N1QL query statements. Its primary objective is to recommend index to optimize the query response time. It allows developers to become more productive as they can create the recommended index(es) and see how the query performs early in the development cycle. It also allows DBAs to improve system performance by periodically reviewing slow running queries.

You can learn how to use Couchbase Index Advisor by reading the Understanding Index Advisor and Start Using Index Advisor sections below. The remaining sections provide more details on how the Index Advisor arrives at the recommendations, including tips on using Index Advisor on completed requests and how to identify infrequently used indexes.

One of Couchbase's top experts.

One of Couchbase's top experts.
You may also like: Synthetic Monitoring of DNS in Changing Times

Understanding Index Advisor

Database performance, while it is a difficult task to manage well, is a well-understood problem for database vendors. The advent of SaaS and DBasS also means that the database performance challenge is just part of the service, and needs to be automated with regard to its ability to diagnose and fix the problem. Our development team has been hard at work to create the building blocks towards that vision.

Couchbase Index Advisor is part of the N1QL ADVISE construct that allows developers and DBAs to manage the performance tuning aspect of their applications. To understand how the Index Advisor works, you need to understand the main data retrieval techniques in the Couchbase data platform:

  • Key-Value — A fast and direct data access for the Data service when the keys are available. This access method does not involve the Couchbase Indexing service.
  • Indexing Service (GSI) — A scalable service that enables applications to perform fast N1QL queries. GSI leverages a distributed architecture and stores the indexes on an independent set of nodes across the Couchbase Cluster.
  • Search (FTS) — A Full-Text Search service that provides keyword and fuzzy search. FTS, as with all Couchbase services, is a distributed service that can be set up across the Couchbase cluster.

An image of the Index Advisor.

N1QL Query leverages all of the above services to deliver the query result to the applications. However, Index Advisor in v6.5 (DP) only recommends indexes that are managed by the Index service (GSI). Index Advisor does not assess the Data service retrieval technique via USE KEYS, nor the Search(FTS) service via the N1QL SEARCH_QUERY() and SEARCH() function.

Start Using Index Advisor

As I had mentioned earlier, Index Advisor is the first in a series of a feature that we introduce for database performance tuning. Index Advisor helps to systematically identify indexes that can optimize the application performance, and ensure that indexes are created using the recommended best practice for key order. In effect, the Index Advisor can offer recommendations even when you already have working indexes for your queries.

You can start using N1QL Index Advisor using:

N1QL – ADVISE [INDEX] <query>

ADVISE  SELECT route.destinationairport
   FROM `travel-sample` airport    
     JOIN `travel-sample` route ON airport.faa = route.sourceairport    
         AND route.type = "route"
    WHERE airport.type = "airport"    
      AND airport.city = "San Francisco"    
      AND airport.country = "United States";
[
  {
    "#operator": "Advise",
    "advice": {
      "#operator": "IndexAdvice",
      "adviseinfo": [
        {
          "current_indexes": [
            {
              "index_statement": "CREATE INDEX def_city ON `travel-sample`(`city`)"
            },
            {
              "index_statement": "CREATE INDEX def_type ON `travel-sample`(`type`)"
            },
            {
              "index_statement": "CREATE INDEX def_faa ON `travel-sample`(`faa`)"
            },
            {
              "index_statement": "CREATE INDEX def_route_src_dst_day ON `travel-sample`(`sourceairport`,`destinationairport`,(distinct (array (`v`.`day`) for `v` in `schedule` end))) WHERE (`type` = 'route')",
              "index_status": "THIS IS AN OPTIMAL COVERING INDEX."
            }
          ],
          "recommended_indexes": {
            "covering_indexes": [
              {
                "index_statement": "CREATE INDEX adv_faa_country_city_type ON `travel-sample`(`faa`,`country`,`city`) WHERE `type` = 'airport'"
              }
            ],
            "indexes": [
              {
                "index_statement": "CREATE INDEX adv_faa_country_city_type ON `travel-sample`(`faa`,`country`,`city`) WHERE `type` = 'airport'",
                "recommending_rule": "Index keys follow order of predicate types: 1. equality, 8. flavor for partial index."
              }
            ]
          }
        }
      ]
    },
    "query": "SELECT route.destinationairport\nFROM `travel-sample` airport\n    JOIN `travel-sample` route ON airport.faa = route.sourceairport\n    AND route.type = \"route\"\nWHERE airport.type = \"airport\"\n    AND airport.city = \"San Francisco\"\n    AND airport.country = \"United States\";"
  }
]

Current Indexes

The section lists all the existing indexes that the query planner will use for this query. It also provides additional information on any index(es) that the query planner uses as covering index.

Recommended Indexes

This section lists all the indexes that are recommended by the Index Advisor. The covering index recommendations are those that would allow the query to work without additional fetching from the data service. Whereas the indexes section shows only those that help with the predicate clause. Please note that there could be duplication of information in these two sections, but where the index name would be identical where the index keys are the same.

It also provides the reason why it had recommended the indexes.

Use Advise from the Query Workbench

Click on the Advise button to obtain the Index Advisor output for the query.

A screenshot of the cluster1 query.

N1QL ADVISE supports query for SELECT, DELETE, UPDATE and MERGE. It provides index recommendations for:

  • Predicate WHERE clause.
  • Join condition ON clause for Index join, ANSI join, Index NEST, ANSI NEST and ANSI MERGE.
  • Array index for UNNEST, and array predicates in WHERE/ON clause.
  • Sub-Query when used in the FROM clause.

How Does Index Advisor Work

I will use the following approach to explain how the Index Advisor works.

  1. I will select an example that has the typical query constructs (predicate, join, array, group by), that can be difficult to guess what the best indexes are for the query.
  2. Examine the query plan to understand what the optimizer would do for the current query.
  3. Manually determine what indexes to create to improve query performance. This would be based on a typical application developer's understanding of predicate and join field indexes.
  4. Then run Couchbase Index Advisor and compare its recommendations to those we had in mind.
  5. Create the recommended indexes.
  6. Review the new query plan and assess the changes.

The Example Query

This query retrieves the number of routes by airlines from any airport in the city of San Jose, the USA that have scheduled flights on Sunday.

SELECT airline.name, airport.airportname, COUNT(1)
FROM `travel-sample` airport
INNER JOIN `travel-sample` route ON airport.faa = route.sourceairport
    AND route.type = "route"
INNER JOIN `travel-sample` airline ON route.airline = airline.iata
    AND airline.type = "airline"
WHERE airport.type = "airport"
    AND airport.city = "San Jose"
    AND airport.country = "United States"
    AND ANY x in route.schedule SATISFIES x.day =0 END
GROUP BY airline.name, airport.airportname

The Query Plan

  1. The query used five indexes
    1. Three indexes were used for the predicates on document type ‘airport’ – ‘city’, ‘type’, and ‘faa’.
    2. One index was used for the document type ‘route’ on the field ‘sourceairport’ for the purpose of the join between ‘route’ and ‘airport’. In this case, the existing index ‘def_route_src_dst_day’ was used because it has the leading key ‘sourceairport’.
    3. One index for document type ‘airline’, which is the default partial index for ‘type’.
  2. Two document NestedLoop joins – ‘airport’ — ‘route’ and ‘route’ — ‘airline’. But, other than the index ‘def_route_src_dst_day’ on ‘route’, there are no other indexes to support the rest of the joins. So these joins are not efficient.
  3. There was also a final aggregation to count the number of the route by the airline.
  4. The query completed in 2.1s

How Can We Improve the Query’s Performance

  1. The ‘airport’ document has two predicates (‘city’ and ‘country’) and a flavor predicate on ‘type’. In addition, the airport ‘faa’ column is also used for the join to ‘route’. So instead of using three separate indexes, there should be a partial index on the ‘airport’ document for (‘city’,’country’) [action_1]. The ‘airport.faa’ field could also benefit from a separate index or part of the main airport index.
  2. The ‘route’ document has no predicates, but it has a join to ‘airport’ with the field ‘sourceairport’. An index on this column should help with the join, and we already have the index ‘def_route_src_dst_day’ [action_2] so no action is needed for this join.
  3. The ‘route’ document also has a join to ‘airline’ on the field ‘airline’. So we should have an index for this – ‘route’ document (‘airline’) [action_3].
  4. The ‘airline’ document also doesn’t have any predicate, but it has a join to ‘route’ with the field ‘iata’. So another index is needed on the ‘airline’ document (‘iata’) [action_4].
  5. We have a predicate on the route.schedule array. So we need an index on this too. ‘route’ document (DISTINCT ARRAY x.day FOR x in schedule END) [action_5].

So we have come up with five action points, which result in four additional indexes for this query. Let’s see what the Index Advisor recommends.

Run Index Advisor to Check the Result

[
  {
    "#operator": "Advise",
    "advice": {
      "#operator": "IndexAdvice",
      "adviseinfo": [
        {
          "current_indexes": [
            {
              "index_statement": "CREATE INDEX def_city ON `travel-sample`(`city`)"
            },
            {
              "index_statement": "CREATE INDEX def_type ON `travel-sample`(`type`)"
            },
            {
              "index_statement": "CREATE INDEX def_faa ON `travel-sample`(`faa`)"
            },
            {
              "index_statement": "CREATE INDEX def_route_src_dst_day ON `travel-sample`(`sourceairport`,`destinationairport`,(distinct (array (`v`.`day`) for `v` in `schedule` end))) WHERE (`type` = 'route')"
            },
            {
              "index_statement": "CREATE INDEX def_type ON `travel-sample`(`type`)"
            }
          ],
          "recommended_indexes": {
            "covering_indexes": [
              {
                "index_statement": "CREATE INDEX adv_faa_city_country_type_airportname ON `travel-sample`(`faa`,`city`,`country`,`airportname`) WHERE `type` = 'airport'"
              },
              {
                "index_statement": "CREATE INDEX adv_iata_type_name ON `travel-sample`(`iata`,`name`) WHERE `type` = 'airline'"
              }
            ],
            "indexes": [
              {
                "index_statement": "CREATE INDEX adv_country_faa_city_type ON `travel-sample`(`country`,`faa`,`city`) WHERE `type` = 'airport'",
                "recommending_rule": "Index keys follow order of predicate types: 1. equality, 8. flavor for partial index."
              },
              {
                "index_statement": "CREATE INDEX adv_airline_sourceairport_DISTINCT_schedule_day_type ON `travel-sample`(`airline`,`sourceairport`,DISTINCT ARRAY x.day FOR x in schedule END) WHERE `type` = 'route'",
                "recommending_rule": "Index keys follow order of predicate types: 1. equality, 5. array predicate, 8. flavor for partial index."
              },
              {
                "index_statement": "CREATE INDEX adv_iata_type ON `travel-sample`(`iata`) WHERE `type` = 'airline'",
                "recommending_rule": "Index keys follow order of predicate types: 1. equality, 8. flavor for partial index."
              }
            ]
          }
        }
      ]
    },
    "query": "SELECT airline.name, airport.airportname, COUNT(1)\nFROM `travel-sample` airport\nINNER JOIN `travel-sample` route ON airport.faa = route.sourceairport\n    AND route.type = \"route\"\nINNER JOIN `travel-sample` airline ON route.airline = airline.iata\n    AND airline.type = \"airline\"\nWHERE airport.type = \"airport\"\n    AND airport.city = \"San Jose\"\n    AND airport.country = \"United States\"\n    AND ANY x in route.schedule SATISFIES x.day =0 END\nGROUP BY airline.name, airport.airportname"
  }
]

Current Indexes Section

This is the same as what the plan had given us. The query uses five indexes.

Recommended Indexes Section — “Indexes”

This section lists out all the indexes from the predicate perspective. Note that JOIN has ON fields, which are treated as predicate too. So let’s review the recommended indexes.

  1. adv_country_faa_city_type ON travel-sample (countryfaacity) WHERE type = ‘airport’ – Other than the key order, this is the same index as we have identified in [action_1]. The key order is not significant here as these predicates all use equality.
  2. adv_airline_sourceairport_DISTINCT_schedule_day_type ON travel-sample (airlinesourceairport, DISTINCT ARRAY x.day FOR x in schedule END) WHERE type = ‘route’ – What Index Advisor has done here is that it has combined [action_2, action_3, action_5] into a single index.
  3. adv_iata_type ON travel-sample(iata) WHERE type = ‘airline’ – This is the same index that we had identified in [action_4]

Recommended Indexes Section — “Covering Indexes”

This section lists out all the covering indexes recommendations. Note that covering indexes are those that are used to satisfy a query on a document where the execution would not need to perform additional fetch from the data service.

  1. adv_faa_city_country_type_airportname_airportname ON travel-sample (countryfaacityairportname) WHERE type = ‘airport’ – Index Advisor has added the route.airportname field to the recommended index [adv_country_faa_city_type ON  travel-sample(countryfaacity)]. By doing this, the query service would not need to fetch the route document from the data service to project the route.airportname
  2. adv_iata_type_name_name ON travel-sample(iataname) WHERE type = ‘airline’ – Index Advisor has added the airline.name to the recommended index [adv_iata_type ON travel-sample(iata) WHERE type = ‘airline’]. By doing this, the query service would not need to fetch the airline document from the data service to project the airline.name

Note: The Covering index reduces the need for the query service to perform additional fetching from the data service, thus improves query performance. But user needs to make a decision on whether to create covering indexes or not based on the specific constraints, such as memory or storage of the user’s environment.

Create All Recommended Indexes and Review the New Query Plan

For this exercise, I have decided to create all the recommended indexes using covering indexes where applicable.

CREATE INDEX `adv_faa_city_type_country_airportname` ON `travel-sample`(`faa`,`city`,`airportname`) WHERE ((`type` = "airport") and (`country` = "United States"))

CREATE INDEX `adv_sourceairport_airline_DISTINCT_schedule_day_type` ON `travel-sample`(`sourceairport`,`airline`,(distinct (array (`x`.`day`) for `x` in `schedule` end))) WHERE (`type` = "route")

CREATE INDEX `adv_iata_type_name` ON `travel-sample`(`iata`,`name`) WHERE (`type` = "airline")

The Query Plan After Adding the Recommended Indexes

  1. The query now uses the three new indexes recommended by Index Advisor, instead of five indexes.
  2. Since we created covering indexes, the query service no longer needs to perform any additional fetching from the data service.
  3. The query now completed in 120ms. A 94% improvement.

Use the Index Advisor to Analyze the Workload

N1QL – ADVISOR( <query>)

Index Advisor can also be invoked with the ADVISOR() function. It allows users to:

  1. Pass in multiple queries. SELECT ADVISOR(“SELECT ..”,”SELECT ..”,”UPDATE..”);
  2. Pass a subquery that returns the actual N1QL statements. Because the query returns a JSON document structure, you will need to use the RAW keyword to ensure that only the actual N1QL statements are returned.

Use Advisor() to Identify the Slow Queries by Querying the Completed Requests.

/* Retrieve queries that ran longer then 1 secs */
SELECT <strong>ADVISOR</strong>((SELECT <strong>RAW</strong> statement FROM  system:completed_requests
WHERE trunc(str_to_duration(elapsedTime)/1000000000) > 1))

Identify Existing Inefficient Indexes

Even when you already have indexes for your query, running Index Advisor can help identifying cases where existing indexes may not be the best indexes, based in the Couchbase Index guidance.

Consider this example:

SELECT COUNT(1) FROM `travel-sample` t WHERE t.type = “airport” AND t.icao IS NOT NULL AND t.geo.alt < 300;

Your query may run without any issue with an index below:

CREATE INDEX ix1 ON `travel-sample`(`type`,`icao`,`geo`.`alt`)

The recommended index for the query would be:

CREATE INDEX adv_geo_alt_icao_type ON `travel-sample`(`geo`.`alt`,`icao`) WHERE `type` = ‘airport’

This is because the index should perform better if the keys follow the order of predicate type rule:

  • Rule 4. Less than/between/greater than.
  • Rule 6. Not null/not missing/valued.
  • Rule 8. Flavor for a partial index.

Recommended Index Naming Convention

Index Advisor recommends index with a specific naming convention,

  • adv_field1_field2_field3…

For array indexes, an extra underscore is added at the end of the array index to distinguish:

  • adv_[DISTINCT/ALL]_field1_field1.2_field1.2.3__field2_field3…

It is recommended that you retain the index naming as provided, as this would allow Index Advisor to recognize the indexes that it had previously recommended versus those that have been created by other means. In future updates, Index Advisor may opt to recommend replacing an index, but it would only do so if the index it seeks to replace was one that it had previously recommended.

Use Index Stats Last Known Scan Time

Your database may have many indexes for different query requirements over a period of time. But as indexes are accumulated in your database, the potential for duplicates or similarly created indexes do happen. This includes indexes that are no longer in use. You can query the value “last_known_scan_time” from the Index stats rest endpoint, to determine the statuses of your indexes, then decide if you should drop the index.

SELECT ARRAY {"index_name":a.name,"last_scan_time":millis_to_str(a.val.last_known_scan_time/1000000)} 
          FOR a IN OBJECT_PAIRS(results) END
FROM curl("http://<index_host>:9102/api/v1/stats",
     {"user":"Administrator:password"}) results

Results

Summary

From an application deployment perspective, the need to manage indexes does not stop at the end of the development process, but an ongoing exercise. In fact, index management should have a life cycle of its own. This is because the volume and shape of the data can change over time, which subsequently may require changes to the existing indexes. In some cases, the query construct may also need revising. Index advisor, therefore, is a critical part of the product lifecycle management.

A few key points to note about Index Advisor as of Couchbase 6.5 (DP)

  • Type of Index — Index Advisor only recommends GSI indexes. It does not recommend the primary index, nor the FTS index.
  • Rule-Based — In the current implementation, the Index Advisor recommends indexes based predicates, projection list and the key column order. In effect, it is based on the same rule that is used by the query planner. As with rule-based database optimization(RBO), Index Advisor does not take into account the statistics of the data distribution of the key columns. For this reason, the user should perform a performance evaluation with the recommended indexes before deploying them in a production environment.
  • Sub-Query — Index Advisor works with sub-query when the query is in the FROM clause only.
  • Multi-key array index — Index Advisor will only recommend a single key for an array index, even when the predicate references multiple array fields.
  • Partition Index — Index Advisor does not include the partitioning clause in the recommended index creation statements.
  • Existing Indexes — Index Advisor does not recommend the redesign of existing indexes, nor recommend dropping of existing indexes. Users should use the index last scan time to make the decision.

Resources

We would love to hear from you on how you liked the 6.5 features and how it’ll benefit your business going forward. Please share your feedback via the comments!


Further Reading

Introduction to Jepsen Testing at Couchbase

14 Ways in Which Cross-Browser Testing Ensures a Better UX

Topics:
performance

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}