Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Index Advisor for Query Workload

DZone 's Guide to

Index Advisor for Query Workload

What is index advisor and how does it work?

· Database Zone ·
Free Resource

Image title

Let's work with index advisor for query workload.

Overview

As the second feature of Index Advisor released in Couchbase server 6.5 (Developer Preview), the Advisor function extends the scope from advising on a single query to providing index recommendations for query workload and support on session handling. In this article, we look at a brief review of how it works in these two different ways.

You may also like: Indexing Best Practices

Advisor Function On Workload

Advisor function works in the following steps:

  1. Take one single query or query workload consisting of multiple queries as input
  2. Process each individual unique query with index recommendation
  3. Merge and output the current/recommended indexes for the entire workload

Syntax:

SELECT ADVISOR("query")

SELECT ADVISOR(["query1", "query2", "query3″...])

The example below shows the scenario in which the query workload is from system:completed_requests:

SELECT ADVISOR((
    SELECT RAW statement
    FROM system:completed_requests));
[
  {
    "$1": {
      "current_used_indexes": [
        {
          "index": "CREATE INDEX def_type ON `travel-sample`(`type`)",
          "statements": [
            {
              "run_count": 2,
              "statement": "select * from `travel-sample` where type is not null"
            }
          ]
        },
        {
          "index": "CREATE PRIMARY INDEX def_primary ON `travel-sample`",
          "statements": [
            {
              "run_count": 1,
              "statement": "select id from `travel-sample` where id < 10"
            }
          ]
        }
      ],
      "recommended_covering_indexes": [
        {
          "index": "CREATE INDEX adv_id ON `travel-sample`(`id`)",
          "statements": [
            {
              "run_count": 1,
              "statement": "select id from `travel-sample` where id < 10"
            }
          ]
        }
      ],
      "recommended_indexes": [
        {
          "index": "CREATE INDEX adv_id ON `travel-sample`(`id`)",
          "statements": [
            {
              "run_count": 1,
              "statement": "select id from `travel-sample` where id < 10"
            }
          ]
        }
      ]
    }
  }
]

Similar to the ADVISE statement, the output of the ADVISOR() function consists of three parts:

  • Current_used_indexes
    • This session summarizes the current used indexes with the corresponding list of query statements and the run count of each query.
  • Recommended_indexes
    • This session provides the information on the recommended indexes based on predicates in WHERE/ON clause for the entire workload.
  • Recommended_covering_indexes
    • Covering index recommendation applying to the query workload will be listed here.

Advisor Function on Session Handling

The session handling function allows users to set up the index advisor in advance for the incoming query workload in the following steps.

  • Define the queries of interest with specific properties.
  • Initialize a session to monitor and collect the query workload running on a cluster for a period of time.
  • Invoke index advisor to perform analysis on the collected query workload asynchronously afterwards.
  • Extract and process the index recommendations after the completion of the entire process.

Syntaxes for usage:

1. ADVISOR( { “action” : “start”,

           “profile”: ”john”,

           “response” : “3s”,     

           “duration” : “1m” ,

           “query_count” : 200 } )

  • "action"— users to set for the session to proceed: "start", "stop", "get", "list", "purge", "abort".
  • "profile"(optional) — the specified user profile whose query will be of interest. Will collect the queries run by anyone if not set.
  • "response"(optional) — the time threshold set for the query to run longer than, default setting is 0s.
  • "duration"(mandatory) — how long for the session to be running to collect query for. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
  • "query_count"(optional) — The upper limit for the number of queries the session will collect. The query_limit is 4000 per query node. If users expect to collect more than that, they need to modify the global setting before initializing the session by:

                 "curl http://localhost:port/admin/settings -d '{"completed-limit":10000}' -u user:password"


Once the session has been initialized successfully, a unique session name will be returned for reference as shown below, at the same time an asynchronous task of invoking index advisor to do analysis will be scheduled to run immediately after the completion of collecting workload.

[
  {
    "$1": {
      "session": "06fcdefe-f864-48f5-a79b-b2f3345a6745"
    }
  }
]

2. ADVISOR( { "action" : "get",

                "session": "06fcdefe-f864-48f5-a79b-b2f3345a6745" } )

After the analysis process is completed, user can run another advisor() function to retrieve the index recommending results using the corresponding session name.

3. ADVISOR( { "action" : "purge",

                "session": "06fcdefe-f864-48f5-a79b-b2f3345a6745" } )

When the index recommending results for a particular session is not required any longer, user can choose to purge it for space efficiency.

The least recent session will also be flushed automatically once reaching the capacity of the cache.

4. ADVISOR({ “action”: “list”,

 “status”:”completed”})

This function can provide the list of currently active sessions, currently completed sessions, and all the existing sessions. The input for "status" can be one of "active", "completed", and "all".

  • class: this task comes from the index advisor
  • delay: how long the phase of collecting queries will last for
  • name: the session name is the same as the one above
  • state: the status of the session: ["completed", "schedule", "running"]
[
  {
    "$1": [
      {
        "tasks_cache": {
          "class": "advisor",
          "delay": "1m0s",
          "id": "ea673d76-086f-5f27-86b2-9b016f56f2a0",
          "name": "<span style="font-weight: 400;">06fcdefe-f864-48f5-a79b-b2f3345a6745</span>",
          "node": "127.0.0.1:8091",
          "results": [
            {}
          ],
          "startTime": "2019-07-23 16:11:13.616100068 -0700 PDT m=+96637.000928100",
          "state": "completed",
          "stopTime": "2019-07-23 16:11:14.186697742 -0700 PDT m=+96637.571525466",
          "subClass": "analyze",
          "submitTime": "2019-07-23 16:10:13.615964874 -0700 PDT m=+96577.000792768"
        }
      }
    ]
  }
]

The output information is from system:tasks_cache. Let's take an example to understand the relevant information regarding Index Advisor:

5. ADVISOR( { "action" : "stop",

                "session": "06fcdefe-f864-48f5-a79b-b2f3345a6745" } )

User can take an early stop on collecting queries and bring forward the index analysis to run immediately.



6. ADVISOR( { "action" : "abort",

                "session": "06fcdefe-f864-48f5-a79b-b2f3345a6745" } )

Users can also cancel one session without running analysis.

Summary

Advisor function works with a single query and multiple queries and provides information on the current used indexes, recommended indexes, and recommended covering indexes for the entire query workload.

Advisor session handling allows users to create a session to collect the queries, which meet the specified requirements for a period of time and invoke index advisor to provide recommendations for them.

As there's plenty of room for improvement on the functionality and performance for both the functions, we will continue to explore and take feedback to optimize the design in the next step.

Further Reading

Couchbase N1QL: To Query or To Analyze?

Create the Right Index, Get the Right Performance, Part 1

Topics:
database ,index advisor ,query workload ,advisor function ,couchbase

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}