Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

EXPLAIN FORMAT=JSON Provides Insights on optimizer_switch Effectiveness

DZone's Guide to

EXPLAIN FORMAT=JSON Provides Insights on optimizer_switch Effectiveness

We can experiment with the value of optimizer_switch and use EXPLAIN FORMAT=JSON to examine how a particular optimization affects our queries. Read on to learn more.

· Database Zone
Free Resource

Learn how to create flexible schemas in a relational database using SQL for JSON.

Orignially written by Sveta Smirnova

The previous post in the EXPLAIN FORMAT=JSON is Cool! series showed an example of the query selectdept_name from departments where dept_no in (selectdept_no from dept_manager where to_date is not null), where the subquery was materialized into a temporary table and then joined with the outer query. This is known as a semi-join optimization. But, what happens if we turn off this optimization?

EXPLAIN FORMAT=JSON can help us with this investigation, too.

First, lets look at the original output again:

mysql> explain format=json select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null)G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "16.72"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "departments",
          <skipped>
      },
      {
        "table": {
          "table_name": "<subquery2>",
          "access_type": "eq_ref",
          "key": "<auto_key>",
          "key_length": "4",
          "ref": [
            "employees.departments.dept_no"
          ],
          "rows_examined_per_scan": 1,
          "materialized_from_subquery": {
            "using_temporary_table": true,
            "query_block": {
              "table": {
                "table_name": "dept_manager",
                "access_type": "ALL",
                "possible_keys": [
                  "dept_no"
                ],
                "rows_examined_per_scan": 24,
                "rows_produced_per_join": 21,
                "filtered": "90.00",
                "cost_info": {
                  "read_cost": "1.48",
                  "eval_cost": "4.32",
                  "prefix_cost": "5.80",
                  "data_read_per_join": "345"
                },
                "used_columns": [
                  "dept_no",
                  "to_date"
                ],
                "attached_condition": "(`employees`.`dept_manager`.`to_date` is not null)"
              }
            }
          }
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)

Note (Code 1003): /* select#1 */ select `employees`.`departments`.`dept_name` AS `dept_name` from `employees`.`departments` semi join (`employees`.`dept_manager`) where ((`<subquery2>`.`dept_no` = `employees`.`departments`.`dept_no`) and (`employees`.`dept_manager`.`to_date` is not null))

To repeat what happened here: the subquery was materialized into a  temporary table, then  joined with the departments table. Semi-join optimization is ON by default (as would be most likely without intervention).

What happens if we temporarily turn semi-join optimization OFF?

mysql> set optimizer_switch="semijoin=off";
Query OK, 0 rows affected (0.00 sec)

And then execute EXPLAIN one more time:

mysql> explain format=json select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null) G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "2.80"
    },
    "table": {
      "table_name": "departments",
      "access_type": "index",
      "key": "dept_name",
      "used_key_parts": [
        "dept_name"
      ],
      "key_length": "42",
      "rows_examined_per_scan": 9,
      "rows_produced_per_join": 9,
      "filtered": "100.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "1.00",
        "eval_cost": "1.80",
        "prefix_cost": "2.80",
        "data_read_per_join": "432"
      },
      "used_columns": [
        "dept_no",
        "dept_name"
      ],
      "attached_condition": "<in_optimizer>(`employees`.`departments`.`dept_no`,`employees`.`departments`.`dept_no` in ( <materialize> (/* select#2 */ select `employees`.`dept_manager`.`dept_no` from `employees`.`dept_manager` where (`employees`.`dept_manager`.`to_date` is not null) ), <primary_index_lookup>(`employees`.`departments`.`dept_no` in <temporary table> on <auto_key> where ((`employees`.`departments`.`dept_no` = `materialized-subquery`.`dept_no`)))))",
      "attached_subqueries": [
        {
          "table": {
            "table_name": "<materialized_subquery>",
            "access_type": "eq_ref",
            "key": "<auto_key>",
            "key_length": "4",
            "rows_examined_per_scan": 1,
            "materialized_from_subquery": {
              "using_temporary_table": true,
              "dependent": true,
              "cacheable": false,
              "query_block": {
                "select_id": 2,
                "cost_info": {
                  "query_cost": "5.80"
                },
                "table": {
                  "table_name": "dept_manager",
                  "access_type": "ALL",
                  "possible_keys": [
                    "dept_no"
                  ],
                  "rows_examined_per_scan": 24,
                  "rows_produced_per_join": 21,
                  "filtered": "90.00",
                  "cost_info": {
                    "read_cost": "1.48",
                    "eval_cost": "4.32",
                    "prefix_cost": "5.80",
                    "data_read_per_join": "345"
                  },
                  "used_columns": [
                    "dept_no",
                    "to_date"
                  ],
                  "attached_condition": "(`employees`.`dept_manager`.`to_date` is not null)"
                }
              }
            }
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)

Note (Code 1003): /* select#1 */ select `employees`.`departments`.`dept_name` AS `dept_name` from `employees`.`departments` where <in_optimizer>(`employees`.`departments`.`dept_no`,`employees`.`departments`.`dept_no` in ( <materialize> (/* select#2 */ select `employees`.`dept_manager`.`dept_no` from `employees`.`dept_manager` where (`employees`.`dept_manager`.`to_date` is not null) ), <primary_index_lookup>(`employees`.`departments`.`dept_no` in <temporary table> on <auto_key> where ((`employees`.`departments`.`dept_no` = `materialized-subquery`.`dept_no`)))))

Now the picture is completely different. There is no nested_loop member, and instead there is an attached_subqueries array containing a single member: the temporary table materialized from the subquery select dept_no from dept_manager where to_date is not null (including all the details of this materialization).

Conclusion: We can experiment with the value of optimizer_switch and use EXPLAIN FORMAT=JSON to examine how a particular optimization affects our queries.

Create flexible schemas using dynamic columns for semi-structured data. Learn how.

Topics:
percona ,database ,json

Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}