Over a million developers have joined DZone.

ANALYZE: MyISAM vs Innodb

Evolve your approach to Application Performance Monitoring by adopting five best practices that are outlined and explored in this e-book, brought to you in partnership with BMC.

Following up on my Previous Post I decided to do little test to see how accurate stats we can get for for Index Stats created by ANALYZE TABLE for MyISAM and Innodb.

But before we go into that I wanted to highlight about using ANALYZE TABLE in production as some people seems to be thinking I advice to use it.... a lot. In fact I should say I see more systems which have ANALYZE abused - run too frequently without much need than systems which do not run ANALYZE frequently enough.

First it is worth to note MySQL only saves very basic cardinality information for index prefixes for index stats and these rarely change. There is no histograms or any other skew metrics etc. MySQL optimizer also uses number of rows in the table for many decisions but this is computed live (maintained for MyISAM and estimated during query execution for Innodb). This basic information means it does not change whole that quickly at extent to affect optimizer plans.

If you look at the stats accuracy along running ANALYZE TABLE after initial table population and when there are significant changes makes sense. For Innodb as index stats are computed first time table is accessed after restart this often means "never" because MySQL servers are restarted frequently enough. Even once per 3 months is often enough for many workloads. Add to this Innodb stats are less accurate by nature which means you can allow more data change while your
index stats remain as good as new.

Looking at stats accuracy is however a wrong way to look at the problem. Your index stats are a bit off, so what ? What really matters is not how accurate stats are but how good plans you're getting for your queries. If you're getting as good plans as with perfect stats why bother updating them ?
Also note many simple "queries" (using constants for index accesses) will not use index cardinality data at all but will estimate number of rows during query execution.

I typically look at ANALYZE TABLE and adding it to the table if I see having it run helps to get good plans. If query plans are good or bad independently of it being run there is need to bother - for bad plans use FORCE INDEX or change the query and report MySQL Optimizer Bug :)

But now lets see in the difference of behavior of ANALYZE TABLE for MyISAM vs Innodb.

I used the following simple table for tests:

CREATE TABLE `antest` (
`i` int(10) UNSIGNED NOT NULL,
`c` char(80) DEFAULT NULL,
KEY `i` (`i`),
KEY `c` (`c`,`i`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

I have populated it with data with following true cardinality:

mysql> SELECT count(DISTINCT c) FROM antest;
+-------------------+
| count(DISTINCT c)
+-------------------+
| 101
+-------------------+
1 row IN SET (0.36 sec)
mysql> SELECT count(DISTINCT i) FROM antest;
+-------------------+
| count(DISTINCT i)
+-------------------+
| 101
+-------------------+
1 row IN SET (0.20 sec)
mysql> SELECT count(DISTINCT i,c) FROM antest;
+---------------------+
| count(DISTINCT i,c)
+---------------------+
| 10201
+---------------------+
1 row IN SET (0.43 sec)

Lets see how stats look for MYISAM:

mysql> SHOW INDEX FROM antest;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| TABLE | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | NULL | Index_type | Comment
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| antest | 1 | i | 1 | i | A | NULL | NULL | NULL | | BTREE |
| antest | 1 | c | 1 | c | A | NULL | NULL | NULL | YES | BTREE |
| antest | 1 | c | 2 | i | A | NULL | NULL | NULL | | BTREE |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows IN SET (0.00 sec)

Aha as you can see there is no cardinality stored with table as ANALYZE did not run yet.

mysql> SHOW INDEX FROM antest;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| TABLE | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | NULL | Index_type | Comment
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| antest | 1 | i | 1 | i | A | 101 | NULL | NULL | | BTREE |
| antest | 1 | c | 1 | c | A | 101 | NULL | NULL | YES | BTREE |
| antest | 1 | c | 2 | i | A | 10240 | NULL | NULL | | BTREE |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows IN SET (0.01 sec)

As you can see after running ANALYZE we have exact cardinality for i and c columns, with cardinality for the pair (c,i) looks a bit off but is within 0.5% of the correct value so we can count on MyISAM values as almost exact.

As you see ANALYZE table tool a little bit of time to run (even for this very small table) this is because ANALYZE does index scans to find number of exact values in the table.

Now let us populate antest_innodb table which is same but uses Innodb format:

mysql> INSERT INTO antest_innodb SELECT  * FROM antest;
Query OK, 245760 rows affected (54.29 sec)
Records: 245760 Duplicates: 0 Warnings: 0
mysql> SHOW INDEX FROM antest_innodb;
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| TABLE | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | NULL | Index_type | Comment
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| antest_innodb | 1 | i | 1 | i | A | 245900 | NULL | NULL | | BTREE |
| antest_innodb | 1 | c | 1 | c | A | 245900 | NULL | NULL | YES | BTREE |
| antest_innodb | 1 | c | 2 | i | A | 245900 | NULL | NULL | | BTREE |
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows IN SET (0.00 sec)

Very interesting result - after loading the data with INSERT in Innodb table we do not get NULL cardinality as with MyISAM but instead we get very wrong cardinality which shows us index prefix is unique (245900 is estimate for the row count in the table)

It is worth to note if you do ALTER TABLE Innodb, same as MyISAM will internally run analyze as soon as table is rebuilt and values will be more sensible:

mysql> ALTER TABLE antest_innodb type=innodb;
Query OK, 245760 rows affected, 1 warning (51.87 sec)
Records: 245760 Duplicates: 0 Warnings: 0
mysql> SHOW INDEX FROM antest_innodb;
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| TABLE | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | NULL | Index_type | Comment
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| antest_innodb | 1 | i | 1 | i | A | 332 | NULL | NULL | | BTREE |
| antest_innodb | 1 | c | 1 | c | A | 18 | NULL | NULL | YES | BTREE |
| antest_innodb | 1 | c | 2 | i | A | 20491 | NULL | NULL | | BTREE |
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows IN SET (0.00 sec)

Note however how much are these values off from reality. The "i" key cardinality is overestimated 3 times, "c" key prefix cardinality is underestimated 5 times and the combined (c,i) key cardinality is overestimated 2 times. So Innodb stats are are very inexact. Fortunately for most queries which use these stats accuracy at the order of magnitude is enough. Sometimes it is not and you're thinking why a hell it could be picking this strange plan.

Let us run ANALYZE TABLE for Innodb couple of more times to see how values change:

mysql> analyze TABLE antest_innodb;
+--------------------+---------+----------+----------+
| TABLE | Op | Msg_type | Msg_text
+--------------------+---------+----------+----------+
| test.antest_innodb | analyze | STATUS | OK
+--------------------+---------+----------+----------+
1 row IN SET (0.00 sec)
mysql> SHOW INDEX FROM antest_innodb;
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| TABLE | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | NULL | Index_type | Comment
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| antest_innodb | 1 | i | 1 | i | A | 338 | NULL | NULL | | BTREE |
| antest_innodb | 1 | c | 1 | c | A | 18 | NULL | NULL | YES | BTREE |
| antest_innodb | 1 | c | 2 | i | A | 20491 | NULL | NULL | | BTREE |
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows IN SET (0.00 sec)
mysql> analyze TABLE antest_innodb;
+--------------------+---------+----------+----------+
| TABLE | Op | Msg_type | Msg_text
+--------------------+---------+----------+----------+
| test.antest_innodb | analyze | STATUS | OK
+--------------------+---------+----------+----------+
1 row IN SET (0.00 sec)
mysql> SHOW INDEX FROM antest_innodb;
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| TABLE | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | NULL | Index_type | Comment
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| antest_innodb | 1 | i | 1 | i | A | 92 | NULL | NULL | | BTREE |
| antest_innodb | 1 | c | 1 | c | A | 384 | NULL | NULL | YES | BTREE |
| antest_innodb | 1 | c | 2 | i | A | 20491 | NULL | NULL | | BTREE |
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows IN SET (0.00 sec)

As we see subsequent runs change stats dramatically. For c prefix we got value changed to become 15 times larger. So Innodb stats are both inexact and unstable. So restarting server with Innodb may change stats dramatically and affect some query plans. You also may be getting different plans on different slaves with same data.

Another difference when it comes from handling the statistics comes from NULL handling.
MyISAM has a special variable which controls if NULLs should be considered equal when computing stats:

mysql> SHOW VARIABLES LIKE "myisam_stats_method";
+---------------------+---------------+
| Variable_name | Value
+---------------------+---------------+
| myisam_stats_method | nulls_unequal
+---------------------+---------------+
1 row IN SET (0.00 sec)

Too see the difference let me set column "c" to NULL in both tables and see how values change:

mysql> UPDATE antest SET c=NULL;
Query OK, 245760 rows affected (11.48 sec)
Rows matched: 245760  Changed: 245760  Warnings: 0
mysql> UPDATE antest_innodb SET c=NULL;
Query OK, 245760 rows affected (1 min 20.19 sec)
Rows matched: 245760  Changed: 245760  Warnings: 0
mysql> SHOW INDEX FROM antest;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| TABLE  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | NULL | Index_type | Comment 
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| antest |		  1 | i		        |			1 | i		   | A		 |		 101 |	 NULL | NULL   |	  | BTREE	  |		 
| antest |		  1 | c		|			1 | c		   | A		 |	  245760 |	 NULL | NULL   | YES  | BTREE	  |		 
| antest |		  1 | c		|			2 | i		   | A		 |	  245760 |	 NULL | NULL   |	  | BTREE	  |		 
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows IN SET (0.00 sec)
mysql> analyze TABLE antest_innodb;
+--------------------+---------+----------+----------+
| TABLE			  | Op	  | Msg_type | Msg_text 
+--------------------+---------+----------+----------+
| test.antest_innodb | analyze | STATUS   | OK	   
+--------------------+---------+----------+----------+
1 row IN SET (0.01 sec)
mysql> SHOW INDEX FROM antest_innodb;
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| TABLE		 | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | NULL | Index_type | Comment 
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| antest_innodb |		  1 | i		        |			1 | i		   | A		 |		 418 |	 NULL | NULL   |	  | BTREE	  |		 
| antest_innodb |		  1 | c		|			1 | c		   | A		 |		   8 |	 NULL | NULL   | YES  | BTREE	  |		 
| antest_innodb |		  1 | c		|			2 | i		   | A		 |		 196 |	 NULL | NULL   |	  | BTREE	  |		 
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows IN SET (0.00 sec)

As you can see MyISAM set cardinality for prefix (c) and key(c,i) approximately to number of rows in the table treating all nulls different values. Innodb on the contrary treats all NULL values the same so
cardinality for (c) and (c,i) dropped significantly.

This means Innodb and MyISAM have different stats computation method by default.

Lets check how stats change for MyISAM if we change the stats computation method:

mysql> SET myisam_stats_method='nulls_equal';
Query OK, 0 rows affected (0.00 sec)
mysql> analyze TABLE antest;
+-------------+---------+----------+-----------------------------+
| TABLE | Op | Msg_type | Msg_text
+-------------+---------+----------+-----------------------------+
| test.antest | analyze | STATUS | TABLE IS already up TO date
+-------------+---------+----------+-----------------------------+
1 row IN SET (0.00 sec)

oops. Little gotcha. MySQL considers table up to date even though stats stored were computed with different method. If your table is written to actively you should not have this problem; I just did couple of updates to refresh update time.

mysql> SHOW INDEX FROM antest;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| TABLE | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | NULL | Index_type | Comment
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| antest | 1 | i | 1 | i | A | 101 | NULL | NULL | | BTREE |
| antest | 1 | c | 1 | c | A | 1 | NULL | NULL | YES | BTREE |
| antest | 1 | c | 2 | i | A | 101 | NULL | NULL | | BTREE |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows IN SET (0.00 sec)

So with nulls_equal method we see very different picture. It is considered we only have one distinct value for "c" and there are 101 distict values for (c,i) which is the same as value of distinct values in i column. These stats look much closer to what we get for Innodb table with same data though we can see Innodb stats are a bit off from reality too.

MySQL version note: This is from MySQL 5.0.62 if there are other versions which show different behavior.

Original Author

Original Article Written By Peter Zaitsev

Learn tips and best practices for optimizing your capacity management strategy with the Market Guide for Capacity Management, brought to you in partnership with BMC.

Topics:

Published at DZone with permission of Schalk Neethling. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}