Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

InnoDB Full-Text Search in MySQL 5.6: Part 3, Performance

DZone's Guide to

InnoDB Full-Text Search in MySQL 5.6: Part 3, Performance

· Performance Zone
Free Resource

Evolve your approach to Application Performance Monitoring by adopting five best practices that are outlined and explored in this e-book, brought to you in partnership with BMC.

This post comes from  at the MySQL Performance Blog.

This is part 3 of a 3 part series covering the new InnoDB full-text search features in MySQL 5.6. To catch up on the previous parts, see part 1 or part 2

Some of you may recall a few months ago that I promised a third part in my InnoDB full-text search (FTS) series, in which I’d actually take a look at the performance of InnoDB FTS in MySQL 5.6 versus traditional MyISAM FTS. I hadn’t planned on quite such a gap between part 2 and part 3, but as they say, better late than never. Recall that we have been working with two data sets, one which I call SEO (8000-keyword-stuffed web pages) and the other which I call DIR (800K directory records), and we are comparing MyISAM FTS in MySQL 5.5.30 versus InnoDB FTS in MySQL 5.6.10.

For reference, although this is not really what I would call a benchmark run, the platform I’m using here is a Core i7-2600 3.4GHz, 32GiB of RAM, and 2 Samsung 256GB 830 SSDs in RAID-0. The OS is CentOS 6.4, and the filesystem is XFS with dm-crypt/LUKS. All MySQL settings are their respective defaults, except for innodb_ft_min_token_size, which is set to 4 (instead of the default of 3) to match MyISAM’s default ft_min_word_len.

Also, recall that the table definition for the DIR data set is:

CREATE TABLE dir_test (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  full_name VARCHAR(100),
  details TEXT
);

The table definition for the SEO data set is:

CREATE TABLE seo_test (
 id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
 title VARCHAR(255),
 body MEDIUMTEXT
);

Table Load / Index Creation

First, let’s try loading data and creating our FT indexes in one pass – i.e., we’ll create the FT indexes as part of the original table definition itself. In particular, this means adding “FULLTEXT KEY (full_name, details)” to our DIR tables and adding “FULLTEXT KEY (title, body)” to the SEO tables. We’ll then drop these tables, drop our file cache, restart MySQL, and try the same process in two passes: first we’ll load the table, and then we’ll do an ALTER to add the FT indexes. All times in seconds.

Engine Data Set one-pass (load) two-pass (load, alter)
MyISAM SEO 3.91 3.96 (0.76, 3.20)
InnoDB SEO 3.777 7.32 (1.53, 5.79)
MyISAM DIR 43.159 44.93 (6.99, 37.94)
InnoDB DIR 330.76 56.99 (12.70, 44.29)

Interesting. For MyISAM, we might say that it really doesn’t make too much difference which way you proceed, as the numbers from the one-pass load and the two-pass load are within a few percent of each other, but for InnoDB, we have mixed behavior. With the smaller SEO data set, it makes more sense to do it in a one-pass process, but with the larger DIR data set, the two-pass load is much faster.

Recall that when adding the first FT index to an InnoDB table, the table itself has to be rebuilt to add the FTS_DOC_ID column, so I suspect that the size of the table when it gets rebuilt has a lot to do with the performance difference on the smaller data set. The SEO data set fits completely into the buffer pool, the DIR data set does not. That also suggests that it’s worth comparing the time required to add a second FT index (this time we will just index each table’s TEXT/MEDIUMTEXT field). While we’re at it, let’s look at the time required to drop the second FT index as well. Again, all times in seconds.

Engine Data Set FT Index Create Time FT Index Drop Time
MyISAM SEO 6.34 3.17
InnoDB SEO 3.26 0.01
MyISAM DIR 74.96 37.82
InnoDB DIR 24.59 0.01

InnoDB wins this second test all around. I’d attribute InnoDB’s win here partially to not having to rebuild the whole table with second (and subsequent) indexes, but also to the fact that at least some the InnoDB data was already in the buffer pool from when the first FT index was created. Also, we know that InnoDB generally drops indexes extremely quickly, whereas MyISAM requires a rebuild of the .MYI file, so InnoDB’s win on the drop test isn’t surprising.

Query Performance

Recall the queries that were used in the previous post from this series:

1. SELECT id, title, MATCH(title, body) AGAINST ('arizona business records'
   IN NATURAL LANGUAGE MODE) AS score FROM seo_test_{myisam,innodb} ORDER BY 3
   DESC LIMIT 5;
2. SELECT id, title, MATCH(title, body) AGAINST ('corporation commission forms'
   IN NATURAL LANGUAGE MODE) AS score FROM seo_test_{myisam,innodb} ORDER BY 3 DESC
   LIMIT 5;
3. SELECT id, full_name, MATCH(full_name, details) AGAINST ('+james +peterson +arizona'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 5;
4. SELECT id, full_name, MATCH(full_name, details) AGAINST ('+james +peterson arizona'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 5;
5. SELECT id, full_name, MATCH(full_name, details) AGAINST ('"Thomas B Smith"'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 1;

The queries were run consecutively from top to bottom, a total of 10 times each. Here are the results in tabular format:

Query # Engine Min. Execution Time Avg. Execution Time Max. Execution Time
1 MyISAM 0.007953 0.008102 0.008409
1 InnoDB 0.014986 0.015331 0.016243
2 MyISAM 0.001815 0.001893 0.001998
2 InnoDB 0.001987 0.002077 0.002156
3 MyISAM 0.000748 0.000817 0.000871
3 InnoDB 0.670110 0.676540 0.684837
4 MyISAM 0.001199 0.001283 0.001372
4 InnoDB 0.055479 0.056256 0.060985
5 MyISAM 0.008471 0.008597 0.008817
5 InnoDB 0.624305 0.630959 0.641415

Not a lot of variance in execution times for a given query, so that’s good, but InnoDB is always coming back slower than MyISAM. In general, I’m not that surprised that MyISAM tends to be faster; this is a simple single-threaded, read-only test, so none of the areas where InnoDB shines (e.g., concurrent read/write access) are being exercised here, but I am quite surprised by queries #3 and #5, where InnoDB is just getting smoked.

I ran both versions of query 5 with profiling enabled, and for the most part, the time spent in each query state was identical between the InnoDB and MyISAM versions of the query, with one exception.

InnoDB: | Creating sort index | 0.626529 |
MyISAM: | Creating sort index | 0.014588 |

That’s where the bulk of the execution time is. According to the docs, this thread state means that the thread is processing a SELECT which required an internal temporary table. Ok, sure, that makes sense, but it doesn’t really explain why InnoDB is taking so much longer, and here’s where things get a bit interesting. If you recall part 2 in this series, query 5 actually returned 0 results when run against InnoDB with the default configuration because of the middle initial “B”, and I had to set innodb_ft_min_token_size to 1 in order to get results back. For the sake of completeness, I did that again here, then restarted the server and recreated my FT index. The results? Execution time dropped by 50% and ‘Creating sort index’ didn’t even appear in the query profile:

mysql [localhost] {msandbox} (test): SELECT id, full_name, MATCH(full_name, details) AGAINST
('"Thomas B Smith"' IN BOOLEAN MODE) AS score FROM dir_test_innodb ORDER BY 3 DESC LIMIT 1;
+-------+----------------+-------------------+
| id    | full_name      | score             |
+-------+----------------+-------------------+
| 62633 | Thomas B Smith | 32.89915466308594 |
+-------+----------------+-------------------+
1 row in set (0.31 sec)
mysql [localhost] {msandbox} (test): show profile;
+-------------------------+----------+
| Status                  | Duration |
+-------------------------+----------+
| starting                | 0.000090 |
| checking permissions    | 0.000007 |
| Opening tables          | 0.000017 |
| init                    | 0.000034 |
| System lock             | 0.000012 |
| optimizing              | 0.000008 |
| statistics              | 0.000027 |
| preparing               | 0.000012 |
| FULLTEXT initialization | 0.304933 |
| executing               | 0.000008 |
| Sending data            | 0.000684 |
| end                     | 0.000006 |
| query end               | 0.000006 |
| closing tables          | 0.000011 |
| freeing items           | 0.000019 |
| cleaning up             | 0.000003 |
+-------------------------+----------+

Hm. It’s still slower than MyISAM by quite a bit, but much faster than before. The reason it’s faster is because it found an exact match and I only asked for one row, but if I change LIMIT 1 to LIMIT 2 (or limit N>1), then ‘Creating sort index’ returns to the tune of roughly 0.5 to 0.6 seconds, and ‘FULLTEXT initialization’ remains at 0.3 seconds. So this answers another lingering question: there is a significant performance impact to using a lower innodb_ft_min_token_size (ifmts), and it can work for you or against you, depending upon your queries and how many rows you’re searching for. The time spent in “Creating sort index” doesn’t vary too much (maybe 0.05s) between ifmts=1 and ifmts=4, but the time spent in FULLTEXT initialization with ifmts=4 was typically only a few milliseconds, as opposed to the 300ms seen here.

Finally, I tried experimenting with different buffer pool sizes, temporary table sizes, per-thread buffer sizes, and I also tried changing from Antelope (ROW_FORMAT=COMPACT) to Barracuda (ROW_FORMAT=DYNAMIC) and switching character sets from utf8 to latin1, but none of these made any difference. The only thing which seemed to provide a bit of a performance improvement was upgrading to 5.6.12. The execution times for the InnoDB FTS queries under 5.6.12 were about 5-10 percent faster than with 5.6.10, and query #2 actually performed a bit better under InnoDB than MyISAM (average execution time 0.00075 seconds faster), but other than that, MyISAM still wins on raw SELECT performance.

Three blog posts later, then, what’s my overall take on InnoDB FTS in MySQL 5.6? I don’t think it’s great, but it’s serviceable. The performance for BOOLEAN MODE queries definitely leaves something to be desired, but I think InnoDB FTS fills a need for those people who want the features and capabilities of InnoDB but can’t modify their existing applications or who just don’t have enough FTS traffic to justify building out a Sphinx/Solr/Lucene-based solution.

Learn tips and best practices for optimizing your capacity management strategy with the Market Guide for Capacity Management, brought to you in partnership with BMC.

Topics:

Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}