When Optimal MySQL Isn't Best
When Optimal MySQL Isn't Best
When it comes to MySQL query construction, what happens when ''optimal'' isn't ''best?'' See how one developer is trying to overcome that roadblock.
Join the DZone community and get the full member experience.Join For Free
Built by the engineers behind Netezza and the technology behind Amazon Redshift, AnzoGraph™ is a native, Massively Parallel Processing (MPP) distributed Graph OLAP (GOLAP) database that executes queries more than 100x faster than other vendors.
The MySQL optimal query plan strategy isn’t always the best option. In this post, I’ll show you how to help MySQL choose the best strategy for maximum performance.
As an example, I will discuss a real life use case when we needed to select some amount of the latest news (let’s say 25 news articles) by tags. Suppose we have 1 billion records in the news table and 10,000 records in table tags. The natural relation between news and tags will be many-to-many using bridge table tags_in_news as shown below:
And the SQL query will look like:
select n.* from news n, tags_in_news tin where 'getNewsByTags' = 'getNewsByTags' and n.deleted = false and n.id = tin.news_id and tin.deleted = false and tin.uk_tag_id in (0, 1119, 2228, 4031, 1792, 1117, 3954, 2150, 3919, 4087) and n.modified_date <= Now() order by n.modified_date desc limit 25
After the first run of this query, its execution time is not as fast as I would like it to be. It's because of the order by sorting construction. So, I created index for the modified_date field:
create index ix_modified_date on news (modified_date)
But even after that, MySQL did not choose the query plan strategy using this index — it chose the optimal strategy using another index, which was not the best strategy. So I had to force MySQL to use this index. Fortunately, we can use force index construction, and now our query will look like this:
select n.* from news n force index (ix_modified_date), tags_in_news tin where 'getNewsByTags' = 'getNewsByTags' and n.deleted = false and n.id = tin.news_id and tin.deleted = false and tin.uk_tag_id in (0, 1119, 2228, 4031, 1792, 1117, 3954, 2150, 3919, 4087) and n.modified_date <= Now() order by n.modified_date desc limit 25
Bingo! Query time decreased from more than a second to less than 50 ms. I discovered the best query execution strategy. So the problem was resolved... Until one “fine” morning when our DB server fell down. After a slow_query_log investigation, I found out that for some tag values, our query executes dramatically slowly.
And the cause was really strange: the total number of records returned from the query was less than the limit value. In such cases, it’s better not to use force index and let MySQL decide how to execute the query. Hmm... looks like the optimal strategy is better than the best strategy? So, I had two options: to give up and use the optimal strategy or to fight and find a solution to get rid of this issue. I’ve chosen the second one. But more on it in my next article. Stay tuned!
Published at DZone with permission of Ivan Shulak . See the original article here.
Opinions expressed by DZone contributors are their own.