Over a million developers have joined DZone.

Indexing 101: Optimizing MySQL Queries on a Single Table

· Performance Zone

Evolve your approach to Application Performance Monitoring by adopting five best practices that are outlined and explored in this e-book, brought to you in partnership with BMC.

[This article was written by Combaudon] 

I have recently seen several cases when performance for MySQL queries on a single table was terrible. The reason was simple: the wrong indexes were added and so the execution plan was poor. Here are guidelines to help you optimize various kinds of single-table queries.

Disclaimer: I will be presenting general guidelines and I do not intend to cover all scenarios. I am pretty confident that you can find examples where what I am writing does not work, but I am also confident that it will help you most of the time. Also I will not discuss features you can find in MySQL 5.6+ like Index Condition Pushdown to keep things simple. Be aware that such features can actually make a significant difference in query response time (for good or for bad).

What an index can do for you

An index can perform up to 3 actions: filter, sort/group and cover. While the first 2 actions are self-explanatory, not everyone may know what a ‘covering index’ is. Actually that’s very easy. The general workflow for a basic query is:
1. Use an index to find matching records and get the pointers to data.
2. Use the pointers to the corresponding data.
3. Return records

When a covering index can be used, the index already covers all fields requested in the query, so step #2 can be skipped and the workflow is now:
1. Use an index to find matching records
2. Return records

In many cases, indexes are small and can fit in memory while data is large and does not fit in memory: by using a covering index, you can avoid lots of disk operations and performance can be order of magnitudes better.
Let’s now look at different common scenarios.

Single equality

This is the most basic scenario:


The idea is of course to add an index on (c). However note that if the criteria is not selective enough, the optimizer may choose to perform a full table scan that will certainly be more efficient.
Also note that a frequent variation of this query is when you only select a small subset of fields instead of all fields:

SELECT c1, c2 FROM t WHERE c = 100

Here it could make sense to create an index on (c, c1, c2) because it will be a covering index. Do not create an index on (c1, c2, c)! It will still be covering but it will not be usable for filtering (remember that you can only use a left-most prefix of an index to filter).

Multiple equalities

SELECT * FROM t WHERE c = 100 and d = 'xyz'

It is also very easy to optimize: just add an index on (c, d) or (d, c).

The main mistake here is to add 2 indexes: one on (c) and one on (d). Granted, MySQL is able to use both indexes with the index_merge algorithm, but it is almost always a very bad option.

Equality and inequality

SELECT * FROM t WHERE c > 100 and d = 'xyz'

Here we must be careful because as long as we are using a column with an inequality, this will prevent us from using further columns in the index.

Therefore if we create an index on (d, c), we will be able to filter both on c and d, this is good.
But if we create an index on (c, d), we will only be filtering on c, which is less efficient.

So unlike the situation when you have equalities, order of columns matters when inequalities are used.

Multiple inequalities

SELECT * FROM t WHERE c > 100 and b < 10 and d = 'xyz'

As we have 2 inequalities, we already know that we will not be able to filter on both conditions (*). So we have to make a decision: will we filter on (d, b) or on (d, c)?

It is not possible to tell which option is better without looking at the data: simply choose the column where the inequality is the most selective. The main point is that you must put the column(s) with an equality first.

(*) Actually there is a way to ‘filter’ on both inequalites: partition on b and add an index on (d, c) or partition on c and add an index on (d, b). The details are out of the scope of this post but it might be an option for some situations.

Equalities and sort

SELECT * FROM t WHERE c = 100 and d = 'xyz' ORDER BY b

As mentioned in the first paragraph, an index can filter and sort so this query is easy to optimize. However like for inequalities, we must carefully choose the order of the columns in the index: the rule is that we will filter first, and then sort.

With that in mind, it is easy to know that (c, d, b) or (d, c, b) will be good indexes while (b, c, d) or (b, d, c) are not as good (they will sort but not filter).

And if we have:

SELECT c1, c2 FROM t WHERE c = 100 and d = 'xyz' ORDER BY b

We can create a super efficient index that will filter, sort and be covering: (c, d, b, c1, c2).

Inequality and sort

We have 2 main variations here. The first one is:

SELECT * FROM t WHERE c > 100 and d = 'xyz' ORDER BY b

Two options look reasonable in this case:
1. filter on d and sort by b.
2. filter on d and c.

Which strategy is more efficient? It will depend on your data, so you will have to experiment.

The second variation is:


This time we have no equality so we have to choose between filtering and sorting. Most likely you will choose filtering.


Not all cases have been covered in this post but you can already see that in some cases you will create poor MySQL indexes if you are not careful. In a future post, I will present a case that can look confusing at first sight but which is easy to understand if you already know everything mentioned here.

Learn tips and best practices for optimizing your capacity management strategy with the Market Guide for Capacity Management, brought to you in partnership with BMC.


Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}