Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Tech-Forum Friday: Index Usage with Functions

DZone's Guide to

Tech-Forum Friday: Index Usage with Functions

· Performance Zone
Free Resource

Evolve your approach to Application Performance Monitoring by adopting five best practices that are outlined and explored in this e-book, brought to you in partnership with BMC.

This article was orginally written by Nathan Parrish

Note: We’re starting this ”technical forum” in order to share hot topics and noteworthy discussions from the ClustrixDB support forum with our larger audience.

I just ran into this issue with a developer and thought it was blog-worthy.

Our UI was doing the following query:

SELECT node,
 READS,
 replicas,
 ranked_replicas,
 total_replicas
FROM  hotness_history
WHERE  `database` = 'wiki_test'
 AND `table` = 'revision'
 AND `index` = '__idx_revision__PRIMARY'
 AND Unix_timestamp(timestamp) = 1389210952;

At one point, we’d been doing timestamp = ‘2013-12-12 12:01’, but in order to avoid strange timezone issues, we converted to using integer UNIX timestamp throughout the UI code.  This lead us to unix_timestamp(timestamp) = 1389210952.

However, this performed terribly, taking as long as 5 seconds.  The explain shows us why:

mysql> explain SELECT  node, reads, replicas, ranked_replicas, total_replicas  FROM hotness_history  WHERE `database` = 'wiki_test' AND `table` = 'revision' AND `index` = '__idx_revision__PRIMARY' AND UNIX_TIMESTAMP(timestamp) = 1389210952;
+--------------------------------------------------------------------------------------------------------------------------------------------+------------+------------+
| Operation  | Est. Cost  | Est. Rows  |
+--------------------------------------------------------------------------------------------------------------------------------------------+------------+------------+
| filter (unix_timestamp(1.timestamp, @@time_zone) = param(0)) and (1.index = param(1)) and (1.table = param(2)) and (1.database = param(3)) | 1779469.38 | 2355174.90 |
|  index_scan 1 := hotness_history.__idx_hotness_history__PRIMARY  | 1570120.50 | 2616861.00 |
+--------------------------------------------------------------------------------------------------------------------------------------------+------------+------------+

The fact that the index_scan line shows nothing after the table.representation means we’re not sarging (using an index), thus we are reading the entire table, then filtering for a match on the timestamp.

The problem is that we can’t use an index on a column when we’re applying a function on that column (same limitation exists in MySQL, btw).  However, we can pretty easily eliminate this obstacle by inverting the function call and moving it to the other side of the expression:  timestamp = from_unixtime(123234235).

mysql> SELECT node,
 ->  READS,
 ->  replicas,
 ->  ranked_replicas,
 ->  total_replicas
 -> FROM  hotness_history
 -> WHERE  `database` = 'wiki_test'
 ->  AND `table` = 'revision'
 ->  AND `index` = '__idx_revision__PRIMARY'
 ->  AND timestamp = From_unixtime(1389210952);
+------+-------+----------+-----------------+----------------+
| node | READS | replicas | ranked_replicas | total_replicas |
+------+-------+----------+-----------------+----------------+
|  1 |  2732 |  1 |  1 |  6 |
|  3 |  2758 |  1 |  1 |  6 |
|  5 |  2732 |  1 |  1 |  6 |
+------+-------+----------+-----------------+----------------+
3 rows in set (0.01 sec)

Note the explain now shows us sarging (the bit after the comma on the index_scan line), and  thus reading very few rows, hence execution time is now trivial:

mysql> explain SELECT node,
  READS,
  replicas,
  ranked_replicas,
  total_replicas
  FROM  hotness_history
  WHERE  `database` = 'wiki_test'
  AND `table` = 'revision'
  AND `index` = '__idx_revision__PRIMARY'
  AND timestamp = From_unixtime(1389210952);
+------------------------------------------------------------------------------------------------------------------------+-----------+-----------+
| Operation  | Est. Cost | Est. Rows |
+------------------------------------------------------------------------------------------------------------------------+-----------+-----------+
| filter (1.index = param(1)) and (1.table = param(2)) and (1.database = param(3))  |  111.85 |  147.20 |
|  index_scan 1 := hotness_history.__idx_hotness_history__PRIMARY, timestamp = from_unixtime(param(0), @@time_zone, #t) |  102.03 |  163.55 |
+------------------------------------------------------------------------------------------------------------------------+-----------+-----------+

It is worth noting that while moving the function call to be over the constant allows us to use the index, the function call over the constant does interfere with the planner’s ability to utilize statistics about the distribution of values in the index, or in other words, to properly cost the plan using that index.  Thus it is possible that when using a function call around the constant parameter, the planner will choose a different plan than if you had given a simple constant (i.e. performed that function in the application instead of the database).  Thus it is often best to apply functions on constants within the application; if this is not possible or desirable (in this case we keep it in the database in order to let the database sort out timezone conversion), you may wish to use planner hints to force index selection (FORCE or USE INDEX (index_name)).

Learn tips and best practices for optimizing your capacity management strategy with the Market Guide for Capacity Management, brought to you in partnership with BMC.

Topics:

Published at DZone with permission of Lisa Schultz, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}