Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

A Look at MariaDB's Subquery Cache

DZone's Guide to

A Look at MariaDB's Subquery Cache

MariaDB's subquery cache is not a cache for subqueries. Instead, it can be treated as an optimizer, its tables surviving for the duration of a statement.

· Database Zone
Free Resource

Download the Guide to Open Source Database Selection: MySQL vs. MariaDB and see how the side-by-side comparison of must-have features will ease the journey. Brought to you in partnership with MariaDB.

MariaDB Subquery CacheThe MariaDB subquery cache feature added in MariaDB 5.3 is not widely known. Let’s see what it is and how it works.

What Is a Subquery Cache?

The MariaDB subquery cache optimizes the execution of correlated subqueries. Correlated subqueries refer to a value from the parent query. For example:

SELECT id FROM product WHERE price NOT IN (SELECT MAX(price) FROM product GROUP BY category);


MariaDB only uses this optimization if the parent query is a SELECT, not an UPDATE or a DELETE. The subquery results get cached only for the duration of the parent query.

MariaDB added the subquery cache in v5.3. It is controlled by optimizer_switch, a dynamic variable that contains many flags that enable or disable several optimizations. To disable the subquery cache, run:

You can also do this at the session level.

How Does Subquery Cache Work?

Let’s see how it works. To make things clearer, we will use an example. Consider these tables:

Now, we issue this query:

The server decides to read t1 first (the bigger table, as expected), and then access t2 using the subquery cache. It creates a MEMORY temporary table to store the results of the subquery, with an index on c (it is used to match the rows). Then it reads the first row from t1, and checks if the search is cached. It is not, so it reads t1 looking for rows with c=1 and copies the results into the cache. The next time it will find the value 1, and it will not need to access t2 because the matches are already cached. If you look at the data, you may notice that the value “5” appears twice in t1 (and is absent in t2). But the search is cached anyway, so the server searches for 5 in t2 only once.

I hope that you aren’t blindly accepting what I wrote until now: good DBAs need facts and metrics. Let’s be scientific: we’ll make a prediction, conduct an experiment and check the status variables to verify the prediction. The prediction is the list of rows that will be read from t1, with the running total of hits and misses:

(1,2) -- Miss: 1
(3,4) -- Miss: 2
(1,2) -- Hit:  1
(3,4) -- Hit:  2
(3,4) -- Hit:  3
(3,5) -- Hit:  4
(3,5) -- Hit:  5
(5,1) -- Miss: 3
(5,2) -- Hit:  6
(3,6) -- Hit:  7
(1,5) -- Hit:  8


MariaDB [test]> SHOW STATUS LIKE "subquery_cache%";
+---------------------+-------+
| Variable_name | Value |
+---------------------+-------+
| Subquery_cache_hit | 8 |
| Subquery_cache_miss | 3 |
+---------------------+-------+
2 rows in set (0.00 sec)
MariaDB [test]> SHOW STATUS LIKE "handler_tmp_write";
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| Handler_tmp_write | 3 |
+-------------------+-------+
1 row in set (0.00 sec)


The totals match, and the number of writes to the cache is equal to the misses (after a miss, a table access is done and cached).

The maximum size of an individual table is the minimum of tmp_table_size and max_heap_table_size. If the table size grows over this limit, the table is written to disk. If the MEMORY table creation fails (perhaps because MEMORY does not support BLOB), the subquery is not cached.

The total of hits and misses can be seen by reading two status variables: subquery_cache_hit and subquery_cache_miss. After 200 misses, the server checks the hit ratio for that particular subquery. If it is < 20%, it disables the cache for that subquery. If the hit rate is < 70% the table cannot be written to disk in case it exceeds the size limit. These numbers (200, 0.2, 0.7) are hardcoded and cannot be changed. But if you really want to test how MariaDB behaves with different parameters, you can change these constants in sql/sql_expression_cache.cc and recompile the server.

Isn’t This Subquery Materialization?

Subquery materialization is another strategy that the optimizer can choose to execute a query. It might look similar, because some data from a subquery are written to a MEMORY table, but this is the only similarity. The purpose and implementation of subquery materialization is different.

Let’s try to explain this with pseudocode.

Subquery materialization is for uncorrelated IN subqueries. Therefore the subquery is executed and materialized altogether:

The subquery cache is for correlated subqueries. Thus the subquery gets executed only for non-cached values:

# Query to optimize:
SELECT col1, (SELECT ... WHERE ... = col1) ... FROM ...
foreach (outer query row) {
if (col1 current value is cached) {
read from cache
} else {
read from subquery
cache col1 current value
}
}


Some Considerations

Despite the similarity in names, the MariaDB subquery cache is not a query cache for subqueries. These features are different, implemented for different purposes. Obviously, the subquery cache doesn’t have the scalability and performance problems of the query cache (global mutex, table invalidation, etc.). As mentioned, a subquery cache table only survives for the duration of a statement, so it should be considered an optimizer strategy. For example, in some cases, you might use the subquery cache for a WHERE … NOT IN subquery, but not for the WHERE … IN version, because the optimizer prefers to rewrite it as a JOIN.

Of course, not all correlated subqueries automatically benefit from this feature. Consider the example above: it is built to show that the subquery cache is useful. But we can easily build an example to show that can have a negative impact on performance: add rows to t1, and delete all duplicate values of a. There will be no hits, a temporary table is created, 200 reads and writes are performed, but it won’t help. After 200 misses, the cache will be disabled, yes, but what if this happens for each subquery? The damage may not be huge in a realistic case, but it is still damage. That’s why you can disable the MariaDB subquery cache.

Interested in reducing database costs by moving from Oracle Enterprise to open source subscription?  Read the total cost of ownership (TCO) analysis. Brought to you in partnership with MariaDB.

Topics:
mariadb ,database ,subqueries ,database optimizer ,tutorial

Published at DZone with permission of Federico Razzoli, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}