DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
  1. DZone
  2. Data Engineering
  3. Databases
  4. A Look at MariaDB's Subquery Cache

A Look at MariaDB's Subquery Cache

MariaDB's subquery cache is not a cache for subqueries. Instead, it can be treated as an optimizer, its tables surviving for the duration of a statement.

Federico Razzoli user avatar by
Federico Razzoli
·
Mar. 06, 17 · Tutorial
Like (0)
Save
Tweet
Share
4.11K Views

Join the DZone community and get the full member experience.

Join For Free

MariaDB Subquery CacheThe MariaDB subquery cache feature added in MariaDB 5.3 is not widely known. Let’s see what it is and how it works.

What Is a Subquery Cache?

The MariaDB subquery cache optimizes the execution of correlated subqueries. Correlated subqueries refer to a value from the parent query. For example:

SELECT id FROM product WHERE price NOT IN (SELECT MAX(price) FROM product GROUP BY category);


MariaDB only uses this optimization if the parent query is a SELECT, not an UPDATE or a DELETE. The subquery results get cached only for the duration of the parent query.

MariaDB added the subquery cache in v5.3. It is controlled by optimizer_switch, a dynamic variable that contains many flags that enable or disable several optimizations. To disable the subquery cache, run:

SET GLOBAL optimizer_switch='subquery_cache=OFF';


You can also do this at the session level.

How Does Subquery Cache Work?

Let’s see how it works. To make things clearer, we will use an example. Consider these tables:

CREATE TABLE t1 (a INT, b INT);
INSERT INTO t1 VALUES
(1,2),
(3,4),
(1,2),
(3,4),
(3,4),
(3,5),
(3,5),
(5,1),
(5,2),
(3,6),
(1,5);
CREATE TABLE t2 (c INT, d INT);
INSERT INTO t2 VALUES
(1,10),
(2,20),
(3,30),
(4,40);


Now, we issue this query:

SELECT b, (SELECT d FROM t2 WHERE a = c) FROM t1;


The server decides to read t1 first (the bigger table, as expected), and then access t2 using the subquery cache. It creates a MEMORY temporary table to store the results of the subquery, with an index on c (it is used to match the rows). Then it reads the first row from t1, and checks if the search is cached. It is not, so it reads t1 looking for rows with c=1 and copies the results into the cache. The next time it will find the value 1, and it will not need to access t2 because the matches are already cached. If you look at the data, you may notice that the value “5” appears twice in t1 (and is absent in t2). But the search is cached anyway, so the server searches for 5 in t2 only once.

I hope that you aren’t blindly accepting what I wrote until now: good DBAs need facts and metrics. Let’s be scientific: we’ll make a prediction, conduct an experiment and check the status variables to verify the prediction. The prediction is the list of rows that will be read from t1, with the running total of hits and misses:

(1,2) -- Miss: 1
(3,4) -- Miss: 2
(1,2) -- Hit:  1
(3,4) -- Hit:  2
(3,4) -- Hit:  3
(3,5) -- Hit:  4
(3,5) -- Hit:  5
(5,1) -- Miss: 3
(5,2) -- Hit:  6
(3,6) -- Hit:  7
(1,5) -- Hit:  8


MariaDB [test]> SHOW STATUS LIKE "subquery_cache%";
+---------------------+-------+
| Variable_name | Value |
+---------------------+-------+
| Subquery_cache_hit | 8 |
| Subquery_cache_miss | 3 |
+---------------------+-------+
2 rows in set (0.00 sec)
MariaDB [test]> SHOW STATUS LIKE "handler_tmp_write";
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| Handler_tmp_write | 3 |
+-------------------+-------+
1 row in set (0.00 sec)


The totals match, and the number of writes to the cache is equal to the misses (after a miss, a table access is done and cached).

The maximum size of an individual table is the minimum of tmp_table_size and max_heap_table_size. If the table size grows over this limit, the table is written to disk. If the MEMORY table creation fails (perhaps because MEMORY does not support BLOB), the subquery is not cached.

The total of hits and misses can be seen by reading two status variables: subquery_cache_hit and subquery_cache_miss. After 200 misses, the server checks the hit ratio for that particular subquery. If it is < 20%, it disables the cache for that subquery. If the hit rate is < 70% the table cannot be written to disk in case it exceeds the size limit. These numbers (200, 0.2, 0.7) are hardcoded and cannot be changed. But if you really want to test how MariaDB behaves with different parameters, you can change these constants in sql/sql_expression_cache.cc and recompile the server.

Isn’t This Subquery Materialization?

Subquery materialization is another strategy that the optimizer can choose to execute a query. It might look similar, because some data from a subquery are written to a MEMORY table, but this is the only similarity. The purpose and implementation of subquery materialization is different.

Let’s try to explain this with pseudocode.

Subquery materialization is for uncorrelated IN subqueries. Therefore the subquery is executed and materialized altogether:

# Query to optimize:
SELECT ... WHERE col1 IN (subquery)
materialize subquery into a MEMORY table with UNIQUE keys;
foreach (row in outer query) {
check if col1 current value exists in materialized table
}


The subquery cache is for correlated subqueries. Thus the subquery gets executed only for non-cached values:

# Query to optimize:
SELECT col1, (SELECT ... WHERE ... = col1) ... FROM ...
foreach (outer query row) {
if (col1 current value is cached) {
read from cache
} else {
read from subquery
cache col1 current value
}
}


Some Considerations

Despite the similarity in names, the MariaDB subquery cache is not a query cache for subqueries. These features are different, implemented for different purposes. Obviously, the subquery cache doesn’t have the scalability and performance problems of the query cache (global mutex, table invalidation, etc.). As mentioned, a subquery cache table only survives for the duration of a statement, so it should be considered an optimizer strategy. For example, in some cases, you might use the subquery cache for a WHERE … NOT IN subquery, but not for the WHERE … IN version, because the optimizer prefers to rewrite it as a JOIN.

Of course, not all correlated subqueries automatically benefit from this feature. Consider the example above: it is built to show that the subquery cache is useful. But we can easily build an example to show that can have a negative impact on performance: add rows to t1, and delete all duplicate values of a. There will be no hits, a temporary table is created, 200 reads and writes are performed, but it won’t help. After 200 misses, the cache will be disabled, yes, but what if this happens for each subquery? The damage may not be huge in a realistic case, but it is still damage. That’s why you can disable the MariaDB subquery cache.

Cache (computing) Database MariaDB

Published at DZone with permission of Federico Razzoli, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • gRPC on the Client Side
  • Container Security: Don't Let Your Guard Down
  • Steel Threads Are a Technique That Will Make You a Better Engineer
  • Building a Real-Time App With Spring Boot, Cassandra, Pulsar, React, and Hilla

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: