
40 Million Tables in MySQL 8.0 With ZFS

In this post, learn how to create 40 million tables in MySQL 8 using shared tablespaces (one tablespace per schema), and see what challenges to expect along the way.

By Alexander Rubin · Sep. 19, 18 · Tutorial

In my previous post about millions of tables in MySQL 8, I was able to create one million tables and test their performance. My next challenge is to create 40 million tables in MySQL 8 using shared tablespaces (one tablespace per schema). In this post, I'm showing how to do it and what challenges we can expect.

Background

Once again: why do we need so many tables in MySQL, and what is the use case? The main reason is customer isolation. With the new focus on security and privacy (take GDPR, for example), it is much easier and more beneficial to create a separate schema (or "database" in MySQL terms) for each customer. That creates a new set of challenges that we will need to solve. Here is the summary:

  1. Too many files. For each table, MySQL traditionally creates an FRM file. With MySQL 8.0 and its new data dictionary, this is not the case for InnoDB tables: it does not create FRM files, only IBD files.
  2. Too much storage overhead. Just to create 40 million tables, we will need ~4-5 TB of space. The ZFS filesystem can help a lot here through compression; see below.
  3. MySQL does not work well with so many tables. We have observed a lot of overhead (MySQL needs to open/close table definition files) and contention (table definitions need to be stored in memory to avoid a performance penalty, which introduces mutex contention). A quick way to observe this overhead is shown right after this list.
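
On a running server, the Opened_tables and Opened_table_definitions status counters grow every time MySQL has to open a table or a table definition that is not already cached, which is exactly the overhead described in item 3. A quick check using standard MySQL status variables (not specific to this setup; compare the values before and after a burst of queries):

# Show the Open*/Opened* counters; Opened_tables and Opened_table_definitions
# growing quickly indicates definition-cache misses.
mysql -e "SHOW GLOBAL STATUS LIKE 'Open%'"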

Challenges

When I approached the task of creating 40 million tables, my first challenge was disk space. Just to create them, I needed at least 5 TB of fast disk storage. The good news is that we have the ZFS filesystem, which provides compression out of the box. With compression, I was able to use just a 250 GB drive with ZFS; the compression ratio is > 10x (a sketch of the ZFS setup follows the listing below):

# du -sh --apparent-size /var/lib/mysql-data
4.7T    /var/lib/mysql-data
# du -sh /var/lib/mysql-data
131G    /var/lib/mysql-data
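
For reference, the compression numbers in the ZFS section below come from a pool called mysqldata with separate data and log datasets, compressed with gzip. A minimal sketch of such a setup, assuming a single device /dev/sdb (the device name, mount points, and recordsize are illustrative assumptions, not taken from the original test):

# Create the pool and the data/log datasets with gzip compression.
# recordsize=16k (matching the InnoDB page size) is a common choice, but an assumption here.
zpool create mysqldata /dev/sdb
zfs create -p -o compression=gzip -o recordsize=16k -o mountpoint=/var/lib/mysql-data mysqldata/mysql/data
zfs create -p -o compression=gzip -o mountpoint=/var/lib/mysql-log mysqldata/mysql/log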

The second challenge is how to create those tables in a reasonable amount of time. I created a script to "provision" the databases (create all 40 million tables). The good news is that the bug causing a performance regression in "create table" speed and scalability was fixed, so I was able to use this script to create 40 million tables using shared tablespaces (one tablespace per schema); a small progress check follows the script:

#!/bin/bash
function do_db {
        db_exist=$(mysql -A -s -Nbe "select 1 from information_schema.schemata where schema_name = '$db'")
        if [ "$db_exist" == "1" ]; then echo "Already exists $db"; return 0; fi;
        mysql -vvv -e "create database $db";
        mysql -vvv $db -e "CREATE TABLESPACE $db ADD DATAFILE '$db.ibd' engine=InnoDB;"
        for i in {1..100}
        do
                table="CREATE TABLE sbtest$i ( id int(10) unsigned NOT NULL AUTO_INCREMENT, k int(10) unsigned NOT NULL DEFAULT '0', c varchar(120) NOT NULL DEFAULT '', pad varchar(60) NOT NULL DEFAULT '', PRIMARY KEY (id), KEY k_1 (k) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 tablespace $db;"
                mysql $db -e "$table"
        done
}
c=0
for m in {1..4000000}
do
        for i in {1..40}
        do
                let c=$c+1
                echo $c
                db="sbtest_$c"
                do_db &
        done
        wait
        #if [ $c > 4000000 ]; then exit; fi
done
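
Provisioning 400,000 schemas (100 tables each) takes a while. A simple way to track progress, not from the original post, is to count the schemas that already exist and multiply by 100:

# Count provisioned schemas; the provisioning script uses the same information_schema check.
mysql -Nbe "SELECT COUNT(*) FROM information_schema.schemata WHERE schema_name LIKE 'sbtest\_%'"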

40 Million Tables in MySQL 8

Now it's time for a real test. I'm using the latest MySQL 8 version at the time of writing, 8.0.12, which implements the new data dictionary.

MySQL config file:
[mysqld]
datadir=/var/lib/mysql-data
socket=/var/lib/mysql-data/mysql.sock
log-error = /var/lib/mysql-log/error.log
server_id = 12345
log_bin = /var/lib/mysql-log/binlog
relay_log=/var/lib/mysql-log/relay-bin
skip-log-bin=1
innodb_log_group_home_dir = /var/lib/mysql-log
innodb_doublewrite = 0
innodb_flush_log_at_trx_commit=0
innodb_log_file_size=2G
innodb_buffer_pool_size=4G
tablespace_definition_cache = 524288
schema_definition_cache = 524288
table_definition_cache = 524288
table_open_cache=524288
open-files-limit=1000000
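
The definition-cache and open-files settings above are the ones that matter most at this scale. A quick way to confirm the values the running server actually picked up (standard MySQL variables; the output depends on your build and OS limits):

# Check the dictionary caches and the effective open_files_limit.
mysql -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN
  ('table_open_cache','table_definition_cache','schema_definition_cache',
   'tablespace_definition_cache','open_files_limit')"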
Sysbench Shell script:
function run_sb() {
conn=" --db-driver=mysql --mysql-socket=/var/lib/mysql-data/mysql.sock  --mysql-db=sbtest_1 --mysql-user=sbtest --mysql-password=abc "
sysbench $conn --oltp_db_count=$db_count --oltp_tables_count=$table_count --oltp-table-size=10000 --report-interval=1 --num-threads=$num_threads --max-requests=0 --max-time=$max_time ./select_custom.lua run | tee -a sysbench_2.txt
}
let db_count=400000
table_count=100
max_time=10000
num_threads=32
run_sb
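
The connection string above assumes an sbtest MySQL user (password abc) with read access to all of the sbtest_* schemas, and a select_custom.lua file in the current directory (shown next). Creating such a user could look like this; the original post does not show the exact grants used:

# Illustrative only: a read-only user broad enough to reach every sbtest_* schema.
mysql -e "CREATE USER 'sbtest'@'localhost' IDENTIFIED BY 'abc'; GRANT SELECT ON *.* TO 'sbtest'@'localhost';"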
Sysbench Lua script:
pathtest = "/usr/share/sysbench/tests/include/oltp_legacy/"
if pathtest then
   dofile(pathtest .. "common.lua")
else
   require("common")
end
function thread_init(thread_id)
   set_vars()
end
function event()
   local table_name
   local i
   local c_val
   local k_val
   local pad_val
   oltp_db_count = tonumber(oltp_db_count) or 1
   -- local oltp_db_count = 4
   table_name = "sbtest_" .. sb_rand(1, oltp_db_count)..".sbtest".. sb_rand(1, oltp_tables_count)
   k_val = sb_rand(1, oltp_table_size)
   c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
   pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]])
   rs = db_query("SELECT id FROM " .. table_name .. " LIMIT 1")
end
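
Each event picks a random schema (out of oltp_db_count = 400,000) and a random table (out of oltp_tables_count = 100) and runs a single-row SELECT against it, so the workload touches tables scattered across the whole data dictionary. A generated query looks roughly like this (the schema and table numbers are random examples):

# Equivalent of one sysbench event, run by hand.
mysql -e "SELECT id FROM sbtest_123456.sbtest42 LIMIT 1"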

Please note that the tables are empty — no data.

Now we can run the benchmark. Unfortunately, we have serious mutex contention in the data dictionary. Here are the results:

[ 453s ] thds: 32 tps: 1203.96 qps: 1203.96 (r/w/o: 1203.96/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 454s ] thds: 32 tps: 1202.32 qps: 1202.32 (r/w/o: 1202.32/0.00/0.00) lat (ms,95%): 42.61 err/s: 0.00 reconn/s: 0.00
[ 455s ] thds: 32 tps: 1196.74 qps: 1196.74 (r/w/o: 1196.74/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 456s ] thds: 32 tps: 1197.18 qps: 1197.18 (r/w/o: 1197.18/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 457s ] thds: 32 tps: 887.11 qps: 887.11 (r/w/o: 887.11/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 458s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 459s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 460s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 461s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 462s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 463s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 464s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 465s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 466s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 467s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 468s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 469s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 470s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 471s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 472s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 473s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 474s ] thds: 32 tps: 403.96 qps: 403.96 (r/w/o: 403.96/0.00/0.00) lat (ms,95%): 16819.24 err/s: 0.00 reconn/s: 0.00
[ 475s ] thds: 32 tps: 1196.00 qps: 1196.00 (r/w/o: 1196.00/0.00/0.00) lat (ms,95%): 41.85 err/s: 0.00 reconn/s: 0.00
[ 476s ] thds: 32 tps: 1208.96 qps: 1208.96 (r/w/o: 1208.96/0.00/0.00) lat (ms,95%): 41.85 err/s: 0.00 reconn/s: 0.00
[ 477s ] thds: 32 tps: 1192.06 qps: 1192.06 (r/w/o: 1192.06/0.00/0.00) lat (ms,95%): 41.85 err/s: 0.00 reconn/s: 0.00
[ 478s ] thds: 32 tps: 1173.89 qps: 1173.89 (r/w/o: 1173.89/0.00/0.00) lat (ms,95%): 43.39 err/s: 0.00 reconn/s: 0.00

As we can see, for ~15 seconds no queries were processed: a complete MySQL stall. These complete stalls happen constantly, roughly every 25-30 seconds.

The SHOW ENGINE INNODB STATUS query shows the mutex contention:

SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 498635
--Thread 140456572004096 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451898689280 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451896919808 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140456571119360 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140457044215552 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140456572299008 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140457043035904 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140456571709184 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451897214720 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451896624896 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140457042740992 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451899279104 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140457042446080 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
OS WAIT ARRAY INFO: signal count 89024
RW-shared spins 11216, rounds 14847, OS waits 3641

I've filed a new MySQL bug: DICT_SYS mutex contention causes complete stall when running with 40 mill tables.

I've also tested with the Pareto distribution in sysbench, even setting the ratio to 0.05 (5%) and 0.01 (1%), and mutex contention is still an issue. I used the following updated sysbench script:

function run_sb() {
conn=" --db-driver=mysql --mysql-socket=/var/lib/mysql-data/mysql.sock  --mysql-db=sbtest_1 --mysql-user=sbtest --mysql-password=abc "
sysbench $conn --rand-type=$rand_type --rand-pareto-h=$pareto_h --oltp_db_count=$db_count --oltp_tables_count=$table_count --oltp-table-size=10000 --report-interval=1 --num-threads=$num_threads --max-requests=0 --max-time=$max_time $test_name run | tee -a sysbench_2.txt
}
let db_count=400000
table_count=100
max_time=10000
num_threads=32
rand_type="pareto"
pareto_h=0.01
test_name="./select_custom.lua"
echo "Now running $rand_type for $max_time seconds, test=$test_name"
run_sb

And the results with 0.01 (1%) are the following:

[ 55s ] thds: 32 tps: 72465.29 qps: 72465.29 (r/w/o: 72465.29/0.00/0.00) lat (ms,95%): 0.53 err/s: 0.00 reconn/s: 0.00
[ 56s ] thds: 32 tps: 68641.04 qps: 68641.04 (r/w/o: 68641.04/0.00/0.00) lat (ms,95%): 0.61 err/s: 0.00 reconn/s: 0.00
[ 57s ] thds: 32 tps: 70479.82 qps: 70479.82 (r/w/o: 70479.82/0.00/0.00) lat (ms,95%): 0.57 err/s: 0.00 reconn/s: 0.00
[ 58s ] thds: 32 tps: 31395.55 qps: 31395.55 (r/w/o: 31395.55/0.00/0.00) lat (ms,95%): 0.49 err/s: 0.00 reconn/s: 0.00
[ 59s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 61s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 62s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 63s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 64s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 65s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 66s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 67s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 68s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 69s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 70s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 71s ] thds: 32 tps: 18879.04 qps: 18879.04 (r/w/o: 18879.04/0.00/0.00) lat (ms,95%): 0.75 err/s: 0.00 reconn/s: 0.00
[ 72s ] thds: 32 tps: 70924.82 qps: 70924.82 (r/w/o: 70924.82/0.00/0.00) lat (ms,95%): 0.48 err/s: 0.00 reconn/s: 0.00
[ 73s ] thds: 32 tps: 72395.57 qps: 72395.57 (r/w/o: 72395.57/0.00/0.00) lat (ms,95%): 0.47 err/s: 0.00 reconn/s: 0.00
[ 74s ] thds: 32 tps: 72483.22 qps: 72484.22 (r/w/o: 72484.22/0.00/0.00) lat (ms,95%): 0.58 err/s: 0.00 reconn/s: 0.00

ZFS

The ZFS filesystem provides compression, which helps tremendously in this case. When MySQL creates an InnoDB table, it creates a new .ibd file and pre-allocates some pages, all of which start out blank and therefore compress extremely well. I have configured ZFS compression and can see a > 10x compression ratio:

# zfs get all | grep compressratio
mysqldata                       compressratio         12.47x                      -
mysqldata                       refcompressratio      1.00x                       -
mysqldata/mysql                 compressratio         12.47x                      -
mysqldata/mysql                 refcompressratio      1.00x                       -
mysqldata/mysql/data            compressratio         12.51x                      -
mysqldata/mysql/data            refcompressratio      12.54x                      -
mysqldata/mysql/log             compressratio         2.79x                       -
mysqldata/mysql/log             refcompressratio      4.57x                       -
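
Another way to quantify the savings (not shown in the original post) is to compare logical vs. physical space per dataset; logicalused is the uncompressed size, while used is what actually lands on disk:

# Per-dataset view of compression savings.
zfs list -o name,used,logicalused,compressratio -r mysqldata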

Conclusion

It is possible to create 40 million tables with MySQL 8.0 using shared tablespaces. ZFS provides an excellent compression ratio (with gzip), which helps reduce the storage overhead of a "schema per customer" architecture. Unfortunately, the new data dictionary in MySQL 8.0.12 suffers from DICT_SYS mutex contention, which causes constant stalls.

Published at DZone with permission of Alexander Rubin, DZone MVB. See the original article here.
