TokuDB vs. InnoDB in a Timeseries INSERT Benchmark

This post comes from the MySQL Performance Blog.

This post is a continuation of my research into TokuDB’s storage engine to understand if it is suitable for timeseries workloads.

While loading data with LOAD DATA INFILE into an empty table shows great results for TokuDB, it would be more interesting to see a more realistic workload.

So this time let’s take a look at the INSERT benchmark.

What I am going to do is insert data in 16 parallel threads into the table from the previous post:

CREATE TABLE `sensordata` (
  `ts` int(10) unsigned NOT NULL DEFAULT '0',
  `sensor_id` int(10) unsigned NOT NULL,
  `data1` double NOT NULL,
  `data2` double NOT NULL,
  `data3` double NOT NULL,
  `data4` double NOT NULL,
  `data5` double NOT NULL,
  `cnt` int(10) unsigned NOT NULL,
  PRIMARY KEY (`sensor_id`,`ts`)
)

The inserts are bulk inserts with sequentially increasing ts and with sensor_id from 1 to 1000.
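To make the insert pattern concrete, here is a small illustrative sketch of what one bulk-insert event could look like. This is not the actual sysbench Lua script (that is linked below); the generator function and its parameters are hypothetical, with the 1,000-sensor batch taken from the benchmark description:

```python
import random

def make_bulk_insert(ts, n_sensors=1000):
    """Build one bulk INSERT covering all sensors for a single timestamp.

    Mirrors the benchmark's pattern: ts increases sequentially across
    events, while sensor_id runs from 1 to n_sensors within each event.
    """
    rows = []
    for sensor_id in range(1, n_sensors + 1):
        data = [random.uniform(0, 1e6) for _ in range(5)]  # data1..data5
        rows.append("(%d,%d,%f,%f,%f,%f,%f,%d)"
                    % (ts, sensor_id, *data, 1))
    return ("INSERT INTO sensordata "
            "(ts, sensor_id, data1, data2, data3, data4, data5, cnt) VALUES "
            + ",".join(rows))

stmt = make_bulk_insert(ts=1)
```

Because sensor_id is the leading column of the primary key, each event touches 1,000 different spots in the index rather than appending at a single point, which is why the inserts are "not fully sequential."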

While the inserts are not fully sequential (the primary key is (sensor_id, ts)), the actively updated part of the index stays in memory, so I do not expect performance degradation once the data exceeds memory. This plays in InnoDB's favor, as TokuDB is known to perform worse in CPU-bound, in-memory benchmarks.

The benchmark executes one million events, each event inserts 1,000 records in bulk. When it's finished we should have about one billion records in the table.
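As a quick sanity check on the totals:

```python
events = 1_000_000       # benchmark events executed
rows_per_event = 1_000   # one row per sensor_id, inserted in bulk
total_rows = events * rows_per_event
print(total_rows)  # 1000000000 -> about one billion rows
```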

So let’s see how InnoDB performs (8K compressed vs. uncompressed).
Throughput (more is better):
[Chart: InnoDB throughput]

Response time (log 10 scale on the Y axis) (less is better):
[Chart: InnoDB response time]

So InnoDB comes with the following numbers:

  • InnoDB with no compression averages 350 inserts per second, with a response time of 80-100 ms per transaction. The final table size is 82 GB.
  • InnoDB with 8K compression manages 130 inserts per second, with a response time of 250 ms. The final table size is 60 GB.
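Putting those two results side by side as a back-of-the-envelope trade-off (figures taken directly from the runs above):

```python
# InnoDB results from the runs above: throughput in bulk inserts/sec,
# final table size in GB.
uncompressed = {"tps": 350, "size_gb": 82}
compressed_8k = {"tps": 130, "size_gb": 60}

# 8K compression saves roughly a quarter of the disk space...
space_saved = 1 - compressed_8k["size_gb"] / uncompressed["size_gb"]
# ...but costs almost two-thirds of the insert throughput.
tps_lost = 1 - compressed_8k["tps"] / uncompressed["tps"]

print(f"space saved: {space_saved:.0%}, throughput lost: {tps_lost:.0%}")
```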

Now, the compression rate is pretty poor because I used a uniform distribution for the values of the data1-data5 columns, and uniform data does not compress well. In a real-world case I expect many more repeating values, so I am going to re-test with a pareto (zipfian) distribution.
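The intuition can be shown with a quick sketch: a heavy-tailed (Pareto-like) sample contains far fewer distinct values than a uniform sample of the same size, giving the compressor much more repetition to work with. This is illustrative only; sysbench's pareto distribution is parameterized differently:

```python
import random

random.seed(42)
N = 100_000

# Uniform: values spread evenly over a wide range -> almost no repeats.
uniform_vals = [random.randint(1, 1_000_000) for _ in range(N)]

# Heavy-tailed (Pareto): most draws land on a handful of small values,
# so the column is full of repetition that compresses well.
pareto_vals = [int(random.paretovariate(1.1)) for _ in range(N)]

print("distinct uniform:", len(set(uniform_vals)))
print("distinct pareto: ", len(set(pareto_vals)))
```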

For TokuDB (tested tokudb_fast and tokudb_small formats):

Throughput (more is better):
[Chart: TokuDB throughput]

Response time (log 10 scale on the Y axis) (less is better):
[Chart: TokuDB response time]

TokuDB observations:

  • After an initial warmup, TokuDB shows quite inconsistent performance with both the tokudb_fast and tokudb_small formats.
  • For tokudb_fast, throughput tops out at ~150 inserts per second, with a 95% response time of ~160 ms. However, there are periodic stalls during which throughput drops almost to zero and response times jump to 10 seconds (!) per transaction.
  • For tokudb_small, throughput is even less stable, fluctuating around 100 inserts per second; response times start at 300 ms per transaction, with stalls of up to 30 seconds per transaction.

File sizes for TokuDB:

  • tokudb_fast: 50 GB
  • tokudb_small: 45 GB

Again, I attribute the poor compression rate to the uniform distribution. If we switch to pareto, the file size for tokudb_fast drops to 21 GB, and tokudb_small to 13 GB.

If we zoom in to a 900 second time frame we can see the periodic behavior of TokuDB:
[Chart: 900-second zoom on TokuDB throughput]

Now, I consider these stalls in TokuDB severe, and I do not think I can recommend it for production under such workload conditions until the problem is fixed.

You can find the scripts for the timeseries benchmark for sysbench v0.5 here.

Software versions: Percona Server 5.6-RC3 for InnoDB, and mariadb-5.5.30-tokudb-7.0.4 for TokuDB.

UPDATE (September 5, 2013):
By popular request, I am updating the post with the following information:
TokuDB throughput (tokudb_small row format) with Pareto distribution, for two cases:

  • PRIMARY KEY (sensor_id,ts) (on the graph: tokudb_small)
  • PRIMARY KEY (ts,sensor_id), KEY (sensor_id,ts) (on the graph: tokudb_small_key)
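Spelled out as DDL fragments (a sketch based on the sensordata schema above), the two variants are:

```sql
-- Case 1: rows clustered per sensor; each bulk insert touches
-- 1,000 separate insert points in the primary tree.
PRIMARY KEY (`sensor_id`, `ts`)

-- Case 2: rows clustered by time, with a secondary index to
-- serve per-sensor range scans.
PRIMARY KEY (`ts`, `sensor_id`),
KEY (`sensor_id`, `ts`)
```

With (ts, sensor_id) leading, each event appends at the increasing end of the primary tree, which plausibly explains the lower variance, at the cost of maintaining the secondary (sensor_id, ts) index.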

Throughput in this case:
[Chart: throughput for tokudb_small vs. tokudb_small_key]

We can see that the top throughput for tokudb_small_key is lower than for tokudb_small, but the throughput also shows less variance.

The my.cnf Files

For InnoDB:

[mysqld]
# gdb
log-error=error.log
innodb_file_per_table = true
innodb_data_file_path = ibdata1:100M:autoextend
innodb_flush_method = O_DIRECT
innodb_log_buffer_size = 256M
innodb_flush_log_at_trx_commit = 1
innodb_buffer_pool_size = 40G
innodb_buffer_pool_instances=1
innodb_file_format = Barracuda
innodb_checksum_algorithm = crc32
innodb_log_file_size = 4G
innodb_log_files_in_group = 2
#innodb_log_block_size=4096
#####plugin options
innodb_read_io_threads = 16
innodb_write_io_threads = 4
innodb_io_capacity = 4000
#not innodb options (fixed)
port = 3306
back_log = 50
max_connections = 2000
max_prepared_stmt_count=500000
max_connect_errors = 10
table_open_cache = 2048
max_allowed_packet = 16M
binlog_cache_size = 16M
max_heap_table_size = 64M
sort_buffer_size = 4M
join_buffer_size = 4M
thread_cache_size = 1000
query_cache_size = 0
query_cache_type = 0
thread_stack = 192K
tmp_table_size = 64M
server-id = 10
key_buffer_size = 8M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
bulk_insert_buffer_size = 8M
myisam_sort_buffer_size = 8M
myisam_max_sort_file_size = 10G
myisam_repair_threads = 1
myisam_recover
socket=/var/lib/mysql/mysql.sock
user=root

For TokuDB (pretty much default):

[mysqld]
gdb
skip-innodb
#not innodb options (fixed)
port = 3306
back_log = 50
max_connections = 2000
max_prepared_stmt_count=500000
max_connect_errors = 10
table_open_cache = 2048
max_allowed_packet = 16M
binlog_cache_size = 16M
max_heap_table_size = 64M
sort_buffer_size = 4M
join_buffer_size = 4M
thread_cache_size = 1000
query_cache_size = 0
query_cache_type = 0
ft_min_word_len = 4
#default_table_type = InnoDB
thread_stack = 192K
tmp_table_size = 64M
server-id = 10
key_buffer_size = 8M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
bulk_insert_buffer_size = 8M
myisam_sort_buffer_size = 8M
myisam_max_sort_file_size = 10G
#myisam_max_extra_sort_file_size = 10G
myisam_repair_threads = 1
myisam_recover
socket=/var/lib/mysql/mysql.sock
user=root

Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.
