
Considering TokuDB as an Engine for Timeseries Data


This post originally appeared on the MySQL Performance Blog.

I am working on a customer’s system where the requirement is to store a lot of timeseries data from different sensors.

For performance reasons we are going to use SSDs, which leads to the following list of requirements for the architecture:

  • Provide a high insertion rate
  • Provide a good compression rate to store more data on expensive SSDs
  • The engine should be SSD-friendly (fewer writes per time period to help with SSD wear)
  • Provide a reasonable response time (within ~50 ms) on SELECT queries against hot, recently inserted data

Looking over these requirements, I think TokuDB might be a good fit for this task.

There are several aspects to consider; this time I want to compare TokuDB vs. InnoDB on initial load time and space consumption.

Let’s assume the schema is the following:

CREATE TABLE `sensordata` (
  `ts` int(10) unsigned NOT NULL DEFAULT '0',
  `sensor_id` int(10) unsigned NOT NULL,
  `data1` double NOT NULL,
  `data2` double NOT NULL,
  `data3` double NOT NULL,
  `data4` double NOT NULL,
  `data6` double NOT NULL,
  `cnt` int(10) unsigned NOT NULL,
  PRIMARY KEY (`sensor_id`,`ts`)
)

Here sensor_id is in a range from 1 to about 1,000, and ts is a monotonically increasing timestamp.

This schema takes advantage of the clustering primary key in both TokuDB and InnoDB, and all inserts are “almost” sequential, which guarantees that inserts will not require disk access and can work with data in memory.

The same goes for SELECTs: queries over the most recent time periods will be served entirely from memory.
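To make the access pattern concrete, here is roughly what the writes and the “hot data” reads look like (the sensor id, the literal values, and the one-hour window are placeholders for illustration, not from the original benchmark):

-- Inserts arrive with a monotonically increasing ts per sensor, so new rows
-- land at the right-hand edge of the (sensor_id, ts) clustered index:
INSERT INTO sensordata (ts, sensor_id, data1, data2, data3, data4, data6, cnt)
VALUES (UNIX_TIMESTAMP(), 42, 1.0, 2.0, 3.0, 4.0, 6.0, 1);

-- A typical query on hot data: the most recent hour for one sensor.
-- The range sits at the end of the index, so the pages it touches
-- should still be resident in memory:
SELECT ts, data1, data2, cnt
FROM sensordata
WHERE sensor_id = 42
  AND ts > UNIX_TIMESTAMP() - 3600
ORDER BY ts;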

I am doing this research on a Dell PowerEdge R420 box with 48GB of memory (40GB for the InnoDB buffer pool, and the default memory allocation for TokuDB, which gives a 24GB TokuDB cache). The storage is a very fast PCIe flash card.
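As a rough sketch, the relevant memory settings in my.cnf would look something like this (the exact configuration files are not shown in the original post):

# InnoDB runs (Percona Server 5.6-RC2):
innodb_buffer_pool_size = 40G

# TokuDB runs (mariadb-5.5.30-tokudb-7.0.4):
# tokudb_cache_size is left at its default, which is roughly half of
# physical memory -- about 24GB on this 48GB box.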

The test export CSV file, suitable for LOAD DATA INFILE, is 40GB in size and contains over 1 billion records (1,238,201,948, to be exact).

MySQL Versions:

  • For InnoDB tests I used Percona Server 5.6-RC2
  • For TokuDB tests I used mariadb-5.5.30-tokudb-7.0.4 from the Tokutek website

So first, let’s load the data into InnoDB using the LOAD DATA INFILE statement.
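The exact load statement is not shown here; a minimal sketch, assuming a comma-separated file with columns in table order, would be:

-- The file path and field delimiters below are assumptions for illustration.
LOAD DATA INFILE '/data/sensordata.csv'
INTO TABLE sensordata
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(ts, sensor_id, data1, data2, data3, data4, data6, cnt);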

  • InnoDB, no compression. Load time: 1 hour 26 min 25.77 sec; final table size: 90GB
  • InnoDB, 8K compression (enabled as sketched after this list). Load time: 3 hours 26 min 17.06 sec; table size: 45GB
  • InnoDB, 4K compression. Load time: 17 hours 23 min 43.48 sec; table size: 26GB
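For reference, InnoDB table compression at these block sizes is enabled with ROW_FORMAT=COMPRESSED; the post does not show the exact DDL, but it would be along these lines:

-- 8K compressed pages; the 4K run would use KEY_BLOCK_SIZE=4.
-- (In the benchmark this would be part of the CREATE TABLE before loading.)
-- Compressed tables require innodb_file_per_table=1 and
-- innodb_file_format=Barracuda.
ALTER TABLE sensordata ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;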

Now for TokuDB:

  • TokuDB, default compression. Load time: 33 min 1.18 sec; table size on disk: 10GB
  • TokuDB, tokudb_small table format (see the sketch after this list). Load time: 37 min 2.34 sec; table size: 4.6GB
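TokuDB selects its compression level per table through the row format; tokudb_small maps to the more aggressive lzma compressor. The exact DDL is not shown in the original post, but assuming the ROW_FORMAT syntax supported by TokuDB 7, a sketch would be:

-- Default compression: just switch the engine.
ALTER TABLE sensordata ENGINE=TokuDB;

-- tokudb_small table format (lzma compression); this can also be chosen
-- with SET SESSION tokudb_row_format = tokudb_small before CREATE TABLE.
ALTER TABLE sensordata ENGINE=TokuDB ROW_FORMAT=TOKUDB_SMALL;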

So TokuDB is the clear leader in both load time and compression. Of course, these numbers alone are not the whole story, and next we need to see the performance of further INSERT and SELECT queries. This is what I am running right now, and I will post the results when I have them.




Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.
