Data compression in InnoDB for text and blob fields
This post is from Michael Coburn at the MySQL Performance Blog.
Have you wanted to compress only certain types of columns in a table while leaving other columns uncompressed? While working on a customer case this week I saw an interesting problem: a 100GB table with many heavily used TEXT fields, where some read queries exceeded 500MB (!!). In this case we were not allowed to make any query or application logic changes, so we chose to implement the Barracuda file format and use compressed rows, as this appealed to me for this mostly-read application. One quick way to see whether your rows will benefit from compression is to read Peter Zaitsev’s blog post and execute:
SELECT AVG(LENGTH(`coltextfield`)) FROM `t1` WHERE `id` < 1000;
Compare this to:
SELECT AVG(LENGTH(COMPRESS(`coltextfield`))) FROM `t1` WHERE `id` < 1000;
In our case we saw about a 75% reduction when the text field was compressed, which we felt indicated that table compression would be beneficial.
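If you would rather run the same check offline, for example on values sampled from a replica, the comparison of the two queries can be approximated in Python. This is a sketch under assumptions: it models COMPRESS() as MySQL documents it (empty input stays empty, otherwise a 4-byte uncompressed-length header followed by a zlib stream), ignores the extra '.' MySQL may append when the result ends in a space, and uses hypothetical sample values in place of coltextfield. InnoDB compresses whole pages rather than individual values, so treat the number as a rough indicator only.

```python
import zlib
from statistics import mean


def mysql_compress_len(value: bytes) -> int:
    """Approximate LENGTH(COMPRESS(value)) in MySQL.

    Assumption: empty input gives empty output; otherwise COMPRESS() emits a
    4-byte uncompressed-length header followed by a zlib stream. The extra '.'
    MySQL may append when the result ends in a space is ignored here.
    """
    if not value:
        return 0
    return 4 + len(zlib.compress(value))


def estimate_reduction(values: list[bytes]) -> float:
    """1 - AVG(LENGTH(COMPRESS(x))) / AVG(LENGTH(x)) over the sample."""
    raw = mean(len(v) for v in values)
    packed = mean(mysql_compress_len(v) for v in values)
    return 1 - packed / raw


if __name__ == "__main__":
    # Hypothetical sample standing in for `coltextfield` rows with `id` < 1000.
    sample = [f"order {i}: shipped, tracking pending. ".encode() * 200 for i in range(1000)]
    print(f"estimated reduction: {estimate_reduction(sample):.0%}")
```

Highly repetitive text like the sample compresses far better than typical data, while already-compressed or random content can even come out slightly larger than the input.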
With the original InnoDB Antelope file format you have the choice of ROW_FORMAT=COMPACT and ROW_FORMAT=REDUNDANT, in which InnoDB stores the first 768 bytes of variable-length columns (BLOB, VARCHAR, TEXT) in the index record and the remainder in overflow pages. COMPACT became the default as of MySQL 5.0.3 and has a more compact representation for NULLs and variable-length fields than REDUNDANT.
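To make the 768-byte rule concrete, here is a small Python model of where a long column value ends up under the Antelope formats. It is a simplification: it assumes any value longer than 768 bytes is one InnoDB has chosen to store externally (in practice this happens only when the whole record is too large for the page), counts the 20-byte external-field pointer kept in the record, and ignores record headers and page overhead.

```python
ANTELOPE_PREFIX_BYTES = 768  # prefix kept in the index record by REDUNDANT/COMPACT
EXTERN_POINTER_BYTES = 20    # pointer from the record to the overflow pages


def antelope_split(value_len: int) -> tuple[int, int]:
    """Return (bytes in the index record, bytes in overflow pages).

    Simplified model: values up to 768 bytes stay entirely in the record;
    a longer value keeps its first 768 bytes plus a 20-byte pointer in the
    record, and the remainder goes to overflow pages.
    """
    if value_len <= ANTELOPE_PREFIX_BYTES:
        return value_len, 0
    return ANTELOPE_PREFIX_BYTES + EXTERN_POINTER_BYTES, value_len - ANTELOPE_PREFIX_BYTES


if __name__ == "__main__":
    for n in (500, 768, 10_000, 500_000):
        print(n, antelope_split(n))
```

Even a 500KB value still costs 788 bytes inside the index record under this model, which is the overhead Barracuda's COMPRESSED and DYNAMIC formats avoid.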
Using InnoDB’s newer Barracuda file format (available with the InnoDB Plugin for MySQL 5.1 and built into MySQL 5.5), you can now leverage table compression by specifying ROW_FORMAT=COMPRESSED. In our case we only wanted MySQL to try to move the larger (greater than 16KB) TEXT fields off-page, so we used the KEY_BLOCK_SIZE=16 directive. This means that each TEXT/BLOB field exceeding 16KB would be stored in its own page, with only a 20-byte pointer left in the index page. Based on our analysis, 75% of the blobs stored in the table were over 8KB and were responsible for 90% of the space usage, so compressing only the externally stored blobs provided substantial advantages. Why did we choose a KEY_BLOCK_SIZE equal to the InnoDB page size of 16KB? As the fine MySQL manual states:
This setting may still be useful for tables with many long columns, because such values often do compress well, and might therefore require fewer “overflow” pages.
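The blob-size analysis behind the KEY_BLOCK_SIZE decision (what share of values exceeds a size threshold, and what share of the bytes those values hold) can be reproduced from a list of column lengths, for example the result of SELECT LENGTH(coltextfield) FROM t1. The Python helper below is a hypothetical sketch, not the tooling used in the original analysis; its 8KB default threshold matches the figure quoted above, and the sample lengths are made up.

```python
def size_profile(lengths: list[int], threshold: int = 8 * 1024) -> tuple[float, float]:
    """Return (share of values longer than threshold, share of total bytes they hold)."""
    if not lengths:
        return 0.0, 0.0
    large = [n for n in lengths if n > threshold]
    total_bytes = sum(lengths)
    count_share = len(large) / len(lengths)
    byte_share = sum(large) / total_bytes if total_bytes else 0.0
    return count_share, byte_share


if __name__ == "__main__":
    # Hypothetical column lengths in bytes.
    lengths = [1_000, 12_000, 40_000, 95_000]
    values_share, bytes_share = size_profile(lengths)
    print(f"{values_share:.0%} of values exceed 8KB and hold {bytes_share:.0%} of the bytes")
```

A high byte share for the large values is the signal that compressing only the externally stored blobs is worth the rebuild.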
I did not test with a smaller KEY_BLOCK_SIZE, as we had minimal time to carry out the table compression change (given the long run-time of the ALTER TABLE); you may find your application benefits from a different KEY_BLOCK_SIZE value. Also note that you need to enable the dynamic variable innodb_file_format=Barracuda (don’t forget to set it in my.cnf too!):
SET GLOBAL innodb_file_format=Barracuda;
One caveat: you must be running with innodb_file_per_table=1, as the InnoDB system tablespace cannot be compressed; see this page for further details on how to enable compression for a table.
To use Barracuda-format tables you will need to either create new tables and migrate the data, or convert existing tables with an ALTER TABLE statement. Because compression is a per-table setting, the ROW_FORMAT and KEY_BLOCK_SIZE directives are passed via CREATE TABLE or ALTER TABLE statements. In our case we chose to rebuild the table using ALTER TABLE via a null operation, like this:
ALTER TABLE `t1` ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16;
In our case, even though the customer had a 3GHz 24-core machine, the ALTER TABLE progressed slowly because it was bound to a single CPU while compressing the data. Just have patience. Keep in mind, too, that if you start with a 100GB table and know your approximate compression rate, you can expect a considerably smaller on-disk footprint afterwards, so ideally you will be able to postpone that purchase of additional disk capacity.
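A back-of-the-envelope version of that estimate, as a Python sketch: it simply applies an assumed reduction ratio to the current table size. Treat the result as optimistic, since InnoDB compresses whole pages, some pages will not compress to the target size, and other columns may compress less well than the sampled one. In the case described here, a 75% per-value reduction on the text field ended with a 30GB table rather than the 25GB this formula gives.

```python
def estimated_footprint_gb(table_gb: float, reduction: float) -> float:
    """Optimistic on-disk size after compression.

    reduction is the fraction saved, e.g. 0.75 from the COMPRESS() comparison.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return table_gb * (1.0 - reduction)


if __name__ == "__main__":
    est = estimated_footprint_gb(100, 0.75)
    print(f"estimated: {est:.0f}GB, disk freed: {100 - est:.0f}GB")
```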
So what was the real-world outcome of this exercise? We were able to show a 70% improvement in queries against this table when the TEXT fields were not part of the query, because Barracuda does not store the first 768 bytes of each blob in the index record, and we reduced the table to 30GB. Happy customer!
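One way to see why queries that skip the TEXT columns got faster is to count how many records fit in a 16KB clustered-index page. The Python sketch below is a rough model under stated assumptions: a hypothetical row with 200 bytes of ordinary columns and three long TEXT columns that are all stored externally, a 768-byte prefix plus a 20-byte pointer per external column under Antelope versus only the 20-byte pointer under Barracuda's COMPRESSED/DYNAMIC formats, and no page or record header overhead. It also ignores the effect of page compression itself.

```python
PAGE_BYTES = 16 * 1024
POINTER_BYTES = 20           # external-field pointer kept in the record
ANTELOPE_PREFIX_BYTES = 768  # extra prefix kept on-page by REDUNDANT/COMPACT


def record_bytes(fixed_bytes: int, external_cols: int, antelope: bool) -> int:
    """On-page size of one record whose long columns are all stored externally."""
    per_col = POINTER_BYTES + (ANTELOPE_PREFIX_BYTES if antelope else 0)
    return fixed_bytes + external_cols * per_col


def rows_per_page(fixed_bytes: int, external_cols: int, antelope: bool) -> int:
    """Rough records per page, ignoring page and record header overhead."""
    return PAGE_BYTES // record_bytes(fixed_bytes, external_cols, antelope)


if __name__ == "__main__":
    # Hypothetical row: 200 bytes of ordinary columns, three long TEXT columns.
    before = rows_per_page(200, 3, antelope=True)
    after = rows_per_page(200, 3, antelope=False)
    print(f"Antelope: {before} rows/page, Barracuda: {after} rows/page")
```

With roughly ten times as many records per page in this made-up example, a scan that never touches the TEXT columns reads far fewer pages, which is consistent with the kind of speedup reported above.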
One parting idea: you may be able to use pt-online-schema-change from Percona Toolkit 2.1 to modify the table if you cannot sustain the blocking effects of a traditional ALTER TABLE statement.
I hope this helps you understand a use case where table compression can be beneficial when your workload is mostly reads. Thanks for reading my first mysqlperformanceblog.com blog post!
Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.