
MySQL Compression: Compressed and Uncompressed Data Size

By Peter Zaitsev · Oct. 20, 2014

MySQL has an information_schema.tables table that contains columns such as "data_length" and "avg_row_length." The documentation on this table, however, is quite poor: it assumes those fields are self-explanatory, and they are not when it comes to tables that employ compression. This is where inconsistency is born. Let's take a look at the same table, containing some highly compressible data, using different storage engines that support MySQL compression:

TokuDB:

mysql> select * from information_schema.tables where table_schema='test' \G
*************************** 1. row ***************************
  TABLE_CATALOG: def
   TABLE_SCHEMA: test
     TABLE_NAME: comp
     TABLE_TYPE: BASE TABLE
         ENGINE: TokuDB
        VERSION: 10
     ROW_FORMAT: tokudb_zlib
     TABLE_ROWS: 40960
 AVG_ROW_LENGTH: 10003
    DATA_LENGTH: 409722880
MAX_DATA_LENGTH: 9223372036854775807
   INDEX_LENGTH: 0
      DATA_FREE: 421888
 AUTO_INCREMENT: NULL
    CREATE_TIME: 2014-10-10 07:59:05
    UPDATE_TIME: 2014-10-10 08:01:20
     CHECK_TIME: NULL
TABLE_COLLATION: latin1_swedish_ci
       CHECKSUM: NULL
 CREATE_OPTIONS:
  TABLE_COMMENT:
1 row in set (0.00 sec)

Archive:

mysql> select * from information_schema.tables where table_schema='test' \G
*************************** 1. row ***************************
  TABLE_CATALOG: def
   TABLE_SCHEMA: test
     TABLE_NAME: comp
     TABLE_TYPE: BASE TABLE
         ENGINE: ARCHIVE
        VERSION: 10
     ROW_FORMAT: Compressed
     TABLE_ROWS: 40960
 AVG_ROW_LENGTH: 12
    DATA_LENGTH: 501651
MAX_DATA_LENGTH: 0
   INDEX_LENGTH: 0
      DATA_FREE: 0
 AUTO_INCREMENT: NULL
    CREATE_TIME: NULL
    UPDATE_TIME: 2014-10-10 08:08:24
     CHECK_TIME: NULL
TABLE_COLLATION: latin1_swedish_ci
       CHECKSUM: NULL
 CREATE_OPTIONS:
  TABLE_COMMENT:
1 row in set (0.01 sec)

InnoDB:

mysql> select * from information_schema.tables where table_schema='test' \G
*************************** 1. row ***************************
  TABLE_CATALOG: def
   TABLE_SCHEMA: test
     TABLE_NAME: comp
     TABLE_TYPE: BASE TABLE
         ENGINE: InnoDB
        VERSION: 10
     ROW_FORMAT: Compressed
     TABLE_ROWS: 40660
 AVG_ROW_LENGTH: 4168
    DATA_LENGTH: 169480192
MAX_DATA_LENGTH: 0
   INDEX_LENGTH: 0
      DATA_FREE: 1572864
 AUTO_INCREMENT: NULL
    CREATE_TIME: 2014-10-10 08:33:22
    UPDATE_TIME: NULL
     CHECK_TIME: NULL
TABLE_COLLATION: latin1_swedish_ci
       CHECKSUM: NULL
 CREATE_OPTIONS: row_format=COMPRESSED KEY_BLOCK_SIZE=4
  TABLE_COMMENT:
1 row in set (0.00 sec)

From this we can see that Archive and InnoDB show the COMPRESSED values for DATA_LENGTH and AVG_ROW_LENGTH, while TokuDB shows the uncompressed ones (as of TokuDB 7.5.1, shipped with Percona Server 5.6.21-69).

The problem here is not only the lack of consistency, but also that we need to know BOTH numbers. We often need the uncompressed value to understand how much data is really in the table, and the compressed value to know how much space it takes on disk. Comparing the two also helps us understand the compression ratio (or, on the contrary, the expansion due to storage overhead and indexes).
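As a quick sanity check of that ratio, the DATA_LENGTH values reported above can be compared directly: TokuDB reports the uncompressed size, while InnoDB and Archive report compressed sizes. A minimal sketch of the arithmetic:

```python
# DATA_LENGTH values from the information_schema.tables output above
uncompressed = 409722880       # TokuDB (reports uncompressed size)
innodb_compressed = 169480192  # InnoDB with KEY_BLOCK_SIZE=4
archive_compressed = 501651    # Archive

# Compression ratio: uncompressed size divided by on-disk (compressed) size
innodb_ratio = uncompressed / innodb_compressed
archive_ratio = uncompressed / archive_compressed

print(f"InnoDB ratio:  {innodb_ratio:.2f}x")
print(f"Archive ratio: {archive_ratio:.0f}x")
```

For this highly compressible test data, InnoDB achieves roughly a 2.4x ratio while Archive compresses far more aggressively.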

Looking at the Information_Schema tables available, I can't find any way to learn how much uncompressed data is stored in an InnoDB (or Archive) table. The simple trick I can use is running a query along these lines: SELECT SUM(LENGTH(col1)+LENGTH(col2)…) FROM t. This slightly overestimates the length by converting numbers and dates to strings, but it is good enough for most purposes.
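That trick is easy to script. The helper below simply builds such a SUM(LENGTH(...)) statement from a column list; the table and column names in the example call are made up for illustration:

```python
def uncompressed_size_query(table, columns):
    """Build a query that estimates a table's uncompressed data size by
    summing the string length of every column. Numbers and dates are
    converted to strings by LENGTH(), so this slightly overestimates."""
    total = " + ".join(f"LENGTH(`{c}`)" for c in columns)
    return f"SELECT SUM({total}) AS uncompressed_bytes FROM `{table}`"

# Hypothetical table with two columns
print(uncompressed_size_query("comp", ["id", "payload"]))
# SELECT SUM(LENGTH(`id`) + LENGTH(`payload`)) AS uncompressed_bytes FROM `comp`
```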

TokuDB, though, while providing uncompressed information in the Information_Schema TABLES table, also lets you get the real on-disk storage numbers from its own information_schema tables:

mysql> select * from information_schema.TokuDB_fractal_tree_info where table_schema='test' and table_name='comp' limit 10 \G
*************************** 1. row ***************************
        dictionary_name: ./test/comp-main
     internal_file_name: ./_test_sql_147e_292e_main_2c20c08_1_1b_B_0.tokudb
bt_num_blocks_allocated: 125
   bt_num_blocks_in_use: 125
      bt_size_allocated: 1880088
         bt_size_in_use: 1502232
           table_schema: test
             table_name: comp
  table_dictionary_name: main
*************************** 2. row ***************************
        dictionary_name: ./test/comp-status
     internal_file_name: ./_test_sql_147e_292e_status_2c20bdd_1_1b.tokudb
bt_num_blocks_allocated: 4
   bt_num_blocks_in_use: 4
      bt_size_allocated: 8704
         bt_size_in_use: 512
           table_schema: test
             table_name: comp
  table_dictionary_name: status
2 rows in set (0.01 sec)

This shows us that the table really consists of two files, each "bt_size_allocated" bytes in length. These numbers are close to what you will see on disk, but not exact: I see these files taking 1886208 and 16384 bytes, respectively. I wish the exact length were available to query, so we would not have to wonder how large the difference is and whether it can grow big enough in some cases to matter.
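To see the exact on-disk numbers you still have to go to the file system. A small sketch using only the standard library; the data-directory path is an assumption, and the file names are the internal_file_name values reported above:

```python
import os

def on_disk_sizes(base, names):
    """Return {file_name: size_in_bytes} for the files that exist under base."""
    sizes = {}
    for name in names:
        path = os.path.join(base, name)
        if os.path.exists(path):
            sizes[name] = os.path.getsize(path)
    return sizes

# Assumed MySQL data directory; adjust for your installation.
datadir = "/var/lib/mysql"
# internal_file_name values from TokuDB_fractal_tree_info
internal_files = [
    "_test_sql_147e_292e_main_2c20c08_1_1b_B_0.tokudb",
    "_test_sql_147e_292e_status_2c20bdd_1_1b.tokudb",
]
print(on_disk_sizes(datadir, internal_files))
```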

If you're just looking for how much space has been allocated and how much is currently used for a given TokuDB table, you can use a query like this:

mysql> select sum(bt_size_allocated) total, sum(bt_size_in_use) used, sum(bt_size_allocated)-sum(bt_size_in_use) free from information_schema.TokuDB_fractal_tree_info where table_schema='test' and table_name='comp';
+---------+---------+--------+
| total   | used    | free   |
+---------+---------+--------+
| 1888792 | 1502744 | 386048 |
+---------+---------+--------+
1 row in set (0.01 sec)
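These totals line up with the per-dictionary rows from TokuDB_fractal_tree_info shown earlier (main: 1880088 allocated / 1502232 in use; status: 8704 / 512). A quick check of that arithmetic:

```python
# (bt_size_allocated, bt_size_in_use) per dictionary, taken from the
# TokuDB_fractal_tree_info rows above
dictionaries = {
    "main":   (1880088, 1502232),
    "status": (8704, 512),
}

total = sum(alloc for alloc, _ in dictionaries.values())
used = sum(in_use for _, in_use in dictionaries.values())
free = total - used

print(total, used, free)  # 1888792 1502744 386048, matching the query output
```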

To sum it up: there is room to improve how MySQL reports compressed and uncompressed data length information, both in terms of consistency across storage engines and in terms of the information available. It is great to see that TokuDB found a way to report both compressed and uncompressed data sizes, yet I would really love to also see the actual size on disk that a given table takes. And it would be great if there were a consistent way to query it from inside MySQL, without having to go down to the file-system level and deal with the different ways different storage engines lay out data on the file system. This becomes especially important with the coming of full tablespace support in MySQL 5.7, which will make it hard to find all of a table's matching files on the filesystem.


Data (computing) MySQL Database

Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
