DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Databases
  4. Column Compression (With an Optional Predefined Dictionary)

Column Compression (With an Optional Predefined Dictionary)

This blog will demonstrate a test scenario for column compression with the helping hand of a predefined dictionary — something for the QA and bug hunters out there.

Shahriyar Rzayev user avatar by
Shahriyar Rzayev
·
Nov. 13, 16 · Tutorial
Like (0)
Save
Tweet
Share
3.22K Views

Join the DZone community and get the full member experience.

Join For Free

column compression

This blog discusses column compression with an optional predefined dictionary.

Compression, more compression with different algorithms, compress again, compress multiple times! Compression is a hot topic in our lives.

In general, testing new things is great if the processes are well-described and easy to follow. Let’s try to think like a QA engineer: the first golden rule of QA is “everything is buggy, life is full of bugs: good night bugs, good morning bugs, hello my old bug friends.”

The second golden rule of QA is “OK, now let’s find a way to catch a bug — but remember that your methods can be buggy, too.”

Remember: Always test! No bugs, no happiness!

When you start to test, the first goal is getting an idea of what is going on. This blog will demonstrate a test scenario for column compression with an optional predefined dictionary. For reference on column compression, read Compressed columns with dictionaries.”

To begin, let’s set up a basic environment:
The installation process requires installing Percona Server which is already documented here -> PS 5.6 installation

Secondly, find an already existing test: xtradb_compressed_columns_ibd_sizes.test.

Third, write a simple script to get started:

import mysql.connector
cnx = mysql.connector.connect(user='msandbox', password='msandbox',
                              host='127.0.0.1',
                              database='dbtest',
                              port=22896,
                              autocommit=True)
cursor = cnx.cursor()
crt_comp_dic = "CREATE COMPRESSION_DICTIONARY names2 ('Bartholomew')"
cursor.execute(crt_comp_dic)
table_t1 = "CREATE TABLE t1(id INT,a BLOB) ENGINE=InnoDB"
table_t2 = "CREATE TABLE t2(id INT,a BLOB COLUMN_FORMAT COMPRESSED) ENGINE=InnoDB"
table_t3 = "CREATE TABLE t3(id INT,a BLOB COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY names) ENGINE=InnoDB"
cursor.execute(table_t1);
cursor.execute(table_t2);
cursor.execute(table_t3);
insert_stmt = "insert into {} values({},repeat('Bartholomew', 128))"
for i in range(0, 100000):
	cursor.execute(insert_stmt.format('t1', int(i)))
	print insert_stmt.format('t1', int(i))
	cursor.execute(insert_stmt.format('t2', int(i)))
	print insert_stmt.format('t2', int(i))
	cursor.execute(insert_stmt.format('t3', int(i)))
	print insert_stmt.format('t3', int(i))
cursor.close()
cnx.close()


As you might notice, column compression might be with or without a compression dictionary. The visible difference, of course, is in the size of the tables. If you want to compress columns based on a predefined dictionary, you should create it with frequently used data. It is possible to create an empty dictionary, but it will have no effect. (See here: #1628231.)

The result of running this script is:

100.000 rows tables

t1 -> uncompressedt2 -> compressed column, t3 -> compressed column with compression dictionary, ‘names2’ dictionary

t2 -> compressed column,

t3 -> compressed column with compression dictionary, ‘names2’ dictionary.

Table size difference:

sh@sh-ubuntu:~/sandboxes/rsandbox_percona-server-5_6_31/master/data/dbtest$ ls -lth | grep .ibd
-rw-rw---- 1 sh sh 168M Sep 29 23:43 t1.ibd
-rw-rw---- 1 sh sh  15M Sep 29 23:43 t2.ibd
-rw-rw---- 1 sh sh  14M Sep 29 23:43 t3.ibd


After running an optimize table:
master [localhost] {msandbox} (dbtest) > optimize table t1;
+-----------+----------+----------+-------------------------------------------------------------------+
| Table     | Op       | Msg_type | Msg_text                                                          |
+-----------+----------+----------+-------------------------------------------------------------------+
| dbtest.t1 | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| dbtest.t1 | optimize | status   | OK                                                                |
+-----------+----------+----------+-------------------------------------------------------------------+
2 rows in set (1 min 35.88 sec)
master [localhost] {msandbox} (dbtest) > optimize table t2;
+-----------+----------+----------+-------------------------------------------------------------------+
| Table     | Op       | Msg_type | Msg_text                                                          |
+-----------+----------+----------+-------------------------------------------------------------------+
| dbtest.t2 | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| dbtest.t2 | optimize | status   | OK                                                                |
+-----------+----------+----------+-------------------------------------------------------------------+
2 rows in set (11.82 sec)
master [localhost] {msandbox} (dbtest) > optimize table t3;
+-----------+----------+----------+-------------------------------------------------------------------+
| Table     | Op       | Msg_type | Msg_text                                                          |
+-----------+----------+----------+-------------------------------------------------------------------+
| dbtest.t3 | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| dbtest.t3 | optimize | status   | OK                                                                |
+-----------+----------+----------+-------------------------------------------------------------------+
2 rows in set (7.89 sec)


The resulted size:

sh@sh-ubuntu:~/sandboxes/rsandbox_percona-server-5_6_31/master/data/dbtest$ ls -lh | grep .ibd
-rw-rw---- 1 sh sh 160M Sep 29 23:52 t1.ibd
-rw-rw---- 1 sh sh 8.0M Sep 29 23:52 t2.ibd
-rw-rw---- 1 sh sh 7.0M Sep 29 23:52 t3.ibd


I want more:
master [localhost] {msandbox} (dbtest) > alter table t1 row_format=compressed;
Query OK, 0 rows affected (2 min 38.85 sec)
Records: 0  Duplicates: 0  Warnings: 0
master [localhost] {msandbox} (dbtest) > alter table t2 row_format=compressed;
Query OK, 0 rows affected (14.41 sec)
Records: 0  Duplicates: 0  Warnings: 0
master [localhost] {msandbox} (dbtest) > alter table t3 row_format=compressed;
Query OK, 0 rows affected (10.74 sec)
Records: 0  Duplicates: 0  Warnings: 0


Using  ROW_FORMAT=COMPRESSED requires innodb_file_format to be >  Antelope. But this is not true for COLUMN_FORMAT.

Again, check the size:

sh@sh-ubuntu:~/sandboxes/rsandbox_percona-server-5_6_31/master/data/dbtest$ ls -lh | grep .ibd
-rw-rw---- 1 sh sh  76M Sep 29 23:57 t1.ibd
-rw-rw---- 1 sh sh 4.0M Sep 29 23:58 t2.ibd
-rw-rw---- 1 sh sh 4.0M Sep 29 23:58 t3.ibd


Question: How do I get information about column compression dictionary and tables? Answer: tables from information_schema:

master [localhost] {msandbox} ((none)) > SELECT * FROM information_schema.xtradb_zip_dict;
+----+--------+-------------+
| id | name   | zip_dict    |
+----+--------+-------------+
|  1 | names  | Bartholomew |
|  2 | names2 | Bartholomew |
+----+--------+-------------+
2 rows in set (0.00 sec)
master [localhost] {msandbox} ((none)) > SELECT * FROM information_schema.xtradb_zip_dict_cols;
+----------+------------+---------+
| table_id | column_pos | dict_id |
+----------+------------+---------+
|       67 |          1 |       1 |
+----------+------------+---------+
1 row in set (0.00 sec)


Question: How do I drop the compression dictionary? Answer: if it is in use, you will get:
master [localhost] {msandbox} (dbtest) > drop COMPRESSION_DICTIONARY `names`;
ERROR 1894 (HY000): Compression dictionary 'names' is in use


Before dropping it, make sure there are no tables using the dictionary. There is an extreme condition where you are unable to drop the dictionary (see #1628824).
Database Dictionary (software)

Published at DZone with permission of Shahriyar Rzayev, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Mind Map Reuse in Software Groups
  • Remote Debugging Dangers and Pitfalls
  • Taming Cloud Costs With Infracost
  • Hidden Classes in Java 15

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: