Encryption: MySQL vs. PostgreSQL
Join the DZone community and get the full member experience.
Join For FreeFirst a note, all my tests involved a relatively simple table with a schema like this (column names did vary):
CREATE TABLE enctest ( id int, id_text text, id_enc bytea );
In MySQL, varbinary(64) was used instead of bytea.
The id was formed from a sequence from 1 to 100,000. I had more trouble loading this in MySQL than in PostgreSQL. id_text was a text cast of id, and id_enc was the value of id_text encrypted using 128-bit AES encryption. This was intended to mimic sales data consisting of short strings that would be decrypted and converted to numeric data before aggregation.
The goal was to see how fast the different implementations would decrypt all records and aggregate as numeric data types. For PostgreSQL, pgcrypto was used. The tests were conducted under ANSI mode on MySQL, and the tables were InnoDB.
What I found was remarkably disturbing. While MySQL was blazingly fast, this speed came at the cost of basic error checking and rather than an error, decrypting with the wrong key would give the wrong data back sometimes, even on traditional modes. This is because the errors instead of warnings, per the documentation, are only transformed on insert, not on select. In other words, MySQL is just as permissive in read operations with STRICT mode turned on as turned off.
mysql> select sum(cast(aes_decrypt(id_enc, sha2('secret', 512)) as decimal)) FROM enctest; +----------------------------------------------------------------+ | sum(cast(aes_decrypt(id_enc, sha2('secret', 512)) as decimal)) | +----------------------------------------------------------------+ | 5000050000 | +----------------------------------------------------------------+ 1 row in set (0.33 sec)
That is fast. Very fast, My similar query in PostgreSQL took about 200 seconds, so approximately 600x as long, and was entirely CPU-bound the whole time.
efftest=# explain (analyse, verbose, costs, buffers) select sum(pgp_sym_decrypt(testvalsym, 'mysecretpasswd')::numeric) from sumtest; QUERY PLAN -------------------------------------------------------------------------------- ----------------------------------------------- Aggregate (cost=7556.16..7556.17 rows=1 width=62) (actual time=217381.965..217 381.966 rows=1 loops=1) Output: sum((pgp_sym_decrypt(testvalsym, 'mysecretpasswd'::text))::numeric) Buffers: shared read=5556 written=4948 -> Seq Scan on public.sumtest (cost=0.00..6556.08 rows=100008 width=62) (ac tual time=0.015..1504.897 rows=100000 loops=1) Output: testval, testvaltext, testvalenc, testvalsym Buffers: shared read=5556 written=4948 Total runtime: 217382.010 ms (7 rows)
My first thought was that for there to be a three orders of magnitude difference between the two implementations, something must be seriously wrong on the PostgreSQL side. This is a huge difference. But then something occurred to me. What if I use the wrong password?
On PostgreSQL: efftest=# explain (analyse, verbose, costs, buffers) select sum(pgp_sym_decrypt(testvalsym, 'mysecretpasswd2')::numeric) from sumtest; ERROR: Wrong key or corrupt data
On MySQL, it is a very different story:
mysql> select sum(cast(aes_decrypt(id_enc, sha2('secret2', 512)) as decimal)) FROM enctest; +-----------------------------------------------------------------+ | sum(cast(aes_decrypt(id_enc, sha2('secret2', 512)) as decimal)) | +-----------------------------------------------------------------+ | 1456 | +-----------------------------------------------------------------+ 1 row in set, 6335 warnings (0.34 sec)
Hmmm, out of 100,000 rows, only 6,000 (6%) gave a warning, and we got a meaningless answer back. Thanks, MySQL. So I tried some others:
mysql> select sum(cast(aes_decrypt(id_enc, sha2('s', 512)) as decimal)) FROM enctest; +-----------------------------------------------------------+ | sum(cast(aes_decrypt(id_enc, sha2('s', 512)) as decimal)) | +-----------------------------------------------------------+ | 1284 | +-----------------------------------------------------------+ 1 row in set, 6230 warnings (0.35 sec)
Again, 6 percent warnings, meaningless answer returned. Wow, this is fun ...
Try as I might, I couldn't get MySQL to throw any errors, and I always got meaningless results back with the wrong key. A closer look would reveal that MySQL was throwing warnings only when certain rare criteria were met and was performing no validation on the data to ensure it matched the data in. Further review showed that the cryptograms were much shorter on MySQL than PostgreSQL, suggesting that PostgreSQL was padding short strings in order to ensure that cryptography would better protect the data. More on this later.
This suggested that the difference in the performance might well be related to extra sanity checks in PostgreSQL that MySQL omitted for speed-related purposes. Armed with this knowledge, I tried the following:
efftest=# update sumtest set testvalsym = pgp_sym_encrypt(testvaltext, 'mysecretpasswd', 's2k-mode=0, s2k-digest-algo=md5'); UPDATE 100000
The query returned pretty fast. However, these settings are not really recommended for production environments.
I went ahead and tried my data test and my performance queries again and the results were two orders of magnitude faster:
efftest=# explain (analyse, verbose, costs, buffers) select sum(pgp_sym_decrypt(testvalsym, 'mysecretpasswd2')::numeric) from sumtest; ERROR: Wrong key or corrupt data efftest=# update sumtest set testvalsym = pgp_sym_encrypt(testvaltext, 'mysecretpasswd', 's2k-mode=0, s2k-digest-algo=md5'); UPDATE 100000 efftest=# explain (analyse, verbose, costs, buffers) select sum(pgp_sym_decrypt(testvalsym, 'mysecretpasswd2')::numeric) from sumtest; ERROR: Wrong key or corrupt data efftest=# explain (analyse, verbose, costs, buffers) select sum(pgp_sym_decrypt(testvalsym, 'mysecretpasswd')::numeric) from sumtest; QUERY PLAN -------------------------------------------------------------------------------- ----------------------------------------------- Aggregate (cost=13111.00..13111.01 rows=1 width=71) (actual time=1996.574..199 6.575 rows=1 loops=1) Output: sum((pgp_sym_decrypt(testvalsym, 'mysecretpasswd'::text))::numeric) Buffers: shared hit=778 read=10333 -> Seq Scan on public.sumtest (cost=0.00..12111.00 rows=100000 width=71) (a ctual time=0.020..128.722 rows=100000 loops=1) Output: testval, testvaltext, testvalenc, testvalsym Buffers: shared hit=778 read=10333 Total runtime: 1996.617 ms (7 rows)
Much, much faster. Of course, that comes at the cost of security features.
The primary security features changed here are what are called string to key functions. PostgreSQL also offers some relatively complex containers for short data which include things like padding and session keys. MySQL does not provide string to key management, and requires that you generate the hexadecimal key yourself. PostgreSQL provides a number of options for string to key generation which allow for salted hashes to be used for the actual encryption.
One of the most obvious implications here is that with MySQL, you have to generate your salted hash yourself, while with PostgreSQL, it may generate a different salted hash for each line. This is very important for encryption, particularly with smaller strings because this helps thwart rainbow tables. In essence with salted keys, there is no 1:1 relationship between the passphrase/data combination and the cryptogram, because there is no 1:1 relationship between the passphrase and the key. Further testing suggests that this is not responsible for the performance difference but it does suggest there are more checks lurking beneath the surface that are omitted from MySQL.
So given that the issue is not string to key management, the issue must be padding. For very short strings, PostgreSQL is managing padding and containers, while MySQL is purely encrypting short strings without more than minimal padding. Since there is insufficient padding, the decryption routines are much faster, but this comes at a cost of any reasonable security. Additionally PostgreSQL provides data checks that are not done on MySQL.
So what does this tell us? I think the primary lesson that I have had driven home a few times is that database-level encryption is tricky. This is particularly true when other considerations are involved, like performance aggregating data over significant sets. Add to this the woes of in-database key management and the like and in-database encryption is definitely expert territory. In this regard, MySQL's approach seems to require a lot more complexity to maintain security than PostgreSQL's.
It is important to remember that short encrypted strings are relatively common in databases that use encryption. One of the most common uses is for things like credit card numbers. For the reasons mentioned here I would suggest that PostgreSQL is much more trustworthy in these cases.
Published at DZone with permission of Chris Travers, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Trending
-
Chaining API Requests With API Gateway
-
Managing Data Residency, the Demo
-
Why You Should Consider Using React Router V6: An Overview of Changes
-
Introduction To Git
Comments