One of the big “Selling Points” of MySQL is support for Multiple
Storage engines, and from the glance view it is indeed great to provide
users with same top level SQL interface allowing them to store their
data many different way. As nice as it sounds the in theory this
benefit comes at very significant cost in performance, operational and
development complexity. Let me touch just on the few complications it
Conversions – Each storage engine has its own data storage format so MySQL has to do a lot of copying and conversions while accessing the data, which significantly limits performance compared to basically “zero copy” design one may have streaming data from memory when it fits there.
Optimizer and Execution Storage engines are not created equal especially if you look at In Memory storage engines vs Distributed ones. Trying to handle all different cases in Optimizer – such as in memory nature of MEMORY storage engine, clustering by Primary key in Innodb and distributed nature of NDB makes it more complicated when it should be and still because it has to serve so many use cases it is not tuned to exploit full performance potential of any given storage engine.
Synchronization The top level on MySQL side, such as .frm files and binary log files has to be synchronized with storage engine transactions which require a lot of complications and performance overhead. MySQL has to do several fsync() calls per transaction commit to be on the safe side and still number of operations are not fully safe – you can for example get .frm out of sync with internal data dictionary if you crash at wrong time.
Transaction Support We have to deal with mix of locks on high level as well as on storage engine internally. We have to deal with different locking strategies storage engines may have and if we venture into using cross storage engine transactions we’re up for a lot of surprises.
Backup Cross storage engine backup is a pain because they are so different. Some of them are distributed and some in memory so even OS level approaches such as LVM backup may not work in all cases. We also get into synchronization aspects making backup which prevents from having hot backup in many cases.
Replication Support for different storage engines is a pain in replication. To start we have to write second binary log file when all transaction engines have their own !
Quality Assurance Multiple storage engines are much more complicated to test. And I believe multi-storage engine use is really not tested very deeply. I’ve seen some tests being run for different storage engines, but really this is not enough as MySQL allows you using multiple storage engines in the query you need to test all kinds of combination such as running multi-table update joining PBXT and Innodb tables.
Operations Things like backup and monitoring becomes a lot more complicated plus you have to deal with balancing – allocating resources to different storage engines and performance tuning becomes more complicated.
Now what is interesting for probably 95% of applications single storage engine would be good enough. In fact people already do not love to mix multiple storage engines very actively because of potential complications involved.
Now lets think what we could have if we have a version of MySQL Server which drops everything but Innodb Storage engine (it could be NDB, PBXT etc, it does not matter).
We could save a lot of CPU cycles by having storage format same as processing format. We could tune Optimizer to handle Innodb specifics well. We could get rid of SQL level table locks and using Innodb internal data dictionary instead of Innodb files. We would use Innodb transactional log for replication (which could be extended a bit for this purpose). Finally backup can be done in truly hot way without nasty “FLUSH TABLE WITH READLOCK” and hoping nobody is touching “mysql” database any more. Single Storage Engine server would be also a lot easier to test and operate.
This also would not mean one has to give up flexibility completely, for example one can imagine having Innodb tables which do not log the changes, hence being faster for update operations. One could also lock them in memory to ensure predictable in memory performance.