How to Handle Huge Database Tables
Learn how to design your huge database tables to ensure they can handle queries without slowing the database's performance to a crawl.
Get a jump on query optimization in your databases by designing tables with speed in mind. This entails choosing the best data types for table fields, choosing the correct fields to index, and knowing when and how to split your tables. It also helps to be able to distinguish table partitioning from sharding.
It's a problem as old as databases themselves: large tables slow query performance. Out of this relatively straightforward problem has sprung an industry of indexing, tuning, and optimizing methodologies. The big question is, which approach is best for your database system?
For MySQL databases in particular, query performance starts with the design of the table itself. Justin Ellingwood explains the basics of query optimization in MySQL and MariaDB in a DigitalOcean article published on November 11, 2013, and updated on May 30, 2014.
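For example, data elements that will be updated frequently should be in their own table. MySQL invalidates the cached query results for a table whenever that table changes, so isolating the frequently updated fields prevents the query cache for the rest of the data from being dumped and rebuilt repeatedly. Generally speaking, the smaller the table, the faster the updates.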
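The sketch below illustrates the idea of moving a frequently updated field into its own table. It is only an illustration under stated assumptions: the table and column names (articles, article_stats, view_count) are hypothetical, and it uses SQLite through Python's built-in sqlite3 module so that it runs anywhere. It does not reproduce MySQL's query cache; it only shows the table layout.

```python
import sqlite3

# In-memory SQLite database, used here only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Rarely changing data lives in one table (hypothetical schema).
    CREATE TABLE articles (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        body  TEXT NOT NULL
    );
    -- The frequently updated counter lives in its own small table.
    CREATE TABLE article_stats (
        article_id INTEGER PRIMARY KEY REFERENCES articles(id),
        view_count INTEGER NOT NULL DEFAULT 0
    );
""")
conn.execute(
    "INSERT INTO articles (id, title, body) VALUES (1, 'Huge tables', 'placeholder body')"
)
conn.execute("INSERT INTO article_stats (article_id) VALUES (1)")


def record_view(db, article_id):
    """Bump the counter; only the small stats table is written to."""
    db.execute(
        "UPDATE article_stats SET view_count = view_count + 1 WHERE article_id = ?",
        (article_id,),
    )


for _ in range(3):
    record_view(conn, 1)

# Reads that need both pieces join the two tables on the shared key.
title, views = conn.execute(
    "SELECT a.title, s.view_count "
    "FROM articles a JOIN article_stats s ON s.article_id = a.id "
    "WHERE a.id = 1"
).fetchone()
```

With this layout, every page view writes to article_stats only, so queries that read nothing but articles are unaffected by the constant counter updates.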
Similarly, limiting data sizes up front avoids wasted storage space. For instance, when a field that takes string values has only a limited number of valid entries, use the "enum" type rather than "varchar."
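A rough back-of-the-envelope calculation shows where the savings come from. The sketch below assumes MySQL's documented storage rules (an ENUM value takes 1 byte for up to 255 members and 2 bytes beyond that; a VARCHAR value takes its byte length plus a 1-byte length prefix for short columns) and a single-byte character set. The status values and row count are hypothetical, and row overhead and indexes are ignored.

```python
def varchar_storage(value, length_prefix=1):
    """Bytes for one VARCHAR value: data bytes plus the length prefix.

    Assumes a single-byte character set and a column short enough
    to need only a 1-byte length prefix.
    """
    return len(value.encode("latin-1")) + length_prefix


def enum_storage(num_members):
    """Bytes for one ENUM value, which is stored as a small integer."""
    if not 1 <= num_members <= 65535:
        raise ValueError("ENUM supports between 1 and 65,535 members")
    return 1 if num_members <= 255 else 2


# Hypothetical order-status column with four valid entries.
statuses = ["pending", "shipped", "delivered", "cancelled"]
rows_per_status = 250_000  # 1,000,000 rows spread evenly (assumed)

varchar_total = sum(varchar_storage(s) * rows_per_status for s in statuses)
enum_total = enum_storage(len(statuses)) * rows_per_status * len(statuses)

print(f"VARCHAR: {varchar_total:,} bytes")
print(f"ENUM:    {enum_total:,} bytes")
```

Under these assumptions the column shrinks from roughly 9 MB to 1 MB for a million rows, and the difference grows with longer strings.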
There's More Than One Way to "Split" a Table
The bigger the database table, the longer it takes to access and modify. Unfortunately, database performance optimization isn't as simple as dividing big tables into several smaller ones. Morgan Tocker describes 10 ways to improve the speed of large MySQL tables in an October 24, 2013, post on his Master MySQL blog.
One of the 10 methods is to use partitioning to reduce the size of indexes by creating several "tables" out of one. Smaller per-partition indexes reduce contention on InnoDB's per-index lock (index->lock). Tocker also recommends using InnoDB rather than MyISAM, even though MyISAM can be faster at inserts to the end of a table. MyISAM's table-level locking blocks concurrent updates and deletes, and its use of a single lock to protect the key buffer when loading data from disk or removing it causes contention.
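To make the effect of partitioning on index size concrete, here is a toy model in Python. It is not how MySQL implements partitions; the class name, the year-based boundaries, and the order data are all hypothetical. It mimics range partitioning in the style of "values less than" boundaries, with each partition keeping its own small index.

```python
from bisect import bisect_right


class RangePartitionedTable:
    """Toy range-partitioned table: one small index per partition."""

    def __init__(self, boundaries):
        # Each boundary is an exclusive upper bound for one partition.
        # A final catch-all partition holds keys >= the last boundary.
        self.boundaries = sorted(boundaries)
        self.partitions = [{} for _ in range(len(self.boundaries) + 1)]

    def partition_for(self, year):
        """Return the number of the partition that owns this key."""
        return bisect_right(self.boundaries, year)

    def insert(self, year, order_id, row):
        index = self.partitions[self.partition_for(year)]
        index[(year, order_id)] = row

    def lookup(self, year, order_id):
        # Only one partition's index is consulted ("partition pruning").
        index = self.partitions[self.partition_for(year)]
        return index.get((year, order_id))

    def index_sizes(self):
        return [len(index) for index in self.partitions]


orders = RangePartitionedTable(boundaries=[2013, 2014, 2015])
for year, order_id in [(2012, 1), (2013, 2), (2013, 3),
                       (2014, 4), (2015, 5), (2016, 6)]:
    orders.insert(year, order_id, {"order_id": order_id, "year": year})

sizes = orders.index_sizes()        # entries held by each partition's index
found = orders.lookup(2013, 3)      # touches only the second partition
```

Six rows end up spread over four indexes, so any single lookup or update works against a fraction of the data, which is the property the partitioning advice relies on.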
Much confusion surrounds the concept of database table partitioning, particularly how partitioning is distinguished from sharding. When the question was posed on Quora, Mosaic CTO Tony Bako explained that partitioning divides logical data elements into multiple entities to improve performance, availability, and maintainability.
Sharding, by contrast, is a specific form of horizontal partitioning: it creates replicas of the schema in separate databases and then divides the data among the shards according to a shard key. This requires that DBAs distribute load and space evenly across shards based on data-access patterns and space considerations.
Sharding uses horizontal partitioning to store data in physically separate databases; here a user table is sharded by values in the "s_age" field. Source: CUBRID.
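The following sketch shows what routing by a shard key can look like in application code. Only the "s_age" field comes from the example above; the other column names, the age ranges, and the helper functions are hypothetical. Each shard is a separate in-memory SQLite database with an identical schema, standing in for a physically separate database server.

```python
import sqlite3

# The same schema is replicated on every shard (column names assumed).
SCHEMA = "CREATE TABLE users (s_id INTEGER PRIMARY KEY, s_name TEXT, s_age INTEGER)"

# Hypothetical shard-key ranges: [low, high) for each shard.
SHARD_RANGES = [(0, 30), (30, 60), (60, 200)]

shards = []
for _ in SHARD_RANGES:
    db = sqlite3.connect(":memory:")  # each connection is its own database
    db.execute(SCHEMA)
    shards.append(db)


def shard_for(age):
    """Map a shard-key value to the number of the shard that owns it."""
    for number, (low, high) in enumerate(SHARD_RANGES):
        if low <= age < high:
            return number
    raise ValueError(f"no shard covers s_age={age}")


def insert_user(s_id, s_name, s_age):
    shards[shard_for(s_age)].execute(
        "INSERT INTO users VALUES (?, ?, ?)", (s_id, s_name, s_age)
    )


def users_with_age(age):
    """A query on the shard key only has to visit one shard."""
    return shards[shard_for(age)].execute(
        "SELECT s_name FROM users WHERE s_age = ? ORDER BY s_id", (age,)
    ).fetchall()


for user in [(1, "Ana", 25), (2, "Ben", 41), (3, "Chen", 67), (4, "Dee", 25)]:
    insert_user(*user)

rows_per_shard = [
    db.execute("SELECT COUNT(*) FROM users").fetchone()[0] for db in shards
]
```

The choice of ranges decides how evenly rows and load spread across shards, which is why the shard key has to be picked with data-access patterns in mind.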
Published at DZone with permission of Darren Perucci, DZone MVB. See the original article here.