The Darker Side of ALTER TABLE
In this blog, we will go over how developers and DBAs should use ALTER TABLE in a proper fashion to avoid problems this query can create.
Join the DZone community and get the full member experience.Join For Free
If you frequently find yourself immersed into the MySQL world, there's probably no need to explain to you what SQL queries are and what they do. Chances are, you know them like your five fingers.
SELECT queries allow us to read data,
INSERT queries allow us to insert data into a given database,
UPDATE queries allow us to update data for certain rows or for the entire table if we wish,
DELETE queries let us delete records from a table, etc. If you've ever dug deeper into the world of MySQL though, you must know that there is one more query that is very important for both database administrators and developers — that query is
What Is ALTER TABLE?
ALTER TABLE is a query in MySQL that allows us to modify (or
ALTER) the structure of a table. The things that
ALTER TABLE can do are numerous —
ALTER TABLE can be used to change names of tables or columns, or add or modify columns as well. On its end though,
ALTER TABLE is mostly used when indexes are involved — when developers need to add indexes, they change the structure of their tables, and that's where
ALTER TABLE comes into play once again.
The Speed of ALTER TABLE
The speed of
ALTER TABLE statements is, obviously, directly dependent on the size of the tables or columns we want to modify — however, there's one caveat.
ALTER TABLE performs all of its operations on a copy of the table, and not on the table itself. In other words, when MySQL is ordered to modify a table and an
ALTER TABLE statement is run, MySQL makes a copy of the table that is being modified, inserts the data that we currently have inside of our table into it, performs all of the required operations there, and copies the data back into our table — only then we see the results.
ALTER TABLE might work a little differently if we are not using InnoDB as our primary storage engine of choice — MyISAM will not show the rows that are being inserted into table until all operations have been completed, but nonetheless, the core premise remains the same. The speed of such a statement is directly dependent on the factors outlined below.
- The structure of our database tables.
- The size of our tables.
- Our web and database server.
- Our MySQL configuration.
- The way we run queries.
First off, the structure of our tables is important because queries involving columns that were not "cared for" (indexed, etc.) properly will be slower than they should be. Also, keep in mind that if our tables were not using normalization, our queries will be slower than they should be, especially if our tables are big. Imagine copying possibly millions of unnecessary rows into another table...: sounds slow, doesn't it?
Secondly, the size of our tables does indeed matter because the bigger our tables are, the longer time they will take to copy over. The second point is also heavily related to all of the other points because we also have to think about how our web server is configured and about the way we run our queries for them to complete without issues and as be as quick as possible: for example, if our database server is configured to use 80% of available operating memory, chances are that our queries would probably complete a whole lot faster than with the default configuration. Also, if we run multiple
ALTER TABLE queries one after another and are low on disk space, we would risk running out of it altogether. Allow us to illustrate.
The Dark Side of ALTER TABLE
Let's say we have a dedicated server with 16GB RAM and 100GB of hard drive space. A good DBA would probably configure it to occupy 8GB or more of operating memory, and suppose that the table that we want to run
ALTER TABLE queries on consists of 100 million records and isn't normalized. All in all, everything that this table consists of occupies 20GB of space on the disk.
At this point, we might probably have a couple of other tables we work with — say, they occupy another 50GB of hard drive space, but you get the point. This is the query we run: feel free to adapt, then copy and paste it into your client.
ALTER TABLE demo_table MODIFY demo_column VARCHAR(150) NOT NULL AFTER another_column;
First, MySQL would make a copy of the table on the disk, then insert the current data into it, perform all of the necessary operations, and only then swap the old and new tables together: remember that we have 30GB of disk space left? After the temporary table would be created, we would have 10GB left. By then, we would probably start praying that the index occupies less space than 10GB and that our queries complete successfully. If we run out of disk space, our queries get interrupted. Do we really want to see results like this? Of course not.
Avoiding Problems With ALTER TABLE
As you can see, once
ALTER TABLE gets deeper involved in our database processes, we may very well be faced with multiple problems. How do we solve them?
One way would be to use tools that make a copy of the tables and modify it using triggers to keep it accessible while the new table is being prepared: in other words, to change the structure of tables without locking them up: for a full guide, head over here.
Another way we can accomplish such a task would be manual, meaning that we can also use an approach like so:
- We need to ensure that
INSERTstatements would be quick — the table that we create should not have any indexes, and our MySQL instances should be configured properly.
- We need to create a new table.
- Once our table is created, we need to copy the data over from our old table into our newly created table using
INSERT INTO demo2 SELECT * FROM demowhere
demo2is the new table and
demois the name of the old table. Before doing that though, ensure that the structure of both tables are the same (see image below point #5.)
- We remove (drop) the old table:
DROP TABLE demo.
- We rename the new table to the name of the old table:
ALTER TABLE demo2 RENAME TO demo.
If we prefer to complete steps manually, such an approach will do. However, do keep in mind that such an approach is pretty much what
ALTER TABLE does in the first place, and that in some corner cases,
DROP TABLE might become awfully slow to execute — in that case, kill the query by using
SHOW PROCESSLIST and then
KILL a specific query ID. Then, execute a
TRUNCATE TABLE statement — truncating tables is usually way faster than removing them altogether.
That's it — problem solved!
Published at DZone with permission of Lukas Vileikis. See the original article here.
Opinions expressed by DZone contributors are their own.