Recently, I helped a client that was having issues with deployments causing locking in their production databases. At a high level, the two key components used in the environment were:
At a high level, they currently used a CLI call to Liquibase as a sub-task within a larger deployment task. The goal of this engagement was to modify that sub-task to run Liquibase in a non-blocking fashion as opposed to the default that just runs native ALTERS against the database.
As I wasn’t very familiar with Liquibase, I took this opportunity to learn more about it and it seems like a very valuable tool. Essentially, it does the following:
- Adds two version control tables to your schema
- Reads an XML “changelog” file that has the schema changes
- Verifies which changes have yet to be applied
- Applies the changes in serial to the tables
- Records each change in the version control table as it happens
While this is a very powerful tool, as you dataset grows, this can be problematic. As each change is run as an independent ALTER statement, consider a large (several million row) table that you are updating with multiple indexes – this can result in several lengthy blocking operations that can impact your application in other ways.
My first thought when hearing that table alters were locking up the production application was naturally to use Percona Toolkit’s pt-online-schema-change – but the challenge lied with the integration of the tools. After some investigation and discussion, I found two approaches that seemed feasible to get pt-osc integrated:
- Write a custom plugin for Liquibase
- Utilize the updateSQL action to fetch the raw SQL and parse/process each statement
Due to time constraints, we landed on option 2. In pseudocode, here is the approach that we took:
#Output is SQL string of all commands that Liquibase would run sql = liquibase updateSQL foreach sql as line switch (line) case blank/comment: next case INSERT/CREATE TABLE runNativeSql(line) case ALTER TABLE runPtOsc(line) case CREATE INDEX line = convertCreateToAlter(line) runPtOSC(line)
This turned out to be exactly what the client needed as it allowed the code deployment to still utilize the version control methods that were understood, but did so in a non-blocking manner. The main caveat with this approach is that the application code needs to be backwards and forwards compatible with the changes. As you now may see a long running background process, depending on when your code is actually deployed in the complete process, it may need to handle the database in different states.
Although this is a frequent scenario with deployments anyway, I just wanted make note that often times when changing schema, the application must be able to handle the database in different states.
Here is the link to the (sanitized) proof-of-concept code and some samples I used for testing: https://github.com/mbenshoof/liqui-cap-online. Please note that I am by no means a Ruby developer so I assume there are plenty of coding optimizations that can be made – this was more of an exercise in combining the tools as opposed to a Ruby showcase.