Live Database Migration
In this article, learn more about database migration — the process of transferring data from one database system to another.
Join the DZone community and get the full member experience.Join For Free
Database migration is the process of transferring data from one database system to another, which can involve changing the database schema, the database management system, or both. It is done to improve performance, scalability, and reliability or to take advantage of new features and capabilities in the target database system.
Database migration comes in two forms when it comes to the availability of the system — downtime and live.
The system is brought down for maintenance, and a splash page saying "Under maintenance" is put up by the developers. Behind the scenes, the data is copied over to the new database, and the connection strings are updated from the old to the new database. Once everything is validated to be working, the system is made live again. There are certain advantages to this method in that it is not necessary to care too much about system availability and/or data loss during the migration. The disadvantage, of course, is that the system SLA goes down, which may not be an option for all kinds of applications.
In this method, the system is never brought down. Instead, the migration design is made such that migration can be done without any downtime, i.e., the system stays live throughout the migration. Although this requires a more complex design, this approach guarantees no/minimal impact on service availability SLAs and might be the more attractive option for many scenarios.
This article will take a look at a case study of a Live database migration design.
The Live database migration design is broken into a few phases:
Phase 1: Data Hiding
This step is a code-heavy step in which the service is written in a form such that the service layers sitting above the data layer are agnostic to the database being used by the service. This step includes:
- Using database implementation-agnostic objects (and definitely not using model objects!) outside of the database layer.
- Using transformation functions to convert database model objects to and from the database implementation-agnostic objects
By the completion of Phase 1, we can safely start coding the migration modules without impacting current functionality.
Phase 2: New Data Layer
This step introduces the new data layer alongside the existing data layer. Most of the time, this will be a new repository class, with each table/collection having a new copy of the CRUD functions 1:1 mapped from the existing data layer.
This step also introduces the MigrationFlag (True/False) and ensures the new data layer code is only behind the True flag. This way, once the code is deployed, the config is deployed as False initially, and the code only writes to the old data layer. We shall see what happens when the configuration is turned to True in a later phase.
Phase 3: Data Sync
This step involves some data engineering and creating data pipelines that will copy the data from the old database to the new database. This includes:
- Having a
datetimemarker column saved somewhere, which tracks recent changes to any record, and helps track the latest version of records that need to be synced to the new database.
- Also, sometimes, the data is transformed from the old database to the new database in case there are any schema/model changes that are part of the migration.
It is important to note that by running this data sync regularly, a live copy of the old database is being built, which would ideally be able to take traffic when switched to it. This brings us to the next Phase.
Phase 4: Config Switch
The migration flag that was developed in Step 2 is now used here to switch from False to True. The code then starts to dual-write to the old and new database and read from the new database. This helps in 2 ways:
- If the data sync developed in Step 3 was correct, then dual-written data to the old database will get synced to the new database, but the data already exists in the new database with the configuration changed to True. Therefore, the operation won't make any changes to functionality. However...
- If something were to break, the dual-write system ensures that the MigrationFlag config flag can be set to False at any time to stop the dual-write and the read from the new database and effectively return to only reading and writing from the old database as before.
If there are no issues, then we move to the last steps
Phase 5: Cut Off Old Database
The code in this step is updated to stop the dual writing into the old database and only read and write from the new database. The data sync can also be safely stopped after the dual write is stopped and it is validated that there are no new data being synced via the pipeline.
At the end of Phase 5, there can be an optional cleanup phase to delete all the unused code and data sync sources.
In conclusion, live database migration is a complex but useful process that ensures the seamless transition of data from an old database to a new one without any service downtime. The phased approach outlined ensures that the process is safe, reversible, and foolproof against any issues that might arise during testing. By the end of the migration, the system is fully live on the new database, with all unused code and data sync sources cleaned up.
Opinions expressed by DZone contributors are their own.