
Schema Migration for Databases


You should use change management tools to maintain your database schema. There are great open-source SCM tools available to make your life easier, especially when you're using CI.


This article is featured in the new DZone Guide to Databases. Get your free copy for more insightful articles, industry statistics, and more!

It is generally accepted that you should always use proper version control when developing software. Why wouldn’t you do the same with your database changes?

The question is, how do you manage changes to the database? The quick answer is, “write .sql scripts and check them into the version control system.” As usual, things aren’t always as simple as they first seem. Maintaining “build from scratch” scripts in version control is easy enough, but what do you do when you need to modify an existing database object?

The Hard Way

One method I’ve used in the past is to maintain two sets of scripts, each set in a different directory: one for new installations and one for upgrades.

New Installation Scripts

In this directory, I have scripts for creating database objects. I prefer to have a single script handle a single object such as a table, view, or stored procedure. I also include separate scripts that will populate the default data in the master tables.

Upgrade From Version X to Y

In this directory are the scripts that alter the database objects, upgrading them from one version to the next. When needed, this includes any data migration scripts.

In both directories, there is a master script that runs the change scripts in the correct order. Each script should contain validation logic to ensure it only runs when it is supposed to and raises an error if something goes wrong.
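To make this concrete, here is a minimal sketch of what such a master upgrade script might look like. The directory name, script naming convention, and the commented-out sqlplus invocation are all hypothetical; substitute your own client and connection details.

```shell
#!/usr/bin/env bash
# Sketch of a "hard way" master upgrade script: run every upgrade script
# in sorted order and stop at the first failure.
set -euo pipefail

# Hypothetical layout: one directory per version upgrade, scripts named
# so that lexical sort order is the correct run order (001_..., 002_...).
UPGRADE_DIR="${UPGRADE_DIR:-upgrade_1.0_to_2.0}"
STATUS="ok"

for script in "$UPGRADE_DIR"/*.sql; do
  if [ ! -e "$script" ]; then
    STATUS="no scripts found"
    break
  fi
  echo "running $script"
  # Run the script with your SQL client and stop on the first failure, e.g.:
  # sqlplus -s user/password@db @"$script" || { STATUS="failed: $script"; break; }
done
echo "$STATUS"
```

Even with a script like this, you still have to hand-write the validation and error handling inside every change script yourself.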

While this approach can work, it’s not perfect and there are some problems that are difficult to address.

  • How sure can we be that everything ran?

  • If there’s a problem somewhere in the process, how do we roll back the changes?

  • Can I run some scripts only for testing and not in production?

Each of these problems has the same answer: Write more scripts.

Use a Schema Change Management (SCM) Tool Instead

Typically, an SCM tool (sometimes called a migration tool) functions like a combination of “the hard way” directories above. Every change you make is defined in a changelog. You’ll start with the very first “create table” changelog and add a new changelog for every change made from then on, such as adding or dropping a column, index, or constraint. The SCM tool keeps track of what has or hasn’t been run and applies only the new changes. If you’re doing a brand-new install, it will start at the beginning and run them all.

There are a lot of great SCM tools available, and many of them are open source.

If you’re using a programming framework in your application, such as Python’s Django or Ruby on Rails, it may have built-in SCM tools. My first real experience with an SCM tool was a few years ago. I was writing a small Ruby on Rails application using Active Record, and since then, I have been a strong advocate for SCM tools.

I love exploring many different programming languages, which can make it difficult to get very deep into any language-specific SCM tool. Fortunately, there are also standalone SCM tools you can use, such as Flyway and Liquibase. Both of these tools are open source and well-supported.

Most SCM tools work with many different databases, most of which implement slightly different dialects of SQL. The tools use their own object definitions in a common format such as JSON, XML, or YAML. The SCM tool translates these definitions into the correct SQL for the database you’re deploying your changes to. For my examples, I will be using an Oracle Database, but the same definitions should work with many other databases.

Examples

I’ve used a few of these tools, some more than others, but the one I’m most familiar with is Liquibase, so I’ll use it for the following examples. Usually, the Liquibase examples you’ll find online are written in XML, but I find XML to be a bit too pointy, so I’ll be using JSON.

Installation and Configuration

In my database, I’ve created a schema called lb_demo to use with the examples. If you run these examples, make sure you use a schema that is safe to experiment in. Locate (or download) the JDBC driver for your database. I’ll be using the Oracle driver ojdbc7.jar.

Liquibase refers to database changes as “changesets” and the files that contain them as “changelog files.” A master changelog lists the changelogs in the order they will be run.

Create the following directories and files:

-LiquibaseExample
  |- liquibase.properties
  |- changelog
      |- db.changelog-master.json
      |- db.changelog-1.0.json
      |- db.changelog-2.0.json

Liquibase is “just” a Java application, so you don’t actually need to do anything more than download the ZIP file, extract it, and run it. (Make sure your Java is version 1.6 or higher.) You can run it directly from Java:

java $JAVA_OPTS -jar /opt/liquibase/liquibase.jar \
  --driver=oracle.jdbc.OracleDriver \
  --classpath="/home/example/jdbcDrivers/ojdbc7.jar" \
  --url=jdbc:oracle:thin:lb_demo/password@dbaccess \
  --changeLogFile=changelog/db.changelog-master.json \
  update

The Liquibase download includes a shell script and batch file that can be used in place of running the .jar file. You can simplify your process by including the extracted Liquibase directory in your path. For these examples, I will be using the shell script. To make things a little easier, I will move the parameters from above into a properties file.

Open liquibase.properties and add the following text, substituting the values for your configuration:

#liquibase.properties
#your database driver
driver: oracle.jdbc.OracleDriver
#jdbc driver location
classpath: /home/example/jdbcDrivers/ojdbc7.jar
#jdbc connect string
url: jdbc:oracle:thin:lb_demo/password@dbaccess
#master changelog location
changeLogFile: changelog/db.changelog-master.json

Example 1: Create Two Tables

Open db.changelog-master.json and add the following JSON:

{
  "databaseChangeLog": [{
      "include": {
        "file": "changelog/db.changelog-1.0.json"
      }
    },
    {
      "include": {
        "file": "changelog/db.changelog-2.0.json"
      }
    }
  ]
}

The first changeset creates a table called lb_groups with three columns: id, name, and description. Open db.changelog-1.0.json and add the following JSON:

{ 
  "databaseChangeLog": [{ 
    "preConditions": [{ 
      "runningAs": { 
        "username": "lb_demo" 
      } 
    }] 
  }, { 
    "changeSet": { 
      "id": "Two-Table-1", 
      "author": "BlaineCarter", 
      "comment": "Add table lb_groups", 
      "changes": [{ 
        "createTable": { 
          "tableName": "lb_groups", 
          "columns": [{ 
              "column": { 
                "name": "id", 
                "type": "int", 
                "autoIncrement": true, 
                "constraints": { 
                  "primaryKey": true, 
                  "nullable": false 
                } 
              } 
            }, 
            { 
              "column": { 
                "name": "name", 
                "type": "varchar(50)", 
                "constraints": { 
                  "unique": true, 
                  "uniqueConstraintName": "uk_lb_groups_name" 
                } 
              } 
            }, 
            { 
              "column": { 
                "name": "description", 
                "type": "varchar(200)" 
              } 
            } 
          ] 
        } 
      }] 
    } 
  }] 
} 

The second changeset creates a table called lb_people with four columns: id, firstname, lastname, and group_id. A foreign key to the lb_groups table is created using group_id.

Open db.changelog-2.0.json and add the following JSON:

{ 
  "databaseChangeLog": [{ 
    "preConditions": [{ 
      "runningAs": { 
        "username": "lb_demo" 
      } 
    }] 
  }, { 
    "changeSet": { 
      "id": "Two-Table-2", 
      "author": "BlaineCarter", 
      "comment": "Add table lb_people", 
      "changes": [{ 
        "createTable": { 
          "tableName": "lb_people", 
          "columns": [{ 
              "column": { 
                "name": "id", 
                "type": "int", 
                "autoIncrement": true, 
                "constraints": { 
                  "primaryKey": true, 
                  "nullable": false 
                } 
              } 
            }, 
            { 
              "column": { 
                "name": "firstname", 
                "type": "varchar(50)" 
              } 
            }, 
            { 
              "column": { 
                "name": "lastname", 
                "type": "varchar(50)", 
                "constraints": { 
                  "nullable": false 
                } 
              } 
            }, 
            { 
              "column": { 
                "name": "group_id", 
                "type": "int", 
                "constraints": { 
                  "foreignKeyName": "groupFK", 
                  "references": "lb_groups(id)" 
                } 
              } 
            } 
          ] 
        } 
      }] 
    } 
  }] 
} 

Most of the properties here are fairly self-explanatory and similar to what you’d use in an SQL script to create a table. Let’s take a look at some of the extra properties.
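As a point of comparison, the first changeset corresponds roughly to DDL like the following. This is an approximation for Oracle, not Liquibase's actual output; in particular, how autoIncrement is implemented (identity column vs. sequence and trigger) and the generated primary key constraint name will vary by database and Liquibase version.

```sql
-- Rough Oracle equivalent of the Two-Table-1 changeset (approximation;
-- the primary key constraint name here is illustrative, not generated).
CREATE TABLE lb_groups (
  id          NUMBER GENERATED BY DEFAULT AS IDENTITY,
  name        VARCHAR2(50),
  description VARCHAR2(200),
  CONSTRAINT pk_lb_groups PRIMARY KEY (id),
  CONSTRAINT uk_lb_groups_name UNIQUE (name)
);
```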

Liquibase uses id and author, combined with the file name and package, to uniquely identify a changeset. The id and author values can be any strings you’d like, but you should use values that have meaning in your process.

The preConditions array is a list of conditions that Liquibase will check before running the changes. In this example, I’m making sure that I’m connected to the correct schema, lb_demo, before adding the tables. The Liquibase documentation has a long list of other conditions you can use.

And finally, the comment value will be included in the Liquibase database changelog table (discussed below) and in the documentation generated by Liquibase.

Run It

Warning: When you run Liquibase, it executes the changes you defined. There is no “Are you sure?” prompt, and there are no separate commits.

Run this command: liquibase update.

Connect to your database with your favorite SQL tool and inspect the changes. If this is the first time Liquibase has been run, you should have two new tables that Liquibase uses to control and log the changes: databasechangeloglock and databasechangelog. Take a look at the databasechangelog table to see what data Liquibase tracks for the changes. You should also see the two new tables, lb_groups and lb_people, defined in the changesets above.
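A quick way to see what was recorded is to query the tracking table directly. The columns below are part of Liquibase's standard databasechangelog table; adjust the query to taste.

```sql
-- Inspect Liquibase's record of applied changesets, in execution order.
SELECT id, author, filename, dateexecuted, orderexecuted, comments
  FROM databasechangelog
 ORDER BY orderexecuted;
```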

Example 2: Roll Back Changes

Liquibase includes the ability to automatically roll back certain changes. Let’s add a column and roll it back.

First, add a new file db.changelog-3.0.json to the changelog directory.

{
  "databaseChangeLog": [{
    "preConditions": [{
      "runningAs": {
        "username": "lb_demo"
      }
    }]
  }, {
    "changeSet": {
      "id": "RollbackTag-3",
      "author": "BlaineCarter",
      "comment": "Add rules column to lb_groups table",
      "changes": [{
        "tagDatabase": {
          "tag": "ver-1"
        }
      }, {
        "addColumn": {
          "tableName": "lb_groups",
          "columns": [{
            "column": {
              "name": "rules",
              "type": "varchar(255)"
            }
          }]
        }
      }]
    }
  }]
}

Include the new file at the end of db.changelog-master.json.

{
  "include": {
    "file": "changelog/db.changelog-3.0.json"
  }
}

Run the update: liquibase update.

Connect to your database and verify that the rules column has been added to the lb_groups table. Look in the databasechangelog table to see that the new change has been logged.

Liquibase has the option to roll back to a date, a tag, or a certain number of changes. When you roll back changes, they are rolled back in the reverse order that they were run. Run the following command to roll back the last change: liquibase rollbackCount 1.
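For reference, the three modes look like this (assuming liquibase.properties is in the current directory; ver-1 is the tag defined in changelog 3.0, and the date is illustrative):

```
liquibase rollbackCount 1                     # undo the most recent changeset
liquibase rollback ver-1                      # roll back to the "ver-1" tag
liquibase rollbackToDate 2018-01-01T00:00:00  # undo changes run after this date
```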

Connect to your database and verify that the rules column has been removed from the lb_groups table. Look in the databasechangelog table to see that the record for changelog 3 has been removed. Liquibase only tracks changes that have been applied to the database; when you roll back a change, it is no longer in the log.

Other Schema Change Management Features

  • Branching and merging.

  • Reverse-engineer your current database into changelogs.

  • Compare differences between schemas.

  • Load data from a changeset or CSV file.

  • Auto-generate schema documentation.

  • Use labels and contexts to conditionally include or exclude specific changes.

Schema Change Management vs. "The Hard Way"

These examples were just a small taste of what you can do with a good SCM tool. Let’s compare them to the way I used to do it.

Both methods produce a similar number of files. The SCM method only requires maintaining the changes in one place instead of files in two different directories.

The JSON files are a bit more verbose than the SQL scripts, but I find them more readable.

The SCM method automatically keeps track of what has run and only runs the new changes.

The SCM method gives you the ability to automatically roll back most changes.

Perhaps the biggest difference is the ease of using an SCM tool in a continuous integration pipeline versus trying to manage a large directory of SQL files. This alone makes schema change management worth exploring.
