True DevOps: Including Databases in Automated Deployment
True DevOps: Including Databases in Automated Deployment
In this article, I’ll explain the different methods of getting the database under source control, either by storing per-object state, migration scripts, or usually both.
Join the DZone community and get the full member experience.Join For Free
Read the 2019 State of Database DevOps Report for the very latest insights
The biggest driver for including the database in DevOps is to increase the speed of delivery of database changes. The biggest challenges are the perceived difficulties in overcoming different development approaches, so that the teams can synchronize application and database changes and the disruption that is anticipated as a result.
In this article, I’ll explain the different methods of getting the database under source control, either by storing per-object state, migration scripts, or usually both. I’ll then describe how SQL Change Automation is the key that opens the door to including the database in continuous integration and automated release management.
In a typical application automated deployment pipeline, for example, deployments are linked to source controlled versions of the application code so the changes made in development can be associated to what has been deployed.
Database Deployment Challenges
Any deployment, whether for the application or database, needs to be linked to a source controlled version so we can associate the changes made in development to what has been deployed. So, we’d like to put the database into source control and treat database changes in a similar way to application changes, but there are some additional challenges.
Persisting Data During Upgrades
Application code is easy to change, but with a database, the data must persist. You can’t simply drop and replace the database like you would an application.
When we update the database schema, for example, altering a column’s data type, we must do so in a way that preserves the business data. This may require additional "migration scripts," which must also be in source control, to enable existing data to fit in any changed schema. An example might be trying to deploy a change that added a
NULL constraint to a column with existing data. We’d need to write a migration script that loaded a default value to any
NULL columns in the table, store that script in source control, and incorporate it in the automated deployment process.
Loading Static Data and Test Data
While your database is designed to store customer and transaction data, it will not function unless certain tables are “pre-stocked” with any immutable data required for dependent applications to function. It might be something as small and simple as a list of countries within which your organization can trade. This sort of lookup data, as well as seed data, enumerations, and more, all the data that makes your system work, and it all needs to be stored in source control and deployed alongside schema changes.
You also need to think about how to provision realistic data for development and testing. How do you test the latest build in development with production or production-like data, for example, while ensuring that private and sensitive data is protected, in all environments?
A Working Database System Is More Than Schema-Plus-Static Data
A SQL Server database of any size will consist of many components on top of the SQL DDL code. As well as database-scoped objects and configuration settings, there are instance and server-level objects and settings to consider.
Special care needs to be given to the security configuration of any database. Certain objects (database users and roles) are database-scoped, but other users, roles, and permissions are server-scoped. Ideally, you’ll separate out entirely your handling of users, roles, and permissions, from the handling of code and schema changes.
To deploy a complete and standardized database environment, you’ll need a configuration file in source control that provides a single reference file for all login and user details and the roles to which these users belong and then another to define the permissions assigned to these roles.
In addition to security objects and static data, there are also likely to be agent jobs, external files, SSIS components, .NET libraries, PowerShell scripts, R and python scripts and modules, and a host of other components.
Our "deployment package" in source control needs to contain everything that is essential for the working database and to create the basic database and environment reproducibly on every target SQL Server instance in both development and pre-production environments.
How long does it take to make a one-line database change, run it through the normal testing process and get it to production? If your answer is measured in days, weeks, or months, then you have a few problems. Firstly, delivery time for any significant new functionality will be slow. Secondly, when you hit a production issue, you won’t always have time to go back to the source code, make the fix, test it, and then redeploy. Instead, the DBA will often perform a hotfix directly on the production database.
This drift causes inconsistencies that undermine tests and can cause failed deployments, either because code clashes, or because important fixes are accidentally rolled back.
Database Source Control Methodologies
If you just want to tear down the current database version and build a new version, leaving aside for a second all the “extra bits,” all you need in source control is the SQL DDL scripts describing the current "state" of every database object (plus any required static data), and an automated means to create each database object in the correct dependency order.
However, what if you need to update an existing database? For database code objects, such as stored procedures and functions, your strategy is the same. For table changes, however, you may also need to store a set of change, or migration, scripts describing how to alter the metadata of the database, as defined by its constituent DDL creation scripts, from one database version to another, whilst preserving the data held within it.
So, what database artifacts should be in source control? The database DDL code, or the migration scripts? The answer is “both,” but different tools take different approaches.
A tool like SQL Change Automation (SCA) works directly in Visual Studio, and it auto-generates numerically ordered migration scripts, using the industry-standard SQL Compare engine, under the covers. Each of the changes, often
ALTER commands, are then organized in SQL scripts and run in order at employment time, to migrate a database from one version to the next, while preserving any existing business data.
Alternatively, a tool like SQL Source Control, which is a plug-in for SQL Server Management Studio (SSMS), stores in source control the current state of each object in the database, as a
CREATE script. SQL Source Control compares the set of CREATE scripts to the target database and auto-generates a script to synchronize the two states, again using the SQL Compare engine.
The two approaches are not mutually exclusive. Sometimes, you’ll need the per-object history and rollback advantages of the state-based approach. Other times, you’ll want the closer control over the deployment code or the dependency ordering that the migrations-based approach offers. If you use the state-based approach, you’ll occasionally need to handcraft correct change scripts to define the correct route for migrations that affect existing data. Tools that use the migrations approach often “backfill” the state of each object, at each version, so that we can see easily how the object has changed over time.
Also, it makes little sense to retain a string of changes scripts of a code-based database. All you need to store in source control is a single script that creates the object, it doesn’t exist, or alters it otherwise so that existing permissions on that object are retained.
Automated Deployments From Source Control With SQL Change Automation
SQL Change Automation (SCA) is a tool designed to help you automate the way you build, test, and deploy SQL Server databases, from source control. It has the following components:
- An SCA Visual Studio plug-in (previously called ReadyRoll) — to auto-generate database migration scripts for a source-controlled database
- PowerShell Build components — to validate the database project and incremental scripts, create and test the build, and produce a build package you can publish
- PowerShell Release components — for automated deployment of the build package.
- Add-ins — to extend some of the more common delivery tools such as TeamCity, Visual Studio Team Services (VSTS), MS Build, Bamboo, Jenkins, Octopus Deploy, and TFS.
As discussed in the previous section, you can use SCA directly in Visual Studio, and it will auto-generate the correct deployment script from all the required change scripts. It also supports a state-based approach for source control of stored procedures, triggers, views, and functions. Alternatively, SCA will generate the deployment scripts from the latest versions of the
CREATE scripts for each object, as defined by a SQL Source Control project. In fact, it will work with whatever is in source control, if the directory structure is compatible with SQL Compare.
Regardless of your starting point, once your database and application are in source control, the deployment pipeline is the same and can use the same tools and techniques. SCA is designed, specifically, to adapt to your chosen development techniques, and to integrate smoothly with your existing development toolchain, including source control, build, integration, deployment tools. SCA is a PowerShell-based tool, making it easy to extend and adapt the processes that run as part of a database build or migration. Simply include the PowerShell scripts in deployment package (a NuGet package), and run them pre- or post-deployment.
Our goal was that by using SCA, our original application deployment pipeline, shown in Figure 1, needs a subtle tweak to incorporate the database.
It looks tempting, doesn’t it? The changes to both the application and database are committed to source control, which can then trigger a continuous integration process to test the changes before they go through to release management where the DBA can check the database deployment scripts before they leave development.
SCA will help address many of the database deployment challenges we discussed earlier in the article. I’ll refer you to Willian Brewer’s What is SQL Change Automation article for full details, but, for example, SCA will:
- Wrap into a single Nuget package all the scripts you need to build or update a database schema, and insert any static data. A tool such as Octopus Deploy or PowerShell Remoting can then deploy the package across a network.
- Build or update the database automatically from the build script, migration script or object source
- Validate the build — it will run pre-build syntax checks and can perform static code analysis. It will also integrate with a tool such as SQL Doc to document the database.
- Perform database testing — it can run tSQLt unit tests directly, and other tests using PowerShell scripting, and through integration with your CI Server.
- Automate deployments — you can run simple deployments directly within SCA, using PowerShell, and integrate with a release management tool such as Octopus deploy
Finally, SCA is useful not just for moving database changes towards production, but also for provisioning individual developers with their own copies of databases to use in sandboxes, and for rapidly setting up test servers. SCA provides the validate and tested build, plus any "static" data. We can then load test data, either from the NuGet package, or by integrating with a tool like SQL Data Generator, and deploy to each required server. If you need large volumes of test data, and to provision many test and development databases, then you can consider integrating with SQL Clone to speed up provisioning.
I gave a presentation on this topic at the Computing DevOps Summit in London, in 2017. In the Q&A session afterwards, I was asked what I thought was the biggest hurdle to implementing Database DevOps. The answer is simple: getting people working together in the right ways.
The right people must talk to each other, work with each other, and collaborate with each other. As Donovan Brown, Senior DevOps Program Manager at Microsoft, puts it:
“DevOps is the union of people, process, and products to enable continuous delivery of value to our end users.”
Note that he puts people first.
Published at DZone with permission of Tom Austin , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.