Data Migration With AWS DMS and Terraform IaC
Learn how to migrate data from any supported source database to target databases using AWS Database Migration Service (AWS DMS) and Terraform IAC.
"Data is the new oil" is a saying I often hear, and it couldn't be more accurate in today's highly interconnected world. Data migration is crucial for organizations worldwide, from startups aiming to scale rapidly to enterprises seeking to modernize IT infrastructure.
However, as a tech enthusiast, I've often found myself navigating the complexities of moving large volumes of data across different environments. Whether it is a one-time event or ongoing replication, a data migration that is poorly planned, performed manually rather than automated with scripts, or insufficiently tested can cause issues during the migration and increase delay or downtime.
To take this challenge head-on, I've spoken with several technology heads to understand how AWS DMS streamlines data migration journeys. AWS DMS provides a platform to execute migrations effectively with minimal downtime. I've also realized that we can completely automate this process using Terraform IaC to trigger migration from any supported source database to the target database. Using Terraform, we can create the infrastructure required for the target nodes and the AWS DMS resources, which can then complete the data migration automatically.
In this blog, we'll dive deep into the intricacies of data migration using AWS DMS and Terraform IaC. We'll learn:
- What is AWS Database Migration Service (AWS DMS)?
- How to automate data migration using AWS DMS and Terraform IaC
- Key benefits and features of AWS DMS
Let's get started!
What Is AWS DMS (Database Migration Service)?
AWS DMS (Database Migration Service) is a cloud-based tool that facilitates database migration to the AWS Cloud by replicating data from any supported source to any supported target. It also supports change data capture (CDC), which replicates changes from source to target on an ongoing basis.
[Figure: AWS DMS architectural overview]
Use Cases of AWS DMS
AWS Database Migration Service (AWS DMS) supports many use cases, from like-to-like migrations to complex cross-platform transitions.
Homogeneous Data Migration
Homogeneous database migration migrates data between identical or similar databases. This one-step process is straightforward due to the consistent schema structure and data types between the source and target databases.
[Figure: Homogeneous database migration]
Heterogeneous Database Migration
Heterogeneous database migration involves transferring data between different databases, such as Oracle to Amazon Aurora, Oracle to PostgreSQL, or SQL Server to MySQL. This process requires converting the source schema and code to match the target database.
Using the AWS Schema Conversion Tool, this migration becomes a two-step procedure: schema transformation and data migration. Source schema and code conversion involve transforming tables, views, stored procedures, functions, data types, synonyms, etc. Any objects that the AWS Schema Conversion Tool can't automatically convert are clearly marked for manual conversion to complete the migration.
[Figure: DMS schema conversion]
[Figure: Heterogeneous database migrations]
Prerequisites for AWS DMS
The following are prerequisites for AWS DMS data migration:
- Access to source and target endpoints through firewall and security groups
- Source endpoint connection
- Target endpoint connection
- Replication instance
- Target schema or database
- CloudWatch event to trigger the Lambda function
- Lambda function to start the replication task
- Resource limit increase
AWS DMS Components
Before migrating to AWS DMS, let's understand AWS DMS components.
Replication Instance
Replication instances are managed Amazon EC2 instances that run replication tasks. They connect to the source data store, read and format the data for the target, and load it into the target data store.
[Figure: Replication instance]
Source and Target Endpoints
AWS DMS uses endpoints to connect to source and target databases, allowing it to migrate data from a source endpoint to a target endpoint.
Supported Source Endpoints Include:
Supported source endpoints include Google Cloud for MySQL, Amazon RDS for PostgreSQL, Microsoft SQL Server, Oracle Database, Amazon DocumentDB, PostgreSQL, Microsoft Azure SQL Database, IBM DB2, Amazon Aurora with MySQL compatibility, MongoDB, Amazon RDS for Oracle, Amazon S3, Amazon RDS for MariaDB, Amazon RDS for Microsoft SQL Server, MySQL, Amazon RDS for MySQL, Amazon Aurora with PostgreSQL compatibility, MariaDB, and SAP Adaptive Server Enterprise (ASE).
Supported Target Endpoints Include
Supported target endpoints include PostgreSQL, SAP Adaptive Server Enterprise (ASE), Google Cloud for MySQL, IBM DB2, MySQL, Amazon RDS for Microsoft SQL Server, Oracle Database, Amazon RDS for MariaDB, Amazon Aurora with MySQL compatibility, MariaDB, Amazon S3, Amazon RDS for PostgreSQL, Microsoft SQL Server, Amazon DocumentDB, Microsoft Azure SQL Database, Amazon RDS for Oracle, MongoDB, Amazon Aurora with PostgreSQL compatibility, and Amazon RDS for MySQL.
Replication Tasks
Replication tasks facilitate smooth data transfer from a source endpoint to a target endpoint. This involves specifying the necessary tables and schemas for migration and any special processing requirements such as logging, control table data, and error handling. Creating a replication task is a crucial step before starting the migration, which includes defining the migration type, source and target endpoints, and the replication instance.
A replication task supports the following migration types:
- Full Load: Migrates existing data only.
- Full Load with CDC (Change Data Capture): Migrates existing data and continuously replicates changes.
- CDC Only (Change Data Capture): Continuously replicates only the changes in data.
- Validation Only: Focuses solely on data validation.
A migration task moves through up to three main phases:
- Migration of Existing Data (Full Load): AWS DMS transfers data from the source tables to the target tables.
- Cached Changes Application: While the full load is in progress, changes to the tables being loaded are cached on the replication server. Once the full load for a table is complete, AWS DMS applies the cached changes.
- Ongoing Replication (Change Data Capture): Initially, a backlog of transactions causes the target to lag behind the source. Over time, this backlog is processed and replication reaches a steady flow.
Working through these phases in order lets AWS DMS move data methodically while maintaining data integrity and consistency.
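As a rough illustration of what a replication task definition carries, the sketch below builds a table-mapping document with a single selection rule and checks the migration type. The migration type strings are the ones this article lists for the MIGRATION_TYPE parameter; the function names and the task naming scheme are hypothetical, and a real setup would pass these values to the Terraform module rather than build them in Python.

```python
import json

# Migration type strings as listed for MIGRATION_TYPE in this article.
VALID_MIGRATION_TYPES = {"full-load", "full-load-and-cdc", "cdc"}


def build_table_mappings(schema_name, table_pattern="%"):
    """Return a table-mapping document that includes the matching tables."""
    return {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-" + schema_name,
                "object-locator": {
                    "schema-name": schema_name,
                    "table-name": table_pattern,  # "%" matches every table
                },
                "rule-action": "include",
            }
        ]
    }


def build_task_definition(database, migration_type):
    """Bundle the per-database values a replication task needs."""
    if migration_type not in VALID_MIGRATION_TYPES:
        raise ValueError(f"unsupported migration type: {migration_type}")
    return {
        "task_id": f"migrate-{database}",  # hypothetical naming scheme
        "migration_type": migration_type,
        "table_mappings": json.dumps(build_table_mappings(database)),
    }


if __name__ == "__main__":
    print(build_task_definition("sales", "full-load-and-cdc"))
```

Keeping the mapping as a JSON string mirrors how the value is usually handed to the task as a single document.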
CloudWatch Events
Amazon EventBridge (formerly Amazon CloudWatch Events) delivers notifications about AWS DMS events, such as replication task creation/deletion and replication instance creation/removal. EventBridge receives these events and routes them to targets based on predefined rules.
Lambda Function
We use an AWS Lambda function to initiate replication tasks. When an event signaling task creation occurs in AWS DMS, the Lambda function is automatically triggered by the configured EventBridge rules.
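A minimal sketch of such a Lambda handler follows. It assumes the incoming EventBridge event lists the replication task ARN under its `resources` key, which should be checked against a real event before relying on it. The handler name and the optional `dms_client` argument are illustrative; the injected client exists only so the logic can be exercised without AWS access.

```python
def extract_task_arn(event):
    """Return the first replication-task ARN found in the event, or None."""
    for arn in event.get("resources", []):
        # DMS task ARNs look like arn:aws:dms:<region>:<account>:task:<id>
        if ":task:" in arn:
            return arn
    return None


def handler(event, context, dms_client=None):
    """Start the replication task referenced by an EventBridge event."""
    task_arn = extract_task_arn(event)
    if task_arn is None:
        return {"started": False, "reason": "no task ARN in event"}

    if dms_client is None:
        # Imported lazily so the module can be tested without boto3.
        import boto3

        dms_client = boto3.client("dms")

    dms_client.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType="start-replication",
    )
    return {"started": True, "task_arn": task_arn}
```

Lambda calls the handler with only `event` and `context`, so the third parameter stays at its default in production.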
Resource Limits
AWS Database Migration Service (DMS) applies default resource quotas, which are soft limits. They can be increased as needed by raising a ticket with AWS Support.
Critical AWS DMS resource limits include:
- Endpoints per user account: 1000 (default)
- Endpoints per replication instance: 100 (default)
- Tasks per user account: 600 (default)
- Tasks per replication instance: 200 (default)
- Replication instances per user account: 60 (default)
For example, to migrate 100 databases from an On-Prem MySQL source to RDS MySQL, we use the following calculation:
- Tasks per database: 1
- Endpoints per database: 2
- Endpoints per replication instance: 100
Databases (and therefore tasks) per replication instance = Endpoints per replication instance / Endpoints per database = 100 / 2 = 50.
This means we can migrate up to 50 databases per replication instance. Using two replication instances, we can migrate all 100 databases efficiently in one go. This approach exemplifies the strategic use of resource quotas for effective database migration.
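The same arithmetic can be written as a small helper, shown here as a sketch. The constants are the default quotas quoted above; the function names are illustrative. It takes the tighter of the endpoint and task limits, then rounds the instance count up.

```python
import math

ENDPOINTS_PER_INSTANCE = 100  # default quota quoted above
TASKS_PER_INSTANCE = 200      # default quota quoted above


def databases_per_instance(endpoints_per_db=2, tasks_per_db=1):
    """How many databases one replication instance can serve."""
    by_endpoints = ENDPOINTS_PER_INSTANCE // endpoints_per_db
    by_tasks = TASKS_PER_INSTANCE // tasks_per_db
    # Whichever quota runs out first is the real limit.
    return min(by_endpoints, by_tasks)


def instances_needed(database_count, endpoints_per_db=2, tasks_per_db=1):
    """Replication instances required to migrate all databases at once."""
    per_instance = databases_per_instance(endpoints_per_db, tasks_per_db)
    return math.ceil(database_count / per_instance)


if __name__ == "__main__":
    print(databases_per_instance())  # 50
    print(instances_needed(100))     # 2
```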
How To Automate Data Migration With Terraform IaC: Overview
Terraform and DMS automate and secure data migration, simplifying the process while managing AWS infrastructure efficiently.
Here's a step-by-step overview of this seamless and secure migration process:
Step 1: Fetching Migration Database List
Retrieve a list of databases to be migrated.
Step 2: Database Creation (Homogeneous Migration)
Create target schema or database structures to prepare for data transition in case of homogeneous data migrations.
Step 3: Replication Subnet Group Creation
Create replication subnet groups to ensure seamless network communication for data movement.
Step 4: Source/Target Connection Endpoints
Create source and target connection endpoints for each database set to be migrated.
Step 5: Replication Instance Creation
Create replication instances to handle the data migration process.
Step 6: Lambda Integration With CloudWatch Events
Integrate a CloudWatch event and Lambda function to initiate replication tasks.
Step 7: Replication Task Creation and Assignment
Create and assign replication tasks to replication instances, setting up the migration.
Step 8: Migration Task Initiation
Migration tasks are initiated for each database.
Migration Process & Workflow Diagram
Architecture Overview for Data Migration Automation
AWS DMS with Terraform Infrastructure as Code (IAC) automates the data migration. The data migration automation process begins with the dynamic framework of Jenkins pipelines. This framework uses various input parameters to customize and tailor the migration process, offering flexibility and adaptability.
Here's a detailed overview of the architecture:
[Figure: AWS DMS architecture with Terraform IaC]
Step 1: Jenkins Pipeline Parameters
The Jenkins pipeline for AWS DMS starts by defining essential input parameters, such as region and environment details, Terragrunt module specifics, and migration preferences.
Key input parameters include:
- AWS_REGION: Populates the region list from the repository.
- APP_ENVIRONMENT: Populates the application environment list from the repository.
- TG_MODULE: Populates the Terragrunt module folder list from the repository.
- TG_ACTION: Allows users to select a Terragrunt action (plan, validate, or apply).
- TG_EXTRA_FLAGS: Lets users pass additional flags to Terragrunt.
- FETCH_DBLIST: Determines the migration DB list generation type (AUTOMATIC and MANUAL).
- CUSTOM_DBLIST: SQL Server custom Database list for migration if FETCH_DBLIST is selected as MANUAL.
- MIGRATION_TYPE: Allows users to choose the DMS migration type (full-load, full-load-and-cdc, cdc).
- START_TASKS: Allows users to turn migration task execution on or off.
- TEAMS: MS Teams channel for build notifications.
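To make the allowed values concrete, here is a hedged sketch of how these parameters could be validated before the pipeline runs. The parameter names and allowed values come from the list above; the dataclass, its defaults, and the error messages are assumptions made for illustration, not the actual pipeline code.

```python
from dataclasses import dataclass, field

TG_ACTIONS = {"plan", "validate", "apply"}
FETCH_MODES = {"AUTOMATIC", "MANUAL"}
MIGRATION_TYPES = {"full-load", "full-load-and-cdc", "cdc"}


@dataclass
class PipelineParams:
    aws_region: str
    app_environment: str
    tg_module: str
    tg_action: str = "plan"            # assumed default
    fetch_dblist: str = "AUTOMATIC"    # assumed default
    migration_type: str = "full-load"  # assumed default
    start_tasks: bool = False
    custom_dblist: list = field(default_factory=list)

    def validate(self):
        """Return a list of problems; an empty list means the input is valid."""
        errors = []
        if self.tg_action not in TG_ACTIONS:
            errors.append(f"TG_ACTION must be one of {sorted(TG_ACTIONS)}")
        if self.fetch_dblist not in FETCH_MODES:
            errors.append(f"FETCH_DBLIST must be one of {sorted(FETCH_MODES)}")
        if self.migration_type not in MIGRATION_TYPES:
            errors.append(f"MIGRATION_TYPE must be one of {sorted(MIGRATION_TYPES)}")
        if self.fetch_dblist == "MANUAL" and not self.custom_dblist:
            errors.append("CUSTOM_DBLIST is required when FETCH_DBLIST is MANUAL")
        return errors


if __name__ == "__main__":
    params = PipelineParams("us-east-1", "dev", "dms", migration_type="cdc")
    print(params.validate())
```

Collecting every problem rather than failing on the first one gives the person running the job a single, complete error report.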
Step 2: Execution Stages
Based on the input parameters, the pipeline progresses through distinct execution stages:
- Source Code Checkout for IAC: The pipeline begins by checking out the source code for IAC, establishing a solid foundation for the following steps.
- Migration Database List: Depending on the FETCH_DBLIST setting, the pipeline automatically fetches the migration database list from the source instance or uses a manually supplied list.
- Schema or Database Creation: The target instance is prepared by creating the necessary schema or database structures for data migration.
- Terraform/Terragrunt Execution: The pipeline executes Terraform or Terragrunt modules to facilitate the AWS DMS migration process.
- Notifications: Updates are sent via email or MS Teams throughout the migration process.
Step 3: Automatic and Manual List Fetching
When FETCH_DBLIST is set to AUTOMATIC, a shell script fetches the migration database list from the source instance. Alternatively, users can set it to MANUAL and provide a selective list for migration.
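The choice between the two modes can be sketched as below. The article's pipeline uses a shell script for the automatic fetch; here that step is represented by a caller-supplied `fetch_from_source` callable, which is a placeholder. Filtering out SQL Server system databases is an assumption added for illustration, not something the pipeline is stated to do.

```python
# SQL Server system databases; excluding them is an assumption of this sketch.
SYSTEM_DATABASES = {"master", "model", "msdb", "tempdb"}


def parse_db_list(raw):
    """Split comma- or newline-separated names, dropping blanks and duplicates."""
    names = []
    for part in raw.replace(",", "\n").splitlines():
        name = part.strip()
        if name and name not in names:
            names.append(name)
    return names


def resolve_db_list(fetch_dblist, custom_dblist="", fetch_from_source=None):
    """Return the databases to migrate for the given FETCH_DBLIST mode."""
    if fetch_dblist == "MANUAL":
        names = parse_db_list(custom_dblist)
    elif fetch_dblist == "AUTOMATIC":
        if fetch_from_source is None:
            raise ValueError("AUTOMATIC mode needs a fetch_from_source callable")
        names = parse_db_list(fetch_from_source())
    else:
        raise ValueError(f"unknown FETCH_DBLIST value: {fetch_dblist}")
    return [n for n in names if n.lower() not in SYSTEM_DATABASES]


if __name__ == "__main__":
    print(resolve_db_list("MANUAL", "sales, hr,,sales"))
```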
Step 4: Migration Types
The Terraform/Terragrunt module initiates CDC, full-load-and-cdc, and full-load migrations based on the specified migration type in MIGRATION_TYPE.
Step 5: Automation Control
Initiate the migration task, either manually or automatically, with START_TASKS.
Step 6: Credentials Management
For security, retrieve database credentials from AWS Secrets Manager while executing DMS Terraform/Terragrunt modules.
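In the pipeline described here, the lookup happens inside the Terraform/Terragrunt modules. The Python sketch below only illustrates the same idea with boto3. It assumes the secret is stored as a JSON string with `username` and `password` keys, which may differ in your setup; the function name and the injectable client are hypothetical.

```python
import json


def get_db_credentials(secret_id, secrets_client=None):
    """Fetch a secret and return its username and password fields."""
    if secrets_client is None:
        # Imported lazily so the function can be tested without boto3.
        import boto3

        secrets_client = boto3.client("secretsmanager")

    response = secrets_client.get_secret_value(SecretId=secret_id)
    # Assumption: the secret value is a JSON object with these two keys.
    secret = json.loads(response["SecretString"])
    return {"username": secret["username"], "password": secret["password"]}
```

Reading credentials at run time keeps them out of the repository and out of the pipeline parameters.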
Step 7: Endpoint Creation
Establish endpoints for target and source instances, facilitating seamless connection and data transfer.
Step 8: Replication Instances
Create replication instances based on the database count or quota limits.
Step 9: CloudWatch Integration
Configure AWS CloudWatch events to trigger a Lambda function after AWS DMS replication tasks are created.
Step 10: Replication Task Configuration
Create replication tasks for individual databases and assign them to available replication instances for optimized data transfer.
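One simple way to spread tasks across instances is round-robin assignment, sketched below. This is an illustrative strategy, not necessarily the one the Terraform module uses. The default of 50 databases per instance comes from the quota calculation earlier in this article; the instance identifiers are made up.

```python
def assign_tasks(databases, instance_ids, max_per_instance=50):
    """Distribute databases across replication instances in round-robin order."""
    capacity = len(instance_ids) * max_per_instance
    if len(databases) > capacity:
        raise ValueError(
            f"{len(databases)} databases exceed capacity of {capacity}"
        )

    assignment = {instance_id: [] for instance_id in instance_ids}
    for index, database in enumerate(databases):
        # Cycle through the instances so the load stays even.
        instance_id = instance_ids[index % len(instance_ids)]
        assignment[instance_id].append(database)
    return assignment


if __name__ == "__main__":
    dbs = [f"db{i}" for i in range(5)]
    print(assign_tasks(dbs, ["ri-1", "ri-2"]))
```

Because the total is checked against capacity up front, round-robin never places more than `max_per_instance` databases on any one instance.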
Step 11: Task Automation
The Lambda function automatically starts replication tasks once they reach the Ready state.
Step 12: Monitoring Migration
Use the AWS DMS Console for real-time monitoring of data migration progress, gaining insights into the migration journey.
Step 13: Ongoing Changes
Seamlessly replicate ongoing changes into the target instance after the migration, ensuring data consistency.
Step 14: Automated Validation
Automatically validate migrated data against source and target instances based on provided validation configurations to reinforce data integrity.
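As a sketch of what a validation configuration might look like, the snippet below builds the validation part of a task settings document and summarizes per-table results. It assumes validation is switched on through the `ValidationSettings` block of the task settings and that table statistics report a `ValidationState` of "Validated" for tables that pass; confirm both against the AWS DMS documentation. The helper names are hypothetical.

```python
import json


def build_task_settings(enable_validation=True, thread_count=5):
    """Return a task settings JSON string with validation configured."""
    settings = {
        "ValidationSettings": {
            "EnableValidation": enable_validation,
            "ThreadCount": thread_count,
        }
    }
    return json.dumps(settings, indent=2)


def tables_needing_attention(table_statistics):
    """List schema.table names whose validation state is not 'Validated'."""
    flagged = []
    for stats in table_statistics:
        if stats.get("ValidationState") != "Validated":
            flagged.append(f"{stats['SchemaName']}.{stats['TableName']}")
    return flagged


if __name__ == "__main__":
    print(build_task_settings())
```

The second helper expects a list of dictionaries shaped like the per-table statistics a task reports, so it can be fed from whatever monitoring call the pipeline already makes.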
Step 15: Completion and Configuration
Ensure user migration and database configurations are completed post-validation.
Step 16: Target Testing and Validation
Update the application configuration to use the target instance for testing to ensure functionality.
Step 17: Cutover Replication
Execute cutover replication from the source instance after thorough testing, taking a final snapshot of the source instance to conclude the process.
Key Features and Benefits of AWS DMS With Terraform
AWS DMS with Terraform IAC offers several benefits: cost-efficiency, ease of use, minimized downtime, and robust replication.
Cost Optimization
AWS DMS offers a cost-effective model: you pay only for the compute resources and any additional log storage used during the migration.
Ease of Use
The migration process is simplified with no need for specific drivers or application installations and often no changes to the source database. One-click resource creation streamlines the entire migration journey.
Continuous Replication and Minimal Downtime
AWS DMS continuously replicates from the source database while it remains operational, enabling minimal downtime and a seamless switch to the target database.
Ongoing Replication
Maintaining synchronization between source and target databases with ongoing replication tasks ensures data consistency.
Diverse Source/Target Support
AWS DMS supports migrations from like-to-like (e.g., MySQL to MySQL) to heterogeneous migrations (e.g., Oracle to Amazon Aurora) across SQL, NoSQL, and text-based targets.
Database Consolidation
AWS DMS with Terraform can easily consolidate multiple source databases into a single target database, which applies to homogeneous and heterogeneous migrations.
Efficiency in Schema Conversion and Migration
AWS DMS minimizes manual effort in tasks such as migrating users, stored procedures, triggers, and schema conversion while validating the target database against application functionality.
Automated Provisioning With Terraform IAC
Leverage Terraform for automated creation and destruction of AWS DMS replication tasks, ideal for managing migrations involving multiple databases.
Automated Pipeline Integration
Integrate seamlessly with CI/CD pipelines for efficient migration management, monitoring, and progress tracking.
Conclusion
This blog talks in detail about how the combination of AWS DMS and Terraform IAC can be used to automate data migration. The blog serves as a guide, exploring the synergy between these technologies and equipping businesses with the tools for optimized digital transformation.