Automating Lift-and-Shift Migration at Scale

Moving 100+ servers to the cloud manually is a recipe for disaster. Here is an architectural pattern for building an automated Migration Factory.

Dippu Kumar Singh

Feb. 04, 26 · Analysis

For many enterprises, the “lift-and-shift” (rehost) strategy remains the most pragmatic first step into the cloud. It offers speed and immediate data center exit capabilities without the complexity of refactoring applications. However, doing this manually for hundreds of workloads introduces human error, security gaps, and “migration fatigue.”

To solve this, we need to treat migration not as a series of manual tasks, but as a manufacturing process. We need a Migration Factory.

This article outlines an architectural blueprint for automating large-scale migrations using AWS Application Migration Service (MGN) orchestrated by Step Functions and CI/CD pipelines.

The Core Problem: The Semi-Automated Trap

AWS MGN is a powerful tool, but out of the box, it is only “semi-automated.” You still need to:

Install agents manually on source servers
Monitor replication progress in the console
Manually launch test instances
Switch traffic for cutover

When you multiply these steps by 500 servers, you get inconsistent configurations, missed security tags, and blown timelines. The solution is to wrap AWS MGN in an orchestration layer that handles lifecycle state management.

The Architecture: Event-Driven Orchestration

The architecture below relies on decoupling the definition of the migration (the runbook) from the execution (the pipeline).

High-Level Workflow

1. The Migration Runbook (Source of Truth)

Instead of disparate spreadsheets, we define the migration wave in a structured JSON or CSV runbook stored in S3. This file acts as the infrastructure contract.

Sample Runbook Structure:

[
  {
    "hostname": "db-server-01",
    "wave_id": "wave-2",
    "target_instance_type": "r5.large",
    "subnet_id": "subnet-0abc123",
    "security_groups": ["sg-web", "sg-db"],
    "cutover_time": "2025-12-20T02:00:00Z",
    "tags": { "CostCenter": "Finance", "Environment": "Prod" }
  }
]
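Because every later stage trusts this file, it can help to validate it before a wave starts. The following is a minimal sketch, assuming the field names from the sample above are all mandatory; the `validate_runbook` helper and the `REQUIRED_FIELDS` set are illustrative names, not part of any AWS tooling.

```python
import json
from datetime import datetime

# Assumption: every field shown in the sample runbook is required.
REQUIRED_FIELDS = {
    "hostname", "wave_id", "target_instance_type",
    "subnet_id", "security_groups", "cutover_time", "tags",
}


def validate_runbook(raw_json):
    """Parse a JSON runbook and return (entries, errors)."""
    entries = json.loads(raw_json)
    errors = []
    for index, entry in enumerate(entries):
        name = entry.get("hostname", f"entry {index}")
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            errors.append(f"{name}: missing {sorted(missing)}")
            continue
        try:
            # Older Python versions reject a trailing "Z", so normalize it.
            datetime.fromisoformat(entry["cutover_time"].replace("Z", "+00:00"))
        except ValueError:
            errors.append(f"{name}: invalid cutover_time")
    return entries, errors
```

Rejecting a malformed runbook at upload time is cheaper than discovering a missing subnet or an unparseable cutover time halfway through a wave.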
  

2. State Management with Step Functions

AWS Step Functions acts as the factory floor manager. It handles long-running processes that a single Lambda invocation, capped at 15 minutes, cannot: waiting for initial data replication (which can take days) or polling for the “Ready for Testing” state.
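The waiting behavior can be sketched as a small Amazon States Language loop of Task, Choice, and Wait states. The example below builds the definition as a Python dictionary so it can be serialized with `json.dumps`. The state names, the `$.replication.ready` path, and the Lambda ARN argument are assumptions made for illustration.

```python
import json


def build_replication_wait_loop(check_lambda_arn, poll_seconds=900):
    """Return an Amazon States Language definition as a dict.

    check_lambda_arn is a hypothetical Lambda that reports whether
    replication has finished, as {"ready": true/false}.
    """
    return {
        "StartAt": "CheckReplication",
        "States": {
            "CheckReplication": {
                "Type": "Task",
                "Resource": check_lambda_arn,
                "ResultPath": "$.replication",
                "Next": "IsReplicationReady",
            },
            "IsReplicationReady": {
                "Type": "Choice",
                "Choices": [{
                    "Variable": "$.replication.ready",
                    "BooleanEquals": True,
                    "Next": "ReplicationComplete",
                }],
                # Not ready yet: sleep, then poll again.
                "Default": "WaitBeforeNextPoll",
            },
            "WaitBeforeNextPoll": {
                "Type": "Wait",
                "Seconds": poll_seconds,
                "Next": "CheckReplication",
            },
            "ReplicationComplete": {"Type": "Succeed"},
        },
    }
```

The Wait state costs nothing while it sleeps, which is what makes a multi-day replication wait practical in a Standard workflow.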

Key State Transitions:

Agent Installation: Connects to the source via SSM or SSH and installs the MGN agent
Replication Loop: Polling loop checking dataReplicationInfo.dataReplicationState
Launch Configuration: Pushes runbook settings (instance type, security groups) to the MGN launch template via API
Test/Cutover Trigger: Executes launch logic based on the schedule
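As a rough illustration of the replication loop, the check that a polling step might run against MGN could look like the sketch below. It assumes the boto3 `describe_source_servers` response exposes the state under `dataReplicationInfo.dataReplicationState`, and it treats `CONTINUOUS` as "initial sync finished"; both are worth confirming against the current API reference. The client is passed in so the function can be exercised without AWS credentials.

```python
def check_replication(mgn_client, source_server_id):
    """Report the replication state of one MGN source server.

    mgn_client is expected to behave like boto3.client("mgn").
    """
    response = mgn_client.describe_source_servers(
        filters={"sourceServerIDs": [source_server_id]}
    )
    items = response["items"]
    if not items:
        raise LookupError(f"source server {source_server_id} not found")

    info = items[0].get("dataReplicationInfo", {})
    state = info.get("dataReplicationState", "UNKNOWN")
    # Assumption: CONTINUOUS means the initial sync has completed.
    return {"state": state, "ready": state == "CONTINUOUS"}
```

Returning a small dictionary rather than a bare boolean lets the state machine branch on `ready` while still logging the raw state for troubleshooting.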

Automation Deep Dive: The Infrastructure Pipeline

Once replication is complete and the server is launched, we shift from “migration” tools to “DevOps” tools.

The Terraform Handoff

A common mistake is leaving the migrated server as a “ClickOps” artifact. To ensure the new environment is manageable, the Migration Factory triggers a CodePipeline job immediately after cutover.

This pipeline reads the final state of the migrated instance (AMI ID, private IP) and records it in Terraform state, for example through terraform import, so that later changes go through code rather than the console.
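One way the handoff could be implemented is to have the pipeline emit a Terraform `import` block (available from Terraform 1.5) for each migrated instance. The sketch below only renders the block text. The `render_import_block` helper and the resource naming scheme are assumptions, and a matching `aws_instance` resource block still has to exist or be generated separately.

```python
import re


def render_import_block(hostname, instance_id):
    """Render a Terraform import block for a migrated EC2 instance."""
    # Derive a conventional resource name from the runbook hostname.
    resource_name = re.sub(r"[^A-Za-z0-9_]", "_", hostname)
    if resource_name[0].isdigit():
        # Terraform identifiers must not start with a digit.
        resource_name = "_" + resource_name
    return (
        "import {\n"
        f"  to = aws_instance.{resource_name}\n"
        f'  id = "{instance_id}"\n'
        "}\n"
    )
```

For example, `render_import_block("db-server-01", "i-0123456789abcdef0")` produces a block targeting `aws_instance.db_server_01`, which the pipeline could commit alongside the rest of the Terraform configuration.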

Python/Boto3 Logic to Update Launch Templates:

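import boto3

mgn_client = boto3.client('mgn')
ec2_client = boto3.client('ec2')

def update_launch_config(source_server_id, runbook_data):
    """
    Updates the AWS MGN launch settings based on runbook metadata.
    """
    # MGN-level settings; right-sizing is off so the runbook value wins.
    response = mgn_client.update_launch_configuration(
        sourceServerID=source_server_id,
        targetInstanceTypeRightSizingMethod='NONE',
        copyPrivateIp=True,
        copyTags=True,
        launchDisposition='STARTED'
    )
    # The instance type is not an MGN launch configuration parameter;
    # it lives in the EC2 launch template that MGN manages for the server.
    template_id = response['ec2LaunchTemplateID']
    version = ec2_client.create_launch_template_version(
        LaunchTemplateId=template_id,
        SourceVersion='$Latest',
        LaunchTemplateData={'InstanceType': runbook_data['target_instance_type']}
    )
    # MGN launches from the template's default version, so promote the new one.
    ec2_client.modify_launch_template(
        LaunchTemplateId=template_id,
        DefaultVersion=str(version['LaunchTemplateVersion']['VersionNumber'])
    )
    return response['ResponseMetadata']['HTTPStatusCode'] == 200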
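With the launch settings in place, the Test/Cutover Trigger step listed earlier could be sketched as follows. This assumes the runbook's `cutover_time` is an ISO 8601 UTC timestamp and that the server has already passed testing, since MGN rejects a cutover for servers that are not in an eligible lifecycle state. The `launch_if_due` helper and its return values are hypothetical.

```python
from datetime import datetime, timezone


def launch_if_due(mgn_client, source_server_id, runbook_entry, now=None):
    """Start the MGN cutover once the scheduled time has passed.

    mgn_client is expected to behave like boto3.client("mgn").
    """
    now = now or datetime.now(timezone.utc)
    cutover_at = datetime.fromisoformat(
        runbook_entry["cutover_time"].replace("Z", "+00:00")
    )
    if now < cutover_at:
        # Too early: the caller (for example a Wait state) should retry later.
        return "WAITING"

    mgn_client.start_cutover(sourceServerIDs=[source_server_id])
    return "CUTOVER_STARTED"
```

Passing `now` explicitly keeps the scheduling logic testable. In a state machine the same effect can also be achieved with a Wait state that sleeps until the timestamp.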
  

Post-Launch Automation (The “Day 1” Scripts)

A server isn’t “migrated” just because it boots. It must be integrated into the cloud ecosystem. Using AWS Systems Manager (SSM), we automate the following “Day 1” tasks immediately post-cutover:

Agent Cleanup: Uninstall the MGN replication agent and legacy VMware tools
Observability: Install CloudWatch Agent and Fluent Bit
Security Hardening: Join the domain controller and apply Group Policies
License Switching: For SQL Server, automate the switch from BYOL (Bring Your Own License) to AWS License Included (LI), if required, to optimize costs
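A minimal sketch of how two of these tasks might be dispatched through SSM Run Command is shown below. It assumes a Linux target that is already registered with SSM. `AWS-ConfigureAWSPackage` and `AWS-RunShellScript` are AWS-managed documents, while the cleanup script path is purely hypothetical and would be replaced by whatever the factory ships to the instance.

```python
def run_day1_tasks(ssm_client, instance_id):
    """Send the Day 1 commands and return their SSM command IDs.

    ssm_client is expected to behave like boto3.client("ssm").
    """
    command_ids = []

    # Observability: install the CloudWatch agent via the managed package.
    response = ssm_client.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-ConfigureAWSPackage",
        Parameters={"action": ["Install"], "name": ["AmazonCloudWatchAgent"]},
        Comment="Day 1: install CloudWatch agent",
    )
    command_ids.append(response["Command"]["CommandId"])

    # Agent cleanup: run a site-specific script (hypothetical path).
    response = ssm_client.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": ["/opt/migration/day1-cleanup.sh"]},
        Comment="Day 1: remove replication agent and legacy tools",
    )
    command_ids.append(response["Command"]["CommandId"])

    return command_ids
```

The returned command IDs can be polled later to confirm that each task succeeded before the server is marked as migrated.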

Security and Governance Patterns

In a factory model, security must be baked in, not bolted on.

Isolation: The factory operates in a dedicated “Migration VPC” with private subnets. Replication traffic flows over Site-to-Site VPN or Direct Connect — never the public internet.
Encryption: All data in transit is encrypted via TLS 1.2. Data at rest (EBS volumes) is encrypted using AWS KMS keys managed by the factory.
RBAC: The automation pipeline uses IAM roles with least-privilege access. Developers trigger migrations by uploading a file to S3, never by logging into the console.
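The upload-to-trigger pattern can be sketched as a small S3 event handler that starts one Step Functions execution per uploaded runbook. The handler name, the input field names, and the way the state machine ARN is supplied are assumptions. In a real Lambda the client would come from `boto3.client("stepfunctions")` and the ARN from configuration.

```python
import json
from urllib.parse import unquote_plus


def handle_runbook_upload(event, sfn_client, state_machine_arn):
    """Start a migration execution for each runbook object in an S3 event."""
    execution_arns = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps({"runbook_bucket": bucket, "runbook_key": key}),
        )
        execution_arns.append(response["executionArn"])
    return execution_arns
```

With this shape, the only permission a developer needs is write access to the runbook bucket, while the handler's role carries the permission to start executions.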

Results: The Efficiency Gains

Implementing a factory model yields measurable improvements over manual migration:

Speed: Provisioning time reduced by 50% compared to manual lift-and-shift
Reliability: Migration success rates typically exceed 99% due to the elimination of manual configuration errors
Cost: “Wait time” is eliminated. Servers are spun down immediately after testing, and cutovers are executed precisely on schedule, minimizing parallel run costs

Conclusion

Building a Migration Factory requires upfront investment in code and architecture, but for fleets larger than 50 servers, the ROI is immediate. By orchestrating AWS MGN with Step Functions and Terraform, you transform a chaotic data center exit into a predictable, boring, and successful engineering event.

The goal is simple: one click to start, zero touches to finish.
