Blue/Green Deployments on AWS Using Terraform
A complete guide to blue/green deployment on AWS using Terraform, covering step-by-step implementation and strategies for smooth releases.
As the way applications are built and shipped evolves rapidly, adopting the right deployment strategy is pivotal for ensuring robust Continuous Deployment (CD) and maintaining high software quality standards. Deployment strategies play a crucial role in DevOps practices, offering varied approaches to software release and infrastructure management.
In this blog, we will explore several key deployment strategies, emphasizing their relevance in Continuous Integration and Continuous Deployment pipelines, before focusing on the blue-green deployment method, particularly its implementation on AWS using Terraform, a leading Infrastructure as Code (IaC) tool.
- Rolling deployment: This technique, integral to continuous deployment, involves incrementally updating servers with the new version. It’s highly compatible with Agile methodologies, ensuring minimal downtime and facilitating a stable continuous delivery process.
- Canary deployment: A strategic fit for continuous deployment, canary deployment targets a small segment of the production environment first. Its gradual approach aligns well with Agile and DevOps principles, allowing for real-time monitoring and quick rollback if needed.
- A/B testing deployment: This strategy is crucial for user-centric continuous deployment, providing direct feedback on user engagement and experience. It’s a data-driven approach, often used in conjunction with continuous testing practices.
- Recreate deployment: Simple yet effective, this strategy involves downtime but is sometimes used in continuous deployment when zero downtime isn’t a critical factor. It’s straightforward and suitable for applications with flexible availability requirements.
- Shadow deployment: Often used in continuous deployment and continuous testing, this strategy involves duplicating real traffic to a shadow version. It’s excellent for performance testing under real conditions without impacting the end-user experience.
Focusing on blue-green deployment, this strategy is used for continuous deployment with zero downtime. It involves maintaining two identical environments: the Blue (current production) and Green (new version). At any given time, only one of these environments is live, serving all production traffic. When it’s time to release a new version of the software, the update is first deployed to the inactive environment (e.g., green). The switch from blue to green ensures minimal downtime and provides a quick rollback mechanism in case of issues, aligning seamlessly with continuous deployment and continuous integration (CI) practices.
Integrating Terraform, a prominent infrastructure-as-code tool, into blue-green deployment on AWS enhances the strategy. Terraform automates the creation and management of both environments, ensuring consistency and alignment with DevOps, continuous integration, and continuous deployment principles. This integration is particularly beneficial in AWS cloud environments, where managing complex infrastructures requires both precision and flexibility.
When To Use Blue-Green Deployment
There are several benefits to using blue-green deployment:
- Zero downtime: By routing traffic to the new environment before taking the old one out of service, you can ensure that there is no disruption to the end users.
- Easy rollback: If there are any issues with the new version of the software, you can quickly roll back by routing traffic back to the old environment.
- Improved reliability: By testing the new version of the software in a separate environment before releasing it to production, you can catch and fix any issues before they affect the end users.
- Confidence in release: Blue-green deployment allows you to release software updates with confidence, knowing that you have a fallback plan in case anything goes wrong.
Integrating Terraform With EC2 Autoscaling for Blue-Green Deployments
While blue-green deployments offer significant advantages, integrating this strategy with tools like Terraform and EC2 Autoscaling groups presents its own set of challenges. In this section, I’ll delve into these challenges and outline the effective solutions I’ve developed.
The Problem With Terraform and EC2 Autoscaling Groups
When implementing blue-green deployment on AWS with Terraform, a key challenge emerges from the way Terraform interacts with EC2 Auto Scaling groups. This challenge matters to DevOps engineers and cloud architects who rely on Terraform for infrastructure as code (IaC) and on AWS CodeDeploy for seamless deployments. Addressing it is essential for optimizing Continuous Integration/Continuous Deployment (CI/CD) pipelines and ensuring efficient cloud resource management.
The core of the problem lies in how Terraform interacts with AWS Auto Scaling groups during a blue-green deployment orchestrated by AWS CodeDeploy. AWS CodeDeploy, a critical service in AWS for automating software deployments, plays a vital role in this setup. According to the AWS CodeDeploy documentation, during a blue-green deployment, a new Auto Scaling group is created to transition to the new version of the application.
However, when Terraform is used to create and manage these Auto Scaling groups, it does not automatically recognize or incorporate the new Auto Scaling group created by CodeDeploy into its state management. This discrepancy leads to Terraform attempting to recreate the Auto Scaling group with its original configuration during subsequent terraform apply operations. As a result, cloud engineers face errors and inconsistencies, which can disrupt the deployment process and lead to potential downtime or resource mismanagement.
To delve deeper into this topic, it’s essential to understand the intricacies of Terraform’s state management and how it interacts with AWS services. Terraform’s state file is crucial for tracking the current state of the infrastructure it manages. When external changes are made to the infrastructure that Terraform manages (in this case, by AWS CodeDeploy), Terraform’s state file does not automatically update to reflect these changes. This leads to a state mismatch, causing Terraform to try to enforce the configuration as defined in its code, which doesn’t account for the new Auto Scaling group.
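One way to notice this mismatch before it causes trouble is to ask Terraform whether the real infrastructure still matches the configuration. The following is a minimal sketch, assuming the terraform CLI is on the PATH and the working directory is already initialized. It relies on the -detailed-exitcode flag of terraform plan (0 means no changes, 1 means an error, 2 means changes are pending). The function name check_drift is hypothetical and not part of the original setup.

```shell
#!/usr/bin/env bash
# Minimal sketch: report whether Terraform sees pending changes (drift).
# check_drift is a hypothetical helper name, not part of Terraform.
check_drift() {
  local rc=0
  # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present
  terraform plan -detailed-exitcode -input=false >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "in sync" ;;
    2) echo "drift detected" ;;
    *) echo "plan failed"; return 1 ;;
  esac
}
```

Calling check_drift right after a CodeDeploy blue-green deployment would be expected to report pending changes, because the Auto Scaling group recorded in the state is no longer the one serving traffic.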
Solution To Seamless Blue-Green Deployment for EC2 Autoscaling Groups With Terraform and AWS CodeDeploy
To navigate this challenge, we’ve developed an approach that ensures Terraform, AWS CodeDeploy, and EC2 Autoscaling groups work in harmony. This section provides a detailed step-by-step implementation of the solution.
1. Modify the official terraform module (here Terraform-aws-module) to accommodate the solution requirements
Add support for an additional variable to ignore resource tag-related changes.
# Add new variable to variables.tf file
variable "ignore_tags" {
  description = "Determines whether the `tags` value is ignored after initial apply. See README note for more details"
  type        = bool
  default     = true
}
Whenever AWS CodeDeploy performs a blue/green deployment, it creates a new Auto Scaling group with a new name and additional tags, such as the deployment ID. These details are not present in the Terraform state because the change was triggered by the AWS CodeDeploy service and not by Terraform. To avoid Terraform state drift after each deployment, I created the EC2 Auto Scaling group with a unique id tag.
Add a lifecycle block to the resource "aws_autoscaling_group" in the module (Terraform-aws-autoscaling) so that changes to the tag property are ignored.
  # inside the resource "aws_autoscaling_group" block
  lifecycle {
    create_before_destroy = true
    ignore_changes        = [tag]
  }
}
2. Sample Terraform code to deploy an EC2 Autoscaling group
data "aws_autoscaling_groups" "app" {
  filter {
    name   = "tag:id"
    values = ["app-asg"]
  }
}

module "asg-app" {
  source = "../modules/asg/"

  # Reuse the name of the existing Auto Scaling group that carries the id tag, if there is one
  name                = length(data.aws_autoscaling_groups.app.names) > 0 ? data.aws_autoscaling_groups.app.names[0] : "app-asg"
  use_name_prefix     = false
  desired_capacity    = 2
  min_size            = 2
  max_size            = 4
  health_check_type   = "EC2"
  vpc_zone_identifier = ["pvt-subnet-1-id", "pvt-subnet-2-id"] ## replace with the VPC private subnet IDs
  target_group_arns   = ["alb_target_group_arn"] ## replace with ARN of the ALB Target Group

  # Launch template
  launch_template_name        = "app-launch-template"
  launch_template_description = "Launch Template for application"
  image_id                    = "ami-id" ## replace with the AMI ID of your application
  instance_type               = "t3.large"
  ebs_optimized               = true
  enable_monitoring           = false
  security_groups             = ["sg-xxxxxxx"] ## Add the security group IDs to attach to this ASG instances
  key_name                    = "ssh-key-pair" ## Keypair used to launch instances in ASG
  iam_instance_profile_name   = "ec2-role-for-s3-ssm-secret-manager"
  user_data                   = base64encode("#!/bin/bash\necho \"Hello\"")

  block_device_mappings = [
    {
      device_name = "/dev/sda1"
      no_device   = 0
      ebs = {
        delete_on_termination = true
        encrypted             = true
        volume_size           = 30
        volume_type           = "gp3"
      }
    }
  ]

  scaling_policies = {
    dynamic_TTS_policy = {
      policy_type = "TargetTrackingScaling"
      target_tracking_configuration = {
        predefined_metric_specification = {
          predefined_metric_type = "ASGAverageCPUUtilization"
        }
        target_value = 70.0
      }
    }
  }

  tags = {
    Name      = "app-asg"
    terraform = "true"
    id        = "app-asg"
  }
}
To prevent the AWS Auto Scaling group module from creating Auto Scaling groups with random names, I set use_name_prefix to false. Then, using a Terraform data source, we fetch the name of the new Auto Scaling group by its tags and pass that name to the module on subsequent runs.
This code snippet assumes that the VPC network and the AWS Application Load Balancers already exist.
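To preview which name the data source lookup is likely to resolve, the same tag filter can be run from the command line. This is a minimal sketch, assuming the AWS CLI is installed and configured and that its describe-auto-scaling-groups command accepts a tag filter of the form tag:id. The helper name resolve_asg_name is hypothetical. It mirrors the fallback in the module call: use the tagged group's name if one exists, otherwise fall back to the default name.

```shell
#!/usr/bin/env bash
# Minimal sketch: find the Auto Scaling group that carries a given id tag.
# resolve_asg_name is a hypothetical helper, not part of the AWS CLI.
# Usage: resolve_asg_name <id-tag-value> <fallback-name>
resolve_asg_name() {
  local tag_value="$1" fallback="$2" name
  name="$(aws autoscaling describe-auto-scaling-groups \
    --filters "Name=tag:id,Values=${tag_value}" \
    --query 'AutoScalingGroups[0].AutoScalingGroupName' \
    --output text 2>/dev/null)" || name=""
  # With text output the CLI prints "None" when the query matches nothing
  if [ -z "$name" ] || [ "$name" = "None" ]; then
    echo "$fallback"
  else
    echo "$name"
  fi
}
```

For example, resolve_asg_name app-asg app-asg prints the name of the group currently tagged id=app-asg, or app-asg itself when no such group exists yet (as on the very first apply).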
3. Terraform code to create the AWS CodeDeploy resources and their configuration
## Codedeploy main.tf
resource "aws_iam_role" "codedeploy_service_role" {
  name = "codedeploy_service_role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = "sts:AssumeRole",
        Effect = "Allow",
        Principal = {
          Service = "codedeploy.amazonaws.com"
        },
      },
    ],
  })
}

resource "aws_iam_policy" "codedeploy_access_policy" {
  name = "codedeploy_access_policy"
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = [
          "autoscaling:CompleteLifecycleAction",
          "autoscaling:DeleteLifecycleHook",
          "autoscaling:DescribeAutoScalingGroups",
          "autoscaling:DescribeLifecycleHooks",
          "autoscaling:PutLifecycleHook",
          "autoscaling:RecordLifecycleActionHeartbeat",
          "ec2:CreateTags",
          "ec2:DeleteTags",
          "ec2:DescribeInstances",
          "ec2:DescribeTags",
          "ec2:DetachInstances",
          "ec2:AttachInstances",
          "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
          "elasticloadbalancing:DescribeInstanceHealth",
          "elasticloadbalancing:DescribeLoadBalancers",
          "elasticloadbalancing:RegisterInstancesWithLoadBalancer"
        ],
        Effect   = "Allow",
        Resource = "*"
      },
    ],
  })
}

resource "aws_iam_role_policy_attachment" "codedeploy_access_policy_attachment" {
  role       = aws_iam_role.codedeploy_service_role.name
  policy_arn = aws_iam_policy.codedeploy_access_policy.arn
}

resource "aws_codedeploy_app" "my_app" {
  compute_platform = "Server"
  name             = "my_app"
}

resource "aws_codedeploy_deployment_group" "blue" {
  app_name              = aws_codedeploy_app.my_app.name
  deployment_group_name = "blue"
  service_role_arn      = aws_iam_role.codedeploy_service_role.arn
}

resource "aws_codedeploy_deployment_group" "green" {
  app_name              = aws_codedeploy_app.my_app.name
  deployment_group_name = "green"
  service_role_arn      = aws_iam_role.codedeploy_service_role.arn
}
This Terraform code snippet creates an IAM role and policy that grant AWS CodeDeploy the permissions it needs to perform deployments across EC2 instances and Auto Scaling groups. The role is assumed by the CodeDeploy service. The snippet also creates the CodeDeploy application and sets up two deployment groups, one each for the blue and green environments.
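Once these resources exist, a deployment is typically started from the CI/CD pipeline. The following is a minimal sketch, assuming the AWS CLI is configured and that the application revision has already been uploaded to S3 as a zip bundle. The application and deployment group names come from the snippet above, while the bucket name, object key, and the helper name start_deployment are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Minimal sketch: start a CodeDeploy deployment and print its deployment ID.
# start_deployment is a hypothetical helper; bucket and key are placeholders.
# Usage: start_deployment <application> <deployment-group> <s3-bucket> <s3-key>
start_deployment() {
  local app="$1" group="$2" bucket="$3" key="$4" id
  id="$(aws deploy create-deployment \
    --application-name "$app" \
    --deployment-group-name "$group" \
    --s3-location "bucket=${bucket},key=${key},bundleType=zip" \
    --query 'deploymentId' --output text)" || return 1
  echo "$id"
}
```

A pipeline step could call start_deployment my_app blue my-artifact-bucket app.zip and then pass the returned ID to aws deploy wait deployment-successful to block until the deployment finishes.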
4. Solution to the Terraform state deviation problem: I also created a script that must be run before any Terraform operation. It imports the new Auto Scaling group created by the AWS CodeDeploy blue-green deployment into the state, replacing the details of the older Auto Scaling group. As a result, terraform plan and terraform apply will not create a new Auto Scaling group after CI/CD deployments.
terraform_before_apply.sh
#!/bin/bash
terraform refresh
# Address of the Auto Scaling group in the state file (adjust to match your module and resource names)
asg_location='module.asg.aws_autoscaling_group.id[0]'
# Check whether the ASG name in the state matches the name in the outputs;
# if it does not, import the new ASG in place of the old one
terraform state show "$asg_location" | grep "$(terraform output -raw asg_name)" > /dev/null 2>&1
if [ $? != 0 ]
then
  terraform state rm "$asg_location"
  terraform import "$asg_location" "$(terraform output -raw asg_name)"
  terraform refresh
  terraform state rm 'module.asg.aws_autoscaling_policy.this["dynamic_TTS_policy"]'
  terraform import 'module.asg.aws_autoscaling_policy.this["dynamic_TTS_policy"]' "$(terraform output -raw asg_name)/dynamic_TTS_policy"
  echo "updated asg"
fi
Let us go through all the commands in this script:
a. The terraform refresh command refreshes the Terraform state to pick up any changes.
b. The following command checks whether the Auto Scaling group name in the outputs matches the name in the state:
terraform state show $asg_location | grep $(terraform output -raw asg_name) > /dev/null 2>&1
c. If the above command returns an exit status other than “0”, the Auto Scaling group’s name has changed as part of a CI/CD run.
d. The next commands then remove the existing Auto Scaling group from the Terraform state and import the new one. Along with the Auto Scaling group, we also need to import the scaling policy associated with the new Auto Scaling group created by AWS CodeDeploy.
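The same steps can be wrapped in a function that stops at the first failing command instead of continuing with a half-updated state. This is a sketch under assumptions, not the script used above: the function name sync_asg_state and its parameters are hypothetical, it expects an output named asg_name as in the script, and it omits the second terraform refresh for brevity.

```shell
#!/usr/bin/env bash
# Minimal sketch: re-point the Terraform state at the ASG created by CodeDeploy.
# sync_asg_state is a hypothetical helper name.
# Usage: sync_asg_state <asg-state-address> <policy-state-address> <policy-name>
sync_asg_state() {
  local asg_address="$1" policy_address="$2" policy_name="$3" asg_name
  terraform refresh >/dev/null || return 1
  asg_name="$(terraform output -raw asg_name)" || return 1
  [ -n "$asg_name" ] || { echo "asg_name output is empty" >&2; return 1; }
  # -F matches the name literally rather than as a regular expression
  if terraform state show "$asg_address" 2>/dev/null | grep -qF -- "$asg_name"; then
    echo "state already tracks $asg_name"
    return 0
  fi
  # Swap the old ASG and its scaling policy for the new ones
  terraform state rm "$asg_address" >/dev/null || return 1
  terraform import "$asg_address" "$asg_name" >/dev/null || return 1
  terraform state rm "$policy_address" >/dev/null || return 1
  terraform import "$policy_address" "${asg_name}/${policy_name}" >/dev/null || return 1
  echo "updated asg to $asg_name"
}
```

With the addresses from the script, the call would be sync_asg_state 'module.asg.aws_autoscaling_group.id[0]' 'module.asg.aws_autoscaling_policy.this["dynamic_TTS_policy"]' dynamic_TTS_policy. Passing the addresses as arguments keeps the single quotes around them, so the shell does not try to expand the square brackets.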
Conclusion
In this blog, I navigated through the challenges of setting up blue-green deployments using AWS, Terraform, and AWS CodeDeploy. Blue-green deployment is more than just a deployment strategy; it’s a pathway to ensuring zero downtime, enhancing the reliability of your applications, and providing a safety net through easy rollbacks.
Published at DZone with permission of Nitin Yadav. See the original article here.