AWS/Terraform Workshop (Part 2): EC2 Networking, Autoscaling Groups, and CloudWatch
In this second workshop, take a close look at AWS Regions and Availability Zones, CloudWatch metrics, and Auto Scaling, with some practical advice along the way.
If you're just joining us for this series, we're seeing how AWS and Terraform can work together to enhance your environment and infrastructure. Before we dive in, you should make sure you take a look at the following:
- Go through Workshop #1
- Amazon AWS EC2 Networking
- AWS Auto Scaling Groups
- AWS CloudWatch
AWS EC2 Networking Concepts
Amazon EC2 is hosted in multiple locations worldwide, composed of Regions and Availability Zones (AZs). Each region is a separate geographic area that has multiple, isolated locations known as Availability Zones. Amazon EC2 provides you with the ability to place resources, such as EC2 instances and data, in multiple locations. Resources aren’t replicated across regions unless you do so specifically.
By placing your resources in multiple AZs, you increase the probability that failures on Amazon's side won’t cause a complete outage of your service, just a temporary drop in its capacity and most probably a slowdown in its performance. Amazon data centers provide high-speed connections between AZs. Still, you should keep in mind that inter-AZ communication usually adds about 1 ms of latency compared to communication within the same AZ.
Amazon Virtual Private Cloud (VPC) enables you to launch AWS resources in a virtual network that, by default, is isolated from other networks. VPC in AWS can be configured with IP address range, routing tables, subnets, network gateways and security settings like ACLs. In Smartling’s SOA tech stack, the VPC comes preconfigured, so developers shouldn’t worry about networking configurations like IP addressing scheme, routing, etc. At the same time, the possibility to make changes in the AWS networking configuration is open to service owners.
A subnet is a range of IP addresses in your VPC. You can launch AWS resources into a subnet that you select. Each subnet resides within one AZ and cannot span zones. There are public and private subnets. Public subnets make it possible to make AWS resources like EC2 instances reachable from the Internet. Private subnets are used for those resources which should not be exposed to the Internet. Most of our SOA components reside in private subnets for security reasons.
An EC2 security group acts as a virtual firewall that controls the traffic for one or more EC2 instances. When you launch an instance, you associate one or more security groups with the instance. You can add rules to each security group that allow traffic to or from its associated instances. You can modify the rules for a security group at any time — the new rules are automatically applied to all instances that are associated with the security group.
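As a rough illustration of the firewall rules described above, a security group might be declared in Terraform along these lines. This is a minimal sketch in Terraform 0.12+ syntax, not the group defined in the workshop's ec2.tf: the resource name `example_ssh`, the `10.0.0.0/16` source range, and the `vpc_id` input variable (assumed to be declared elsewhere) are all hypothetical.

```hcl
# Hypothetical security group: allows SSH from inside the VPC only.
resource "aws_security_group" "example_ssh" {
  name        = "example-ssh"
  description = "Allow inbound SSH from inside the VPC"
  vpc_id      = var.vpc_id # assumes a vpc_id variable is declared elsewhere

  # Inbound rule: TCP port 22 from an assumed internal address range.
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"]
  }

  # Outbound rule: allow all traffic ("-1" means every protocol).
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```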
- Regions and Availability Zones
- VPC Introduction
- VPCs and Subnets
- Amazon EC2 Security Groups for Linux Instances
Amazon CloudWatch monitors your AWS resources and the applications you run on them. You can use CloudWatch to collect and track metrics, which are the variables you want to measure for your resources and applications, such as CPU load or network I/O. CloudWatch alarms send notifications or automatically make changes to the resources you are monitoring based on rules that you define; for example, an alarm can trigger an Auto Scaling group to scale.
A metric is the fundamental concept in CloudWatch. It represents a time-ordered set of data points that are published to CloudWatch. These data points can be either your custom metrics or metrics from services in AWS. You can retrieve statistics about those data points as an ordered set of time-series data. Think of a metric as a variable to monitor, and the data points represent the values of that variable over time. For example, the CPU usage of a particular Amazon EC2 instance is one metric, and the latency of an Elastic Load Balancing load balancer is another.
CloudWatch namespaces are containers for metrics. Metrics in different namespaces are isolated from each other so that metrics from different applications are not mistakenly aggregated into the same statistics — for example, namespaces AWS/EC2 for EC2 instances metrics and AWS/ELB for Elastic Load Balancers metrics.
A dimension is a name/value pair that helps you to uniquely identify a metric. Every metric has specific characteristics that describe it, and you can think of dimensions as categories for those characteristics. Dimensions help you design a structure for your statistics plan. Because dimensions are part of the unique identifier for a metric, whenever you add a unique name/value pair to one of your metrics, you are creating a new metric. Examples:
- AutoScalingGroupName (this dimension filters the data you request for all instances in a specified capacity group e.g. total CPU load within ASG).
- InstanceId (this dimension filters the data you request for the identified instance only).
A CloudWatch alarm watches a single metric over a time period you specify and performs one or more actions based on the value of the metric relative to a given threshold over a number of time periods. CloudWatch alarms will not invoke actions simply because they are in a particular state — the state must have changed and been maintained for a specified number of periods. After an alarm invokes an action due to a change in state, its subsequent behavior depends on the type of action that you have associated with the alarm. For Auto Scaling policy notifications, the alarm continues to invoke the action for every period that the alarm remains in the new state.
An alarm has three possible states:
- OK: The metric is within the defined threshold.
- ALARM: The metric is outside of the defined threshold.
- INSUFFICIENT_DATA: The alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.
AWS Auto Scaling Groups
Auto Scaling helps you ensure that you have the correct number of EC2 instances available to handle the load for your application. You can create collections of EC2 instances called Auto Scaling Groups (ASG). In each ASG, you should specify the minimum number of instances to ensure that your ASG never goes below this size. The same goes for the maximum number of instances. If you specify a desired capacity, either when you create the group or at any time thereafter, ASG ensures that your group has this many instances.
An ASG launch configuration is a template that an ASG uses to launch EC2 instances. When you create a launch configuration, you specify information for the instances, such as the ID of the Amazon Machine Image (AMI), the instance type, and the security groups. Changing the launch configuration doesn't make the ASG recreate existing instances from the new template, so you have to rotate instances yourself.
Auto Scaling provides several ways for you to scale your ASG:
- Maintain current instance levels at all times (to maintain the current instance levels, Auto Scaling performs a periodic health check on running instances within ASG).
- Manual scaling. You only need to specify the change in the maximum, minimum, or desired capacity of your Auto Scaling group.
- Scale based on a schedule. Scaling by schedule means that scaling actions are performed automatically as a function of time and date.
- Scale based on demand. A more advanced way to scale your resources, scaling by policy, lets you define parameters that control the Auto Scaling process. For example, you can create a policy that calls for enlarging your fleet of EC2 instances whenever the average CPU utilization rate stays above ninety percent for fifteen minutes.
Demand-based scaling works in conjunction with AWS CloudWatch, which collects the metrics and triggers the scaling process.
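Of the approaches listed above, scaling on a schedule is probably the simplest to express in Terraform. The sketch below assumes an Auto Scaling group declared elsewhere under the hypothetical name `aws_autoscaling_group.w2_asg`; the action name, sizes, and cron expression are made up for illustration.

```hcl
# Hypothetical scheduled action: raise capacity on weekday mornings.
resource "aws_autoscaling_schedule" "weekday_mornings" {
  scheduled_action_name  = "scale-up-weekday-mornings"
  autoscaling_group_name = aws_autoscaling_group.w2_asg.name # assumed to exist

  min_size         = 1
  max_size         = 3
  desired_capacity = 2

  # Cron format, evaluated in UTC by default: 08:00 Monday through Friday.
  recurrence = "0 8 * * 1-5"
}
```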
The Auto Scaling cooldown period is a configurable setting for your ASG that helps to ensure that it doesn’t launch or terminate additional instances before the previous scaling activity takes effect. After the ASG dynamically scales using a simple scaling policy, Auto Scaling waits for the cooldown period to complete before resuming scaling activities.
Auto Scaling enables you to put an instance that is in the InService state into the Standby state, update or troubleshoot the instance, and then return the instance to service. Instances that are on standby are still part of the Auto Scaling group, but they do not actively handle application traffic.
Auto Scaling enables you to suspend and then resume one or more of the Auto Scaling processes in your Auto Scaling group. This can be very useful when you want to investigate a configuration problem or other issue with your web application and then make changes to your application, without triggering the Auto Scaling process.
Go to the VPC section, then the Your VPCs section, and write down your VPC ID.
Go to the Subnets section, choose a private subnet and write down its ID as well as the Availability Zone that it's bound to.
Note: At Smartling, we're marking subnets with AWS tags so that it's easy to identify private and public subnets without digging into the routing tables they are associated with.
Specify the collected data in your Terraform configuration:
Go to the w2 directory in the cloned Smartling/aws-terraform-workshops git repository.
Edit file terraform.tfvars: specify VPC ID, Subnet ID and AZ, for example:
$ cat terraform.tfvars
vpc_id = "vpc-1234567"
subnet_id = "subnet-1234567"
availability_zone_id = "us-east-1c"
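Terraform only accepts values from terraform.tfvars for variables that are declared somewhere in the configuration. The workshop repository presumably ships these declarations already; the sketch below (Terraform 0.12+ syntax) just shows the shape they would take. The variable names come from the terraform.tfvars example, while the descriptions are assumptions.

```hcl
# Declarations matching the keys set in terraform.tfvars.
variable "vpc_id" {
  description = "ID of the VPC to launch resources into"
  type        = string
}

variable "subnet_id" {
  description = "ID of the private subnet used by the ASG"
  type        = string
}

variable "availability_zone_id" {
  description = "Availability Zone the subnet is bound to, e.g. us-east-1c"
  type        = string
}
```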
Follow Terraform's documentation for ASG and the comments in the autoscaling.tf file to complete ASG and launch configuration.
Add missing names for Terraform resources.
Configure the launch configuration to create one t2.micro instance in the security group that is created in the ec2.tf file.
Set min_size = 1 and max_size = 3 in ASG, then set the cooldown to 60 seconds.
Make sure user-data for EC2 instances in the ASG contains your public SSH key.
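Put together, the steps above might end up looking roughly like the following. Treat it as a sketch rather than the workshop's solution: the resource names (`w2_lc`, `w2_asg`, `w2_sg`), the `ami_id` and `ssh_public_key` variables, and the `ec2-user` home directory are assumptions, and the real names are dictated by the comments in autoscaling.tf and the security group in ec2.tf.

```hcl
# Launch configuration: the template the ASG uses to start instances.
resource "aws_launch_configuration" "w2_lc" {
  name_prefix     = "w2-lc-"
  image_id        = var.ami_id                    # hypothetical variable
  instance_type   = "t2.micro"
  security_groups = [aws_security_group.w2_sg.id] # assumed to come from ec2.tf

  # Detailed (1-minute) CloudWatch monitoring for instances in the ASG.
  enable_monitoring = true

  # Append a public SSH key on first boot; the login user depends on the AMI.
  user_data = <<-EOF
    #!/bin/bash
    echo "${var.ssh_public_key}" >> /home/ec2-user/.ssh/authorized_keys
  EOF

  # Create the replacement before destroying the old launch configuration,
  # because one that is still attached to an ASG cannot be deleted.
  lifecycle {
    create_before_destroy = true
  }
}

# Auto Scaling group: between 1 and 3 instances in the chosen private subnet.
resource "aws_autoscaling_group" "w2_asg" {
  name                 = "w2-asg"
  launch_configuration = aws_launch_configuration.w2_lc.name
  vpc_zone_identifier  = [var.subnet_id]

  min_size         = 1
  max_size         = 3
  default_cooldown = 60 # seconds
}
```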
Apply the Terraform configuration:
$ terraform plan
$ terraform apply
Note #1: Always run your Terraform plan before applying it. Examine what Terraform is going to change/create/delete in AWS.
Note #2: You need to configure Terraform with your AWS credentials here. There are multiple ways to do it, and you can find one in our first workshop.
Check the newly created ASG in the AWS EC2 Management Console. You should see your ASG, launch configuration, and EC2 instance created by the ASG.
Uncomment the code in the Terraform configuration files to create a CloudWatch (CW) alarm to trigger an ASG scaling up policy if total CPU load in the ASG is more than 40%.
Enable EC2 detailed monitoring for the EC2 instances in the ASG so that CloudWatch collects metrics every minute (Hint: See the docs for the launch configuration Terraform resource).
Configure the CW alarm to add one instance if CPU load in the ASG is more than 40%, cooldown = 60.
Reference the scaling policy from the CW alarm in the template by using the policy's ARN as the alarm action.
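For orientation, a scale-up policy and its alarm could be wired together roughly as below. In the Terraform AWS provider, the link is made by listing the scaling policy's ARN in the alarm's `alarm_actions`. The resource names, the ASG reference `aws_autoscaling_group.w2_asg`, the `Average` statistic, and the single evaluation period are assumptions; the commented-out code in the workshop files may differ.

```hcl
# Simple scaling policy: add one instance, then wait 60 seconds.
resource "aws_autoscaling_policy" "scale_up" {
  name                   = "w2-scale-up"
  autoscaling_group_name = aws_autoscaling_group.w2_asg.name # assumed to exist
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 1
  cooldown               = 60
}

# Alarm: fires when CPU utilization across the ASG exceeds 40%.
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "w2-cpu-high"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average" # assumption: average across the group
  comparison_operator = "GreaterThanThreshold"
  threshold           = 40
  period              = 60 # seconds; relies on detailed monitoring
  evaluation_periods  = 1  # assumption: react after a single period

  # Aggregate over every instance in the group.
  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.w2_asg.name
  }

  # Invoke the scaling policy when the alarm enters the ALARM state.
  alarm_actions = [aws_autoscaling_policy.scale_up.arn]
}
```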
Apply the Terraform configuration.
Check the CW alarm in AWS web console.
Enable scale-in protection for the EC2 instance.
Find your ASG in the AWS EC2 Management Console and go to the "Instances" tab.
Select an instance, click on "Actions->Instance protection->Set scale in protection".
Generate CPU load to trigger the CloudWatch alarm and ASG scaling-up process.
Log into your EC2 instance via SSH and run the following command:
$ dd if=/dev/urandom bs=1M count=200096 | gzip -9 |gzip -9 | gzip -9 >/dev/null
Review ASG events in the AWS web console. You should see +2 instances in 2 minutes. You can see it in the Activity History tab for ASG.
Add a scale-down policy that removes one EC2 instance when CPU load in the ASG is less than 35%.
Create a new CW alarm that will trigger the scale-down ASG policy.
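A scale-down pair could look something like this: a negative `scaling_adjustment` and a `LessThanThreshold` comparison. As before, this is a sketch under assumptions: the resource names and the ASG reference `aws_autoscaling_group.w2_asg` are hypothetical, and the cooldown and evaluation settings are guesses rather than values from the workshop.

```hcl
# Simple scaling policy: remove one instance.
resource "aws_autoscaling_policy" "scale_down" {
  name                   = "w2-scale-down"
  autoscaling_group_name = aws_autoscaling_group.w2_asg.name # assumed to exist
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = -1 # negative value shrinks the group
  cooldown               = 60 # assumption
}

# Alarm: fires when CPU utilization across the ASG drops below 35%.
resource "aws_cloudwatch_metric_alarm" "cpu_low" {
  alarm_name          = "w2-cpu-low"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average" # assumption
  comparison_operator = "LessThanThreshold"
  threshold           = 35
  period              = 60
  evaluation_periods  = 1 # assumption

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.w2_asg.name
  }

  alarm_actions = [aws_autoscaling_policy.scale_down.arn]
}
```

Keeping the scale-down threshold (35%) below the scale-up threshold (40%) leaves a small gap between the two alarms, which helps avoid the group flapping between sizes.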
Apply the Terraform configuration.
Watch the scaling activity for your ASG, and when you're done, let's destroy the resources.
Disable protection for the instance in the AWS web console.
Run the destroy command:
$ terraform destroy
Note: It will take slightly more time to terminate all resources in AWS than in the previous workshop.
Find our third workshop here.
Published at DZone with permission of Artem Nosulchik. See the original article here.