Autoscaling an Amazon Elastic Kubernetes Service cluster
This article takes a look at two methods of autoscaling in an Amazon EKS cluster, Horizontal Pod Autoscaler and Cluster Autoscaler.
In this article, we are going to consider the two most common methods of autoscaling in an EKS cluster:
- Horizontal Pod Autoscaler (HPA)
- Cluster Autoscaler (CA)
The Horizontal Pod Autoscaler, or HPA, is a Kubernetes component that automatically scales your service based on metrics such as CPU utilization, as reported by the Kubernetes metrics server. The HPA scales the pods in a deployment or replica set, and is implemented as a Kubernetes API resource and a controller. The controller manager queries resource utilization against the metrics specified in each HorizontalPodAutoscaler definition. It obtains the metrics from either the resource metrics API (for per-pod resource metrics) or the custom metrics API (for all other metrics).
To see this in action, we are going to configure HPA and then apply some load to our system to see it in action.
To start, let's install Helm as a package manager for Kubernetes:
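The original install command isn't shown. Since Tiller is used below, this is Helm v2; its official installer script can be fetched and run like so:

```shell
# Download and run the Helm v2 installer script
# (Tiller, configured in the next step, exists only in Helm v2)
curl -L https://git.io/get_helm.sh | bash
```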
Now, we are going to set up the server-side portion of Helm, called Tiller. This requires a service account:
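The original manifest isn't shown; a typical definition (saved as, say, `tiller-rbac.yaml` — the file name is our assumption) looks like this:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
```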
The above defines a Tiller service account to which we have assigned the cluster-admin role. Now let's go ahead and apply the configuration:
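Assuming the manifest was saved as `tiller-rbac.yaml` (a name we chose for illustration):

```shell
# Create the Tiller service account and cluster-admin binding
kubectl apply -f tiller-rbac.yaml
```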
Then run `helm init` using the Tiller service account we have just created:
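With the service account in place, the standard Helm v2 initialization is:

```shell
# Install Tiller into the cluster under the tiller service account
helm init --service-account tiller
```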
With this, we have installed Tiller onto the cluster, which gives Helm access to manage resources within it.
With Helm installed, we can now deploy the metrics server. The metrics server is a cluster-wide aggregator of resource usage data: metrics are collected by the kubelet on each worker node and are used to dictate the scaling behavior of deployments.
So let's go ahead and install that now:
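The install command isn't shown; with Helm v2 and the stable chart repository of the time, it would look something like:

```shell
# Install the metrics server into kube-system from the stable chart repo
helm install stable/metrics-server --name metrics-server --namespace kube-system
```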
Once all checks have passed, we are ready to scale the application.
For the purpose of this article, we will deploy a special build of Apache and PHP designed to generate CPU utilization:
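The deployment command isn't shown; this setup matches the standard example from the Kubernetes HPA walkthrough, which deploys the `hpa-example` image and exposes it as a service:

```shell
# Deploy the CPU-burning Apache/PHP image and expose it on port 80
kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --expose --port=80
```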
- `--requests=cpu=200m`: requests that 200 millicores be allocated to the pod
Now, let us autoscale our deployment:
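A command matching the 50% target and the 1-to-10 replica range used in this walkthrough would be:

```shell
# Create an HPA targeting 50% average CPU utilization, 1 to 10 replicas
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
```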
The above specifies that the HPA will increase or decrease the number of replicas to maintain an average CPU utilization across all pods of 50%. Since each pod requests 200 millicores (as specified in the previous command), this corresponds to an average CPU usage of 100 millicores per pod.
Let's check the status:
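The status can be inspected with:

```shell
kubectl get hpa
```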
Look at the Targets column: if it says `<unknown>/50%`, the current CPU consumption is 0%, as we are not currently sending any requests to the server. It will take a couple of minutes to show the correct value, so let's grab a cup of coffee and come back when we have some data here.
Rerun the last command and confirm that the Targets column now shows `0%/50%`. Now, let's generate some load in order to trigger scaling by running the following:
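The load-generator command isn't shown; the usual approach from the Kubernetes HPA walkthrough is a busybox pod with an interactive shell:

```shell
# Start an interactive busybox pod to act as the load generator
kubectl run -i --tty load-generator --image=busybox /bin/sh

# Inside the container, send an endless stream of requests to the service:
while true; do wget -q -O- http://php-apache; done
```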
Inside this container, we are going to send an infinite number of requests to our service. If we flip back over to the other terminal, we can watch the autoscaler in action:
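Watching is easiest with the `-w` flag, which streams HPA status changes to the terminal:

```shell
# Watch the HPA replica count and CPU target change in real time
kubectl get hpa -w
```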
We can watch the HPA scale the pods up from 1 to our configured maximum of 10, until the average CPU utilization drops below our target of 50%. This takes about 10 minutes, after which you can see that we now have 10 replicas. If we flip back to the other terminal to terminate the load test, and then return to the watch terminal, we can see the HPA reduce the replica count back to the minimum.
The Cluster Autoscaler is the standard Kubernetes component for scaling the nodes in a cluster. It automatically increases the size of an Auto Scaling group so that pending pods can continue to be placed successfully. It also tries to remove unused worker nodes from the Auto Scaling group (the ones with no pods running).
The following AWS CLI command will create an Auto scaling group with minimum of one and maximum count of ten:
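The exact command isn't shown; a sketch with placeholder names (the group name, launch configuration, and subnet IDs below are all assumptions to adapt to your environment) would be:

```shell
# Create an Auto Scaling group for the worker nodes, sized 1 to 10
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name eks-worker-asg \
  --launch-configuration-name eks-worker-launch-config \
  --min-size 1 \
  --max-size 10 \
  --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222"
```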
Now, we need to apply an inline IAM policy to our worker nodes:
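The policy itself isn't shown; the one recommended by the Cluster Autoscaler's AWS documentation grants the following Auto Scaling permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": "*"
    }
  ]
}
```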
This allows the EC2 worker nodes hosting the Cluster Autoscaler to manipulate the Auto Scaling group. Copy it and add it to your EC2 IAM role as an inline policy.
Next, download the following file:
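The file isn't named in the text; given the cluster-name edit described next, it is most likely the auto-discovery example manifest shipped with the Cluster Autoscaler project:

```shell
# Fetch the Cluster Autoscaler example manifest for AWS
curl -O https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
```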
And update the following line with your cluster name:
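Assuming the auto-discovery manifest above, the line in question is the auto-discovery flag in the cluster-autoscaler container's command; substitute your own cluster name for the placeholder:

```yaml
# Replace <YOUR CLUSTER NAME> with the name of your EKS cluster
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>
```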
Finally, we can deploy our Autoscaler:
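Assuming the file name from the download step:

```shell
# Deploy the Cluster Autoscaler into the cluster
kubectl apply -f cluster-autoscaler-autodiscover.yaml
```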
Of course, we should wait for the pods to finish creating. Once done, we can scale our cluster out. We will consider a simple nginx application with the following manifest:
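The manifest isn't shown; a minimal sketch (saved as, say, `nginx-deployment.yaml` — file name and CPU request are our assumptions; the request ensures that scaled-up pods actually exhaust node capacity and go Pending) would be:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          resources:
            requests:
              cpu: 200m
```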
Let's go ahead and deploy the application:
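Assuming the manifest was saved as `nginx-deployment.yaml`:

```shell
kubectl apply -f nginx-deployment.yaml
```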
And check the deployment:
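```shell
kubectl get deployment nginx
```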
Now, let's scale the deployment up to 10 replicas:
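```shell
# Scale out and watch where the new pods land
kubectl scale deployment nginx --replicas=10
kubectl get pods -o wide --watch
```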
We can see some of our pods in the Pending state, which is the trigger the Cluster Autoscaler uses to scale out our fleet of EC2 instances.
In this article, we considered both types of EKS cluster autoscaling. We learned how the Cluster Autoscaler initiates scale-in and scale-out operations each time it detects under-utilized instances or pending pods. The Horizontal Pod Autoscaler and the Cluster Autoscaler are essential Kubernetes features when it comes to scaling a microservice application. Hope you found this article useful, but there is more to come. Till then, happy scaling!
Published at DZone with permission of Sudip Sengupta. See the original article here.