
CPU-Based Pod Auto-Scaling in Kubernetes


This article explains how the different types of autoscaling in Kubernetes work and takes an in-depth look at horizontal pod autoscaling, with visualizations.


Auto-scaling allows us to optimally allocate resources to an application based on its current resource consumption.

Kubernetes offers three main types of autoscaling:

  1. Horizontal Pod Autoscaler (HPA): HPA controls the number of pods
  2. Vertical Pod Autoscaler (VPA): VPA controls the resources in individual pods
  3. Cluster Autoscaler (CA): CA controls the number of nodes in a cluster

Horizontal Pod Autoscaler (HPA)

In this article, we will focus on the HPA. By default, Kubernetes supports CPU-based and memory-based pod auto-scaling. You can, however, configure it to scale based on a custom metric or multiple metrics as well.

Horizontal pod scaling
The HPA continuously checks the metric against the threshold value you configure (note: the default HPA check interval is 15 seconds, and this can be altered as per your requirement using the --horizontal-pod-autoscaler-sync-period flag). If the current metric value is higher than the specified threshold, the HPA attempts to increase the number of pods. The HPA controller assumes a linear relationship between the metric and the number of pods, and it operates on the ratio between the desired metric value and the current metric value. The formula used to compute the desired number of replicas is as follows (refer to the K8s documentation for more details).

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
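
For example (illustrative numbers only): if four replicas are currently running at an average CPU utilization of 80% and the target utilization is 50%, the controller computes desiredReplicas = ceil[4 * (80 / 50)] = ceil[6.4] = 7, so three additional pods are created.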

In this article, we will focus on horizontal pod autoscaling, and we will scale based on CPU utilization (one of the most commonly used metrics). Note that higher CPU utilization generally leads to higher latency; therefore, keeping the CPU utilization low allows us to keep the latency of the application low as well. The following figure shows the variation of the CPU utilization of an (I/O-bound) microservice.

You may also enjoy: Microservices Architecture: Introduction to Auto Scaling

Variation of CPU utilization

Deploying the App

Let us now deploy a microservice in Kubernetes and study its performance behavior with auto-scaling enabled. We will deploy a Spring Boot microservice (see here for the GitHub repo) in K8s. The following is the Kubernetes YAML file for the deployment.

YAML

#deployment/app/app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: springboot-app
  labels:
    app: springboot-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: springboot-app
  template:
    metadata:
      labels:
        app: springboot-app
    spec:
      containers:
      - name: springboot-app
        image: anushiya/app:latest
        resources:
          limits:
            cpu: "100m"
          requests:
            cpu: "100m"
        ports:
        - containerPort: 9000
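
The deployment can then be created with kubectl. A minimal sketch, assuming the manifest is saved at the path shown in its comment (deployment/app/app.yaml):

Shell

#create the deployment
kubectl apply -f deployment/app/app.yaml

#verify that the pod is running
kubectl get pods -l app=springboot-app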


Configuring the Horizontal Pod Autoscaler

Let us now enable Horizontal Pod Autoscaling for the deployment created above. We configure the HPA to scale based on CPU utilization. The YAML file is shown below.

YAML

#springboot-app-hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: springboot-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: springboot-app
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50


The HPA is also an API resource in Kubernetes, with apiVersion, kind, metadata, and spec fields (refer to the K8s documentation for more details). The scaling condition is defined by the target averageUtilization of the cpu resource metric. Here we specify a value of 50, which means that if the average CPU utilization across the pods exceeds 50%, the scaling process starts. The value should be between 1 and 100.
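
To put the autoscaler into effect, apply the manifest and watch its status. A minimal sketch, assuming the file is saved as springboot-app-hpa.yaml (as in the comment above) and that a metrics source such as metrics-server is available in the cluster:

Shell

#create the HorizontalPodAutoscaler
kubectl apply -f springboot-app-hpa.yaml

#watch current vs. target CPU utilization and the replica count
kubectl get hpa springboot-app-hpa --watch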

Deploying JMeter

To test the performance of the application, we use JMeter as the load-testing client. To deploy JMeter, we created a Docker image. The following is the Dockerfile. The files used can be found in this repo.

Dockerfile

FROM anushiya/jmeter-plugins:v1

ADD bash /home/kubernetes-performance/bash
ADD jar /home/kubernetes-performance/jar
ADD jmx /home/kubernetes-performance/jmx
ADD python /home/kubernetes-performance/python

WORKDIR /home/kubernetes-performance/bash

RUN chmod +x start_performance_test.sh

RUN apt-get update && apt-get install python3.5 -y
RUN apt-get install python-pip -y
RUN pip install numpy requests schedule
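
The image then has to be built and pushed to a registry the cluster can pull from. A minimal sketch, assuming the image is tagged anushiya/perf-test:v1 to match the Job spec used later; replace the tag with your own repository if you build it yourself:

Shell

#build the JMeter test image from the Dockerfile above
docker build -t anushiya/perf-test:v1 .

#push the image to a registry accessible from the cluster
docker push anushiya/perf-test:v1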


Since we want to store the performance test results permanently, we use a host volume. To create one, SSH into any of the nodes and create a directory to mount.

Shell

#Get the list of nodes
kubectl get nodes

#Select a node and ssh into it
sudo gcloud beta compute --project "[name of the project]" ssh --zone "[zone]" "[name of the node]"

#example
sudo gcloud beta compute --project "performance-testing" ssh --zone "us-central1-a" "gke-performance-testing-default-pool-b6e4d476-78zn"

#Create a directory to mount
sudo mkdir -p /mnt/data/results


Create a persistent volume.

YAML

#pv-volume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data/results"


Create a persistent volume claim.

YAML

#deployment/volume/pv-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 6Gi




Apply the YAML files to create the persistent volume and persistent volume claim.

Shell

#create persistent volume
kubectl apply -f deployment/volume/pv-volume.yaml

#create persistent volume claim
kubectl apply -f deployment/volume/pv-claim.yaml
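
Before running the tests, it is worth verifying that the claim has been bound to the volume. A minimal check:

Shell

#verify that the claim is bound to the volume
kubectl get pv,pvc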


For more details about PersistentVolume and PersistentVolumeClaim, see this. Now that we have created volumes to store the test results, we'll move on to creating a Job to perform the tests. The test results can be found in the directory specified above.

YAML

#perf-test.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: perf-test
spec:
  template:
    spec:
      containers:
      - name: perf-test
        image: anushiya/perf-test:v1
        imagePullPolicy: Always
        command: ["bash", "start_performance_test.sh"]
        volumeMounts:
        - mountPath: "/home/kubernetes-performance/results"
          name: pv-storage
      restartPolicy: Never
      volumes:
      - name: pv-storage
        persistentVolumeClaim:
          claimName: pv-claim
  backoffLimit: 4
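
The test is started by creating the Job and can be followed through its logs. A minimal sketch, assuming the manifest is saved as perf-test.yaml (as in the comment above):

Shell

#run the load test
kubectl apply -f perf-test.yaml

#follow the test output
kubectl logs -f job/perf-test

#check whether the Job has completed
kubectl get jobs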


Analyzing the Behavior of CPU Utilization, Latency and Pod Count

Let us now take a look at how the CPU utilization, pod count, and latency vary with time. The following figures show the variation in CPU utilization, pod count, and latency when we run the performance test with a single concurrent user. We have used the Stackdriver Monitoring API to get the performance statistics (see this link for more details).

CPU utilization vs time

Response time

Number of pods vs time

Initially, the pod count is 1 (the minimum number of pods specified in the HorizontalPodAutoscaler configuration). As the load increases, the pod count increases up to 20 to maintain the CPU utilization below 50%. When we increase the number of concurrent users to 100, we notice the following behavior.


CPU utilization vs time



Response time


Pods vs time, concurrency 10

Further Reading

Vertical Scaling and Horizontal Scaling in AWS

Reasons to Scale Horizontally

