

CPU-Based Pod Auto-Scaling in Kubernetes

This article explains how the different types of Kubernetes autoscaling work and takes an in-depth look at horizontal pod autoscaling, with visualizations.

By Anushiya Thevapalan, Malith Jayasinghe, and Aathman Tharmasanthiran · Updated Jan. 21, 2020 · Tutorial



Auto-scaling allows us to optimally allocate resources to an application based on its current resource consumption.

Kubernetes offers three main types of autoscaling:

  1. Horizontal Pod Autoscaler (HPA): HPA controls the number of pods
  2. Vertical Pod Autoscaler (VPA): VPA controls the resources in individual pods
  3. Cluster Autoscaler (CA): CA controls the number of nodes in a cluster

Horizontal Pod Autoscaler (HPA)

In this article, we will focus on HPA. By default, Kubernetes supports CPU-based and memory-based pod auto-scaling. You can, however, configure it to scale based on a custom metric, or on multiple metrics, as well.

Figure: Horizontal pod scaling

HPA continuously checks the metric values you configure against their thresholds (note: the default HPA check interval is 15 seconds, and it can be altered as per your requirements using the --horizontal-pod-autoscaler-sync-period flag). If the current metric value is higher than the specified threshold, HPA attempts to increase the number of pods. The HPA controller assumes a linear relationship between the metric and the number of pods: it operates on the ratio between the desired metric value and the current metric value. The formula used to compute the desired replicas is as follows (refer to the K8s documentation for more details).

desiredReplicas = ceil[ currentReplicas × ( currentMetricValue / desiredMetricValue ) ]

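To make the formula concrete, here is a small worked example with made-up numbers: 2 current replicas averaging 80% CPU against a 50% target should scale to ceil(2 × 80 / 50) = ceil(3.2) = 4 replicas. The same arithmetic as a one-line shell sketch:

Shell

#Hypothetical inputs: 2 replicas at 80% average CPU, 50% target utilization.
#For positive integers, ceil(a/b) == (a + b - 1) / b in integer arithmetic.
current_replicas=2; current_cpu=80; target_cpu=50
echo $(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))  #prints 4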

As noted, we will scale based on CPU utilization, one of the most commonly used metrics. Higher CPU utilization generally leads to higher latency, so maintaining CPU utilization at lower levels allows us to keep the latency of the application low as well. The following figure shows the variation of the CPU utilization of an (I/O-bound) microservice.

You may also enjoy: Microservices Architecture: Introduction to Auto Scaling

Figure: Variation of CPU utilization



Deploying the App

Let us now deploy a microservice in Kubernetes and study its performance behavior with auto-scaling enabled. We will deploy a Spring Boot microservice (see the GitHub repo) in K8s. The following is the Kubernetes YAML file for the deployment.

YAML

#deployment/app/app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: springboot-app
  labels:
    app: springboot-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: springboot-app
  template:
    metadata:
      labels:
        app: springboot-app
    spec:
      containers:
      - name: springboot-app
        image: anushiya/app:latest
        resources:
          limits:
            cpu: "100m"
          requests:
            cpu: "100m"
        ports:
        - containerPort: 9000

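To roll the deployment out and confirm the pod comes up, a quick sketch (assuming the manifest is saved at the path shown in the comment above):

Shell

#Apply the deployment and verify the pod is running
kubectl apply -f deployment/app/app.yaml
kubectl get pods -l app=springboot-app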

Configuring the Horizontal Pod Autoscaler

Let us now enable Horizontal Pod Autoscaling for the deployment created above. We configure the HPA to scale based on CPU utilization. The YAML file is shown below.

YAML

#springboot-app-hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: springboot-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: springboot-app
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50


HPA is also an API resource in Kubernetes, with apiVersion, kind, metadata, and spec fields (refer to the K8s documentation for more details). The scaling condition is defined by the CPU resource target: averageUtilization. Here we specify a value of 50, which means that if the average CPU utilization across the pods exceeds 50%, the scaling process starts. The value should be between 1 and 100.
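The same HPA can also be created imperatively with kubectl; a quick sketch whose flag values mirror the manifest above:

Shell

#Create a CPU-based HPA for the deployment without writing YAML
kubectl autoscale deployment springboot-app --cpu-percent=50 --min=1 --max=20

#Inspect current vs. target utilization and the replica count
kubectl get hpa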

Deploying JMeter

To test the performance of the application, we use JMeter as the load-testing client. To deploy JMeter, we created a Docker image; the Dockerfile is shown below. The files used can be found in this repo.

Dockerfile

FROM anushiya/jmeter-plugins:v1

ADD bash /home/kubernetes-performance/bash
ADD jar /home/kubernetes-performance/jar
ADD jmx /home/kubernetes-performance/jmx
ADD python /home/kubernetes-performance/python

WORKDIR /home/kubernetes-performance/bash

RUN chmod +x start_performance_test.sh

RUN apt-get update && apt-get install python3.5 -y
RUN apt-get install python-pip -y
RUN pip install numpy requests schedule

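Assuming the resulting image is pushed under the tag the Job below references (anushiya/perf-test:v1 — substitute your own registry and tag), building and publishing it might look like this:

Shell

#Build the load-test image from the Dockerfile above and push it
docker build -t anushiya/perf-test:v1 .
docker push anushiya/perf-test:v1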

Since we want to store the performance test results permanently, we use a host volume. To create one, SSH into any of the nodes and create a directory to mount.

Shell

#Get the list of nodes
kubectl get nodes

#Select a node and ssh into it
sudo gcloud beta compute --project "[name of the project]" ssh --zone "[zone]" "[name of the node]"

#Example
sudo gcloud beta compute --project "performance-testing" ssh --zone "us-central1-a" "gke-performance-testing-default-pool-b6e4d476-78zn"

#Create a directory to mount (-p also creates missing parent directories)
sudo mkdir -p /mnt/data/results


Create a persistent volume.

YAML

#pv-volume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data/results"


Create a persistent volume claim.

YAML

#deployment/volume/pv-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 6Gi




Apply the YAML files to create the persistent volume and the persistent volume claim.

Shell

#create persistent volume
kubectl apply -f deployment/volume/pv-volume.yaml

#create persistent volume claim
kubectl apply -f deployment/volume/pv-claim.yaml

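Before launching any tests, it's worth confirming that the claim has actually bound to the volume; a quick check:

Shell

#Both should report STATUS "Bound"
kubectl get pv pv-volume
kubectl get pvc pv-claim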

For more details about PersistentVolume and PersistentVolumeClaim, see the Kubernetes documentation. Now that we have created volumes to store the test results, we'll move on to creating a Job to perform the tests. The test results can be found in the directory specified above.

YAML

#perf-test.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: perf-test
spec:
  template:
    spec:
      containers:
      - name: perf-test
        image: anushiya/perf-test:v1
        imagePullPolicy: Always
        command: ["bash", "start_performance_test.sh"]
        volumeMounts:
        - mountPath: "/home/kubernetes-performance/results"
          name: pv-storage
      restartPolicy: Never
      volumes:
      - name: pv-storage
        persistentVolumeClaim:
          claimName: pv-claim
  backoffLimit: 4

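With the volume in place, the Job can be launched and followed from the command line; a minimal sketch:

Shell

#Launch the load-test Job and watch it run to completion
kubectl apply -f perf-test.yaml
kubectl get jobs --watch

#Stream the JMeter output from the Job's pod
kubectl logs -f job/perf-test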

Analyzing the Behavior of CPU Utilization, Latency, and Pod Count

Let us now take a look at how the CPU utilization, pod count, and latency vary with time. The following figures show their variation when we test the performance with a single concurrent user. We used the Stackdriver Monitoring API to collect the performance statistics (see its documentation for more details).

Figure: CPU utilization vs. time

Figure: Response time

Figure: Number of pods vs. time

Initially, the pod count is 1 (the minimum number of pods specified in the HorizontalPodAutoscaler configuration). As the load increases, the pod count increases to 20 to maintain the CPU utilization below 50%. When we increase the number of concurrent users to 100, we notice the following behavior.


Figure: CPU utilization vs. time

Figure: Response time

Figure: Pods vs. time, concurrency 10
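
While a test is in flight, the same behavior can also be observed live from the command line; a quick sketch:

Shell

#Watch the autoscaler's measured vs. target CPU and the replica count
kubectl get hpa springboot-app-hpa --watch

#In a second terminal, watch pods being added and removed
kubectl get pods -l app=springboot-app --watch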

Further Reading

Vertical Scaling and Horizontal Scaling in AWS

Reasons to Scale Horizontally




Opinions expressed by DZone contributors are their own.
