Kubernetes Autoscaling with Custom Metrics (Updated)
In this post, we'll do a walkthrough of how Kubernetes autoscaling can be implemented for custom metrics generated by the application.
In the previous blog post about Kubernetes autoscaling, we looked at different concepts and terminologies related to autoscaling such as HPA, cluster auto-scaler, etc. In this post, we'll do a walkthrough of how Kubernetes autoscaling can be implemented for custom metrics generated by the application.
Why Custom Metrics?
CPU or RAM consumption is not always the right metric for scaling an application. For example, suppose you have a message queue consumer that can handle 500 messages per second without crashing. Once a single instance of this consumer is handling close to 500 messages per second, you may want to scale the application to two instances so that the load is distributed across them. Measuring CPU or RAM is a fundamentally flawed approach for scaling such an application; you need a metric that relates more closely to the application's nature. The number of messages an instance is processing at a given point in time is a much better indicator of the actual load on that application. Similarly, there may be applications where other metrics make more sense, and these can be defined using custom metrics in Kubernetes.
Metrics Server and API
Originally, metrics were exposed to users through Heapster, which queried them from each Kubelet. The Kubelet, in turn, talked to cAdvisor on localhost and retrieved the node-level and pod-level metrics. The metrics-server was introduced to replace Heapster and to expose the metrics through the Kubernetes API, so that they are available in the same manner as any other Kubernetes API. The metrics server aims to provide only the core metrics, such as memory and CPU of pods and nodes; for all other metrics, you need to build the full metrics pipeline. The mechanisms for building the pipeline and for Kubernetes autoscaling remain the same, as we will see in detail in the next few sections.
One of the key pieces that enables exposing metrics via the Kubernetes API layer is the aggregation layer. The aggregation layer allows installing additional Kubernetes-style APIs into the cluster. This makes the API available like any Kubernetes resource API, but the actual serving of the API can be done by an external service, which could be a pod deployed into the cluster itself. (You need to enable the aggregation layer at the cluster level, if not done already, as documented here.) So how does this work under the hood? The user provides the API provider (say, a pod running an API service) and then registers it using an APIService object.
Let's take the example of the core metrics pipeline and how the metrics server registers itself using the API aggregation layer. The APIService object looks like this:
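The original code listing is not reproduced here, but the APIService that metrics-server ships with looks roughly like the following (abridged; field values may differ slightly between metrics-server versions):

```yaml
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  # The in-cluster service that actually serves the API
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
```

The spec.service field is what ties the aggregated API group metrics.k8s.io to the metrics-server pod running in the cluster.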
After deploying the metrics server and registering the API using APIService, we can see that the metrics API is available within the Kubernetes API:
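One way to verify this is with kubectl (the output below is illustrative, not captured from a real cluster):

```shell
$ kubectl get apiservices | grep metrics
v1beta1.metrics.k8s.io    kube-system/metrics-server   True

# Query the aggregated API directly
$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
```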
Metrics Pipeline: Core and Full
Having understood the basic components, let's put them together to make the core metrics pipeline. In the case of the core pipeline, if you have the metrics server installed properly, it will also create the APIService to register itself with the Kubernetes API server. The metrics will be exposed at /apis/metrics.k8s.io, as we saw in the previous section, and will be used by the HPA.
Most non-trivial applications need more metrics than just memory and CPU, and that is why most organizations use a monitoring tool. Some of the most commonly used monitoring tools are Prometheus, Datadog, Sysdig, etc. The format these monitoring tools use may vary from tool to tool, so before we can expose the endpoint using Kubernetes API aggregation, we need to convert the metrics to a suitable format. That is where a small adapter, which may be part of the monitoring tool or available as a separate component, bridges the gap between the monitoring tool and the Kubernetes API. For example, Prometheus has the Prometheus adapter and Datadog has the Datadog Cluster Agent; they sit between the monitoring tool and the API and translate from one format to the other, as shown in the diagram below. These metrics are available at a slightly different endpoint so that they can be consumed appropriately.
Demo: Kubernetes Autoscaling
We will demonstrate autoscaling an application on custom metrics using Prometheus and the Prometheus adapter. You can read through the rest of the post, or head straight to the GitHub repo and start building the demo on your own.
Setting up Prometheus
In order to make the metrics available to the adapter, we will install Prometheus with the help of the Prometheus Operator. The operator creates CustomResourceDefinitions (CRDs) to deploy the components of Prometheus in the cluster; a CRD is a way to extend the Kubernetes API with new resource types. Using operators makes it easy to configure and maintain Prometheus instances 'the Kubernetes way' (by defining objects in YAML files). The CRDs created by the Prometheus Operator are Prometheus, Alertmanager, ServiceMonitor, and PrometheusRule.
You can follow the instructions mentioned here to set up Prometheus.
Deploying a Demo Application
To generate the metrics, we will deploy a simple application, mockmetrics, which exposes a total_hit_count value at /metrics. It is a web server written in Go. The value of the total_hit_count metric keeps increasing each time the URL is visited. It displays the metrics in the exposition format required by Prometheus.
Follow the instructions to create a deployment and service for this application, which will also create ServiceMonitor and HPA for the application.
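The HPA from that step references the custom metric rather than CPU. A sketch of what it might look like follows; the object names match those used elsewhere in this post, but the target value of 100 is an assumption for illustration:

```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: mockmetrics-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mockmetrics-deploy
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods            # per-pod custom metric, served by the adapter
    pods:
      metricName: total_hit_count
      targetAverageValue: 100
```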
A ServiceMonitor creates a configuration for Prometheus. It specifies the label of the service, the path, the port, and the interval at which the metrics should be scraped. Pods are selected with the help of the service's label, and Prometheus scrapes metrics from all the matching pods. Depending on your Prometheus configuration, the ServiceMonitor should be placed in the correct namespace; in our case, it is in the same namespace as mockmetrics.
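A ServiceMonitor for mockmetrics might look like the following sketch (the app label and port name are assumptions; they must match the mockmetrics Service):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mockmetrics-sm
  labels:
    app: mockmetrics-app
spec:
  # Select the Service (and hence the pods behind it) by label
  selector:
    matchLabels:
      app: mockmetrics-app
  endpoints:
  - port: metrics        # named port on the mockmetrics Service
    path: /metrics
    interval: 15s
```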
Deploying and Configuring Prometheus Adapter
Now, to provide the custom.metrics.k8s.io API endpoint for the HPA, we will deploy the Prometheus adapter. The adapter expects its config file to be available inside the pod, so we will create a ConfigMap and mount it inside the pod. We will also create a Service and an APIService to create the API. The APIService adds the /apis/custom.metrics.k8s.io/v1beta1 endpoint to the standard Kubernetes APIs. You can follow the instructions to achieve this. Let's take a look at the configuration:
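The original configuration listing is in the demo repo; one plausible form of the adapter rule described below is sketched here (the exact metricsQuery in the repo may differ, and the 2m window and label names follow the description in this post):

```yaml
rules:
- seriesQuery: 'total_hit_count{namespace="default",service="mockmetrics-service"}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      service: {resource: "service"}
  metricsQuery: 'avg(sum_over_time(<<.Series>>{<<.LabelMatchers>>,pod=~"mockmetrics-deploy-(.*)"}[2m])) by (<<.GroupBy>>)'
```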
- seriesQuery is used to query Prometheus for the series with the labels 'default' and 'mockmetrics-service'.
- The resources section specifies how the labels should be mapped to Kubernetes resources. In our case, it maps the 'namespace' label to the Kubernetes namespace, and similarly for service.
- metricsQuery is again a Prometheus query, which does the work of getting the metrics into the adapter. The query we are using gets the average of the sum of total_hit_count from all pods matching the regex mockmetrics-deploy-(.*) over 2 minutes.
Kubernetes Autoscaling in Action
Once you follow the steps, the metric's value will keep increasing. Let's take a look at the HPA now.
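Watching the HPA produces output along these lines (illustrative values, not captured from a real cluster; the HPA name is assumed):

```shell
$ kubectl get hpa mockmetrics-app-hpa -w
NAME                  REFERENCE                       TARGETS   MINPODS   MAXPODS   REPLICAS
mockmetrics-app-hpa   Deployment/mockmetrics-deploy   4/100     1         10        1
mockmetrics-app-hpa   Deployment/mockmetrics-deploy   56/100    1         10        1
mockmetrics-app-hpa   Deployment/mockmetrics-deploy   110/100   1         10        2
```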
You can see how the number of replicas increases when the value reaches the target value.
The overall flow of this autoscaling is explained in the diagram below.
Image credits: luxas/kubeadm-workshop, Licensed under MIT License.
You can learn more about relevant projects and references here. The monitoring pipeline in Kubernetes has evolved a lot over the last few releases, and Kubernetes autoscaling largely builds on it. If you are not familiar with the landscape, it is easy to get confused and lost.
Published at DZone with permission of Bhavin Gandhi. See the original article here.