
Optimizing Prometheus and Grafana with the Prometheus Operator

Let's discuss cluster monitoring fundamentals and how we can use Prometheus Operator to deploy Prometheus and Grafana to monitor a Kubernetes cluster.

By Kevin Taylor · Jul. 20, 21 · Tutorial

Introduction

Taking a proactive and efficient approach to Kubernetes cluster monitoring helps engineering teams identify and predict critical problems, such as CPU exhaustion, memory exhaustion, and storage issues, well before they take a toll on the business. Organizations of all sizes, including enterprises like CERN, monitor petabytes of Kubernetes cluster data to understand their cluster workloads. Solving critical problems before they have a significant impact saves money, time, and reputation. Proper cluster monitoring is a pain point for many companies, though, because it requires knowing exactly what we want to monitor in a cluster.

This article will discuss cluster monitoring fundamentals and how we can use Prometheus Operator to deploy Prometheus and Grafana to monitor a Kubernetes cluster.

What Is Cluster Monitoring?

Cluster monitoring is the process of monitoring all the components and resources running on a cluster. With this process, you actively check the health of all your services and applications and set up monitoring systems that alert administrators as soon as problems occur. We can monitor CPU utilization, memory utilization, the number of namespaces, pods, deployments, and services running on the cluster, and many more resources.

Tools for Cluster Monitoring: Prometheus and Grafana

Prometheus and Grafana are two very popular tool choices for cluster monitoring.

Prometheus is an open-source monitoring system that collects the cluster data by sending HTTP requests to the metrics endpoints of the various resources running on the cluster. Prometheus stores data in a time-series database for analysis and alerting purposes.
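To make the scraping model more concrete, here is a minimal Python sketch of what a scraper has to do with the text a metrics endpoint returns. The sample payload and the metric names in it are made up for illustration, and the parser handles only a small subset of the Prometheus text exposition format (no timestamps, no escaped characters, no commas inside label values). It is not how Prometheus itself is implemented.

```python
def parse_metrics(text):
    """Parse a small subset of the Prometheus text exposition format.

    Returns a list of (metric_name, labels_dict, value) tuples.
    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The sample value is the last space-separated field on the line.
        series, value = line.rsplit(" ", 1)
        labels = {}
        if "{" in series:
            name, raw_labels = series.split("{", 1)
            for pair in raw_labels.rstrip("}").split(","):
                if pair:
                    key, val = pair.split("=", 1)
                    labels[key.strip()] = val.strip().strip('"')
        else:
            name = series
        samples.append((name, labels, float(value)))
    return samples


# Made-up example of what a metrics endpoint might return.
SAMPLE = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
process_open_fds 12
"""

for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

Each parsed sample corresponds to one point in a time series: the metric name and label set identify the series, and Prometheus attaches the scrape time when it stores the value.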

Prometheus does provide basic visualizations of the metrics it collects, but they are not always easy to navigate and understand. Using Grafana alongside Prometheus combines the strengths of both tools: Grafana integrates seamlessly with Prometheus and turns the cluster data into clear, attractive dashboards.

Business Advantages of Cluster Monitoring

Cluster monitoring is crucial for any organization whose applications run on clusters. Any problem with the cluster can lead to a huge loss for the organization. For example, Moonlight had a 100% traffic outage due to issues with their Kubernetes cluster.

Cluster monitoring:

  • Saves the organization time and money by identifying critical issues in the cluster.
  • Helps analyze cluster performance and proactively measures critical information.
  • Identifies and helps avoid upcoming downtime caused by poor cluster resource consumption.
  • Alerts the responsible individuals in real time about problems in the cluster.
  • Can predict or prevent massive issues that could bring down the cluster.
  • Maintains a proactive health check on all deployments and services.

Use Cases of Cluster Monitoring

  • We can curate and visualize cluster data for a better understanding of the cluster by selecting the metrics we want to monitor.
  • Cluster monitoring dashboards are easy to share with teams, giving them insight into the cluster.
  • We can run ad-hoc queries in the cluster monitoring tool to explore the cluster data across different time ranges, data sources, and queries.
  • Exploring logs is a fundamental use case for cluster monitoring that every administrator should perform daily. We can also explore log metrics to understand data in detail that might not be visible in dashboards.
  • We can write our own conditions to generate alerts via email, chat tools like Slack, webhooks, etc., for critical cases.
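As an illustration of the ad-hoc query use case, the following Python sketch builds a request URL for the Prometheus HTTP API range-query endpoint (`/api/v1/query_range`). The base URL, the PromQL expression, and the timestamps are assumptions chosen for the example; the sketch only constructs and prints the URL and does not contact a server.

```python
import urllib.parse


def range_query_url(base_url, expr, start, end, step):
    """Build a URL for a Prometheus range query.

    start and end are Unix timestamps (seconds); step is the resolution
    between returned points, for example "60s".
    """
    params = {"query": expr, "start": start, "end": end, "step": step}
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urllib.parse.urlencode(params)}"


# Assumed values for illustration: a locally reachable Prometheus and a
# per-namespace CPU usage expression over a one-hour window.
url = range_query_url(
    "http://localhost:9090",
    "sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)",
    start=1626739200,
    end=1626742800,
    step="60s",
)
print(url)
```

Changing `start`, `end`, and `step` is how the same expression can be explored over different time ranges.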

Monitoring with Prometheus Operator

The Prometheus Operator manages a Prometheus-based Kubernetes monitoring stack by implementing the Kubernetes operator pattern. Kubernetes operators configure, manage, and optimize a deployment on a Kubernetes cluster automatically. The Prometheus Operator acts on custom resource definitions (CRDs), including Prometheus, ServiceMonitor, PrometheusRule, and Alertmanager. As the advantages of using the operator pattern for deploying and configuring Prometheus, Grafana, and Alertmanager have become clear, several companies have packaged the Prometheus Operator, in some cases as Helm charts, to make it easier to deploy and manage. For example:

  • The Prometheus Operator entry on operatorhub.io, which was originally written by the CoreOS team and is now maintained by Red Hat.
  • The loki-stack Helm charts created by the team at Grafana Labs, which can install the Prometheus Operator along with Promtail and Grafana Loki to give you a unified observability option for metrics-based monitoring as well as powerful consolidated and searchable access to the logs of your Kubernetes workloads.
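To give a feel for the custom resources the operator acts on, here is a hedged Python sketch that builds a ServiceMonitor and a PrometheusRule as plain dictionaries and prints them as JSON, which `kubectl` accepts in place of YAML. The application name `example-app`, the port name `web`, the label values, and the alert itself are hypothetical. Whether a given Prometheus instance picks these resources up depends on the selectors configured on its Prometheus resource, which this sketch does not cover.

```python
import json


def service_monitor(name, namespace, app_label, port_name, interval="30s"):
    """ServiceMonitor: tells Prometheus which Services to scrape."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Select Services carrying this label ...
            "selector": {"matchLabels": {"app": app_label}},
            # ... and scrape the named Service port at the given interval.
            "endpoints": [{"port": port_name, "interval": interval}],
        },
    }


def prometheus_rule(name, namespace, alert, expr, hold_for, severity):
    """PrometheusRule: declares an alerting rule as a Kubernetes object."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": f"{name}.rules",
                    "rules": [
                        {
                            "alert": alert,
                            "expr": expr,
                            # The condition must hold this long before firing.
                            "for": hold_for,
                            "labels": {"severity": severity},
                        }
                    ],
                }
            ]
        },
    }


# Hypothetical application and alert, for illustration only.
monitor = service_monitor("example-app", "monitoring", "example-app", "web")
rule = prometheus_rule(
    "example-app", "monitoring",
    alert="ExampleAppDown",
    expr='up{job="example-app"} == 0',
    hold_for="5m",
    severity="critical",
)

for manifest in (monitor, rule):
    print(json.dumps(manifest, indent=2))
```

Declaring scrape targets and alert rules as Kubernetes objects is the core of the operator pattern here: the operator watches these resources and regenerates the Prometheus configuration from them.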

The Prometheus Operator project also has a kube-prometheus repository, which combines Kubernetes manifests, Grafana dashboard templates, and pre-generated Prometheus rules that configure the Prometheus Operator to enable monitoring, observability, and alerting for the Kubernetes cluster itself. kube-prometheus includes the following packages in the monitoring stack:

  • The Prometheus Operator
  • Highly available Prometheus
  • Highly available Alertmanager
  • Prometheus node-exporter
  • Prometheus Adapter for Kubernetes Metrics APIs
  • kube-state-metrics
  • Grafana

Setup Steps

Now, we will monitor a Kubernetes cluster with the Prometheus Operator and visualize the monitoring components in Grafana. You must have a running Kubernetes cluster before following the steps below.

Step 1: Clone the kube-prometheus repository from the Prometheus Operator project on GitHub.

ubuntu@ubuntu:~$ git clone https://github.com/prometheus-operator/kube-prometheus
Receiving objects: 100% (11526/11526), 5.89 MiB | 3.33 MiB/s, done.
Resolving deltas: 100% (7136/7136), done.


Step 2: Using the configs in the manifests directory, create the monitoring stack. The setup manifests create several CRDs and a namespace named “monitoring”.

ubuntu@ubuntu:~$ cd kube-prometheus
ubuntu@ubuntu:~/kube-prometheus$ kubectl create -f manifests/setup
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created

ubuntu@ubuntu:~/kube-prometheus$ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
No resources found
ubuntu@ubuntu:~/kube-prometheus$ kubectl create -f manifests/
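The `until` loop in the transcript above keeps retrying until the cluster can serve the newly created ServiceMonitor API, and only then creates the rest of the stack. The same wait-and-retry idea can be sketched in Python. The helper names `wait_until` and `servicemonitor_api_ready` are hypothetical, and the demonstration at the bottom uses a fake check so the sketch runs without a cluster.

```python
import subprocess
import time


def wait_until(check, attempts=30, delay=1.0, sleep=time.sleep):
    """Call check() until it returns True; return the attempt number.

    Raises TimeoutError if check() never succeeds within `attempts` tries.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        sleep(delay)
    raise TimeoutError(f"condition not met after {attempts} attempts")


def servicemonitor_api_ready():
    """True once `kubectl get servicemonitors` succeeds (needs kubectl)."""
    result = subprocess.run(
        ["kubectl", "get", "servicemonitors", "--all-namespaces"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


# Offline demonstration with a fake check that succeeds on the third call.
fake_results = iter([False, False, True])
tries = wait_until(lambda: next(fake_results), delay=0, sleep=lambda _: None)
print(f"ready after {tries} attempts")
```

Against a real cluster, the call would be `wait_until(servicemonitor_api_ready)`, which requires `kubectl` to be installed and configured.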


Step 3: Check all the resources created in the monitoring namespace. We can see that multiple pods, daemonsets, and services are now running on the cluster.

ubuntu@ubuntu:~/kube-prometheus$ kubectl get all -n monitoring

NAME                                       READY   STATUS    RESTARTS   AGE
pod/alertmanager-main-0                    2/2     Running   0          3m35s
pod/alertmanager-main-1                    2/2     Running   0          3m35s
pod/grafana-665447c488-9snqs               1/1     Running   0          3m32s
pod/kube-state-metrics-6f4dfb9ffb-g4gb7    3/3     Running   0          3m32s
pod/prometheus-k8s-0                       2/2     Running   1          3m30s
pod/prometheus-k8s-1                       2/2     Running   2          3m30s
pod/prometheus-operator-764cb46c94-jdd28   2/2     Running   0          5m1s

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-main       ClusterIP   10.110.145.114   <none>        9093/TCP                     3m36s
service/alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   3m35s
service/grafana                 ClusterIP   10.102.87.41     <none>        3000/TCP                     3m33s
service/kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP            3m33s
service/prometheus-operator     ClusterIP   None             <none>        8443/TCP                     5m2s

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana               1/1     1            1           3m33s
deployment.apps/kube-state-metrics    1/1     1            1           3m33s
deployment.apps/prometheus-adapter    1/1     1            1           3m31s
deployment.apps/prometheus-operator   1/1     1            1           5m3s
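When the listing is long, it can help to filter it programmatically. The following Python sketch, offered as one possible approach rather than an established tool, scans the pod section of the output above and reports pods that are not fully ready, not in the Running state, or have restarted. The function name `pods_needing_attention` is made up for this example, and the sample text is copied from the pod rows shown above.

```python
def pods_needing_attention(pod_table):
    """Return (name, ready, status, restarts) for pods worth a closer look.

    Expects the pod section of `kubectl get all` output: a header line
    followed by rows with NAME, READY, STATUS, and RESTARTS columns.
    """
    flagged = []
    rows = [line for line in pod_table.strip().splitlines() if line.strip()]
    for row in rows[1:]:  # skip the header line
        name, ready, status, restarts = row.split()[:4]
        ready_count, total = (int(n) for n in ready.split("/"))
        if ready_count < total or status != "Running" or int(restarts) > 0:
            flagged.append((name, ready, status, int(restarts)))
    return flagged


# Pod rows copied from the output above.
PODS = """\
NAME                                       READY   STATUS    RESTARTS   AGE
pod/alertmanager-main-0                    2/2     Running   0          3m35s
pod/alertmanager-main-1                    2/2     Running   0          3m35s
pod/grafana-665447c488-9snqs               1/1     Running   0          3m32s
pod/kube-state-metrics-6f4dfb9ffb-g4gb7    3/3     Running   0          3m32s
pod/prometheus-k8s-0                       2/2     Running   1          3m30s
pod/prometheus-k8s-1                       2/2     Running   2          3m30s
pod/prometheus-operator-764cb46c94-jdd28   2/2     Running   0          5m1s
"""

for pod in pods_needing_attention(PODS):
    print(pod)
```

For the output above, this flags only the two prometheus-k8s pods, because they restarted during startup.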


Step 4: If we go to the Kubernetes dashboard, we can see all the namespaces and custom resource definitions present on the cluster.

[Image: Prometheus namespaces]
[Image: Prometheus custom resource definitions]

Step 5: Access the Prometheus and Grafana dashboards using the commands below. Prometheus will be available on local port 9090 and Grafana on local port 3000.

ubuntu@ubuntu:~/kube-prometheus$ kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090
Forwarding from 127.0.0.1:9090 -> 9090


[Image: Prometheus expression]
 
ubuntu@ubuntu:~/kube-prometheus$ kubectl --namespace monitoring port-forward svc/grafana 3000
Forwarding from 127.0.0.1:3000 -> 3000
[Image: Welcome to Grafana]
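While the Prometheus port-forward is running, the same data shown in the web UI can also be read through the Prometheus HTTP API. The Python sketch below is a minimal example under stated assumptions: it targets the forwarded address `127.0.0.1:9090` from the command above, the helper names are hypothetical, and the sample response is hand-written to mirror the shape of an instant-query result so the parsing part runs offline.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://127.0.0.1:9090"  # address from the port-forward above


def instant_query(expr, base_url=PROMETHEUS_URL, timeout=5):
    """Run an instant query and return the decoded JSON response."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def vector_values(payload, label):
    """Map one label of each series in a vector result to its float value."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    return {
        series["metric"].get(label, ""): float(series["value"][1])
        for series in data["result"]
    }


# Hand-written example of an instant-query response, not real cluster data.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "kube-state-metrics"},
             "value": [1626739200, "1"]},
            {"metric": {"__name__": "up", "job": "node-exporter"},
             "value": [1626739200, "1"]},
        ],
    },
}

print(vector_values(SAMPLE_RESPONSE, "job"))
```

With the port-forward active, `vector_values(instant_query("up"), "job")` would return the live equivalent. Note that if several series share the same label value, later ones overwrite earlier ones in the returned dictionary.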


Step 6: Monitor the cluster components and resources using Grafana. Click on Manage.

[Image: Grafana dashboard]

Select the Default folder, which lists plenty of cluster resources to monitor, and choose the resources you want to monitor.

[Image: cluster resource monitoring]

Finally, your cluster monitoring visualization will be ready. In this snapshot, the Grafana dashboard monitors the cluster compute resources such as CPU utilization, memory limits, etc.

[Image: cluster compute monitoring on Grafana]

In this snapshot, the dashboard is monitoring the bandwidth used on the Kubernetes cluster.

[Image: bandwidth monitoring on Grafana]

Conclusion

We hope this article helped you understand the importance of cluster monitoring and how the Prometheus Operator can be a one-stop solution for monitoring your Kubernetes clusters with ease.


Published at DZone with permission of Kevin Taylor. See the original article here.
