Best Practices, Tools, and Approaches for Kubernetes Monitoring
Let's look at some of the available Kubernetes monitoring and Kubernetes logging tools, including Prometheus for monitoring and Grafana for visualization and dashboards.
In a Kubernetes environment, applications operate across multiple nodes within a cluster, and application services can be distributed across multiple clusters and multiple clouds. This makes it challenging to track the health of an application and the infrastructure it depends on.
Kubernetes monitoring is the process of gathering metrics from the Kubernetes clusters you operate to identify critical events and ensure that all hardware, software, and applications are operating as expected. Aggregating metrics in a central location will help you understand and protect the health of your entire Kubernetes fleet and the applications and services running on it.
Between the layers of abstraction created by containerization and Kubernetes and the dynamic nature of applications running in a K8s environment, monitoring everything can be a challenge. Fortunately, a number of open-source Kubernetes monitoring tools, as well as popular commercial tools, exist to make monitoring easier.
This post examines some of the available Kubernetes monitoring and Kubernetes logging tools, including Prometheus for monitoring and Grafana for visualization and dashboards.
Kubernetes Ecosystem Tools for Logging and Monitoring
There are a variety of popular tools that can enhance your Kubernetes container monitoring efforts. Some of the most common ones include:
- Prometheus: An open-source event monitoring and alerting tool that collects and stores metrics as time series data. Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project after Kubernetes.
- Grafana: An open-source visualization and dashboard platform for applications and infrastructure that works with monitoring software such as Prometheus. It is also available as a fully managed service. Grafana queries data sources to visualize and alert on the data they hold.
- Thanos: A metrics system that provides a simple and cost-effective way to centralize and scale Prometheus-based monitoring systems.
- Elasticsearch: A distributed, JSON-based search and analytics engine.
- Logstash: An open-source, server-side data processing pipeline that ingests data from many sources simultaneously, transforms it, and then sends it to a destination (a "stash") such as Elasticsearch.
- Kibana: A data visualization and exploration tool used for log and time series analytics, application monitoring, and operational intelligence use cases.
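To make "metrics as time series data" concrete, here is a minimal sketch of how a single sample looks in the Prometheus text exposition format, which is what Prometheus reads when it scrapes a target. The function names, metric name, and labels are illustrative assumptions; in practice an official Prometheus client library would normally produce this output.

```python
def escape_label_value(value):
    """Escape backslashes, double quotes, and newlines in a label value."""
    text = str(value)
    text = text.replace("\\", "\\\\")
    text = text.replace('"', '\\"')
    return text.replace("\n", "\\n")


def render_metric(name, labels, value, help_text="", metric_type="gauge"):
    """Render one sample as Prometheus text exposition lines.

    Hypothetical helper for illustration only.
    """
    lines = []
    if help_text:
        lines.append("# HELP " + name + " " + help_text)
    lines.append("# TYPE " + name + " " + metric_type)
    # Labels are key="value" pairs inside braces; sort them for stable output.
    pairs = [
        '{}="{}"'.format(key, escape_label_value(val))
        for key, val in sorted(labels.items())
    ]
    series = name + "{" + ",".join(pairs) + "}" if pairs else name
    lines.append("{} {}".format(series, value))
    return "\n".join(lines)


# Example with made-up metric and label names.
print(render_metric(
    "http_requests_total",
    {"pod": "web-0", "code": "200"},
    42,
    help_text="Total HTTP requests served.",
    metric_type="counter",
))
```

Each unique combination of metric name and labels identifies one time series, and every scrape adds a timestamped value to it.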
Which Kubernetes Monitoring Tools Should You Choose?
Many teams use these monitoring and logging tools alone or in combination to create their own solutions and address specific container monitoring and Kubernetes application monitoring needs. One of the most commonly used combinations is Prometheus plus Grafana. Prometheus enables you to gather time series data from both hardware and software sources, while Grafana lets you visualize the data that Prometheus collects.
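As a rough illustration of how a dashboard tool reads data from Prometheus, the sketch below builds a URL for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and parses the JSON it returns. The host name, the helper names, and the sample response body are assumptions made for this example; no network call is made here.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_instant_vector(body):
    """Return (labels, value) pairs from an instant-vector response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("query failed: {}".format(payload.get("error")))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector, got " + data["resultType"])
    samples = []
    for item in data["result"]:
        # Each sample is [unix_timestamp, "value-as-string"].
        _timestamp, raw_value = item["value"]
        samples.append((item["metric"], float(raw_value)))
    return samples


# Hand-written sample response (an assumption, not real cluster output).
SAMPLE_BODY = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "kubelet", "instance": "node-1"},
             "value": [1700000000.0, "1"]},
        ],
    },
})

url = build_query_url("http://prometheus.example:9090/", "up")
samples = parse_instant_vector(SAMPLE_BODY)
```

Grafana does the equivalent through its Prometheus data source, so dashboards normally need only the PromQL expression rather than hand-written HTTP calls.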
Another popular combination is Elasticsearch plus Logstash plus Kibana, often referred to as the ELK stack or Elastic Stack, with all three components available through Elastic. While Elastic is itself a for-profit company, these components are free to use. Their licensing has changed over time, so check the current terms before you adopt them.
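Because Elasticsearch is JSON-based, one common pattern is for applications to emit each log record as a single JSON object per line, which a log pipeline can ingest with little extra parsing. The sketch below shows one way to do that with Python's standard `logging` module. The field names and the `make_json_logger` helper are assumptions for illustration, not a schema required by the ELK stack.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def make_json_logger(name, stream=None):
    """Return a logger that writes JSON lines to the given stream.

    Hypothetical helper; defaults to stdout, where container log
    collectors typically pick output up.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace handlers to avoid duplicate lines
    return logger


log = make_json_logger("checkout")
log.info("order %s placed", 17)
```

Writing to standard output keeps the application unaware of where logs end up; the collection and shipping steps stay in the cluster's logging pipeline.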
Implementing any of the above tools, whether singly or in combination, necessarily creates a certain amount of complexity, especially as your Kubernetes fleet grows to include many clusters, potentially running different K8s distributions in different cloud environments.
Managing a Prometheus config at scale may become a challenge due to app onboarding issues, manual configuration requirements, and configuration drift. While Prometheus and Grafana work well together for individual clusters, in multi-cluster environments you may have to add Thanos to your toolset to aggregate data and provide long-term storage and a global view. Still, you may face limitations with data retention and high availability (HA) that cause some to prefer the ELK stack.
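Configuration drift is easier to reason about with a small example. The sketch below fingerprints the Prometheus config text from each cluster and reports the clusters whose config differs from the most common version. The cluster names, config snippets, and both helper functions are hypothetical, and it assumes you have already fetched each cluster's config as a string.

```python
import hashlib
from collections import defaultdict


def fingerprint(config_text):
    """Hash a config after trimming whitespace-only differences."""
    lines = [line.rstrip() for line in config_text.strip().splitlines()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def find_drift(configs_by_cluster):
    """Return clusters whose config differs from the most common one.

    If two versions are equally common, the first one seen is treated
    as the baseline.
    """
    groups = defaultdict(list)
    for cluster, text in configs_by_cluster.items():
        groups[fingerprint(text)].append(cluster)
    if len(groups) <= 1:
        return []
    baseline = max(groups.values(), key=len)
    drifted = []
    for clusters in groups.values():
        if clusters is not baseline:
            drifted.extend(clusters)
    return sorted(drifted)


# Made-up configs: cluster "c" uses a different scrape interval.
configs = {
    "a": "global:\n  scrape_interval: 30s\n",
    "b": "global:\n  scrape_interval: 30s   \n",
    "c": "global:\n  scrape_interval: 15s\n",
}
drifted_clusters = find_drift(configs)
```

A text hash only flags that something differs. A real check would more likely compare parsed YAML, so that reordered keys or comments do not count as drift.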
Because of this complexity, many organizations prefer monitoring as a service using commercial solutions such as Datadog, Amazon CloudWatch, and New Relic.
Published at DZone with permission of Kyle Hunter. See the original article here.