Building a Simple AIOps Monitoring Dashboard With Prometheus and Grafana
Learn to build a simple AIOps dashboard using Prometheus, Grafana, and ML-based anomaly detection to monitor metrics, set alerts, and prevent failures.
AIOps (Artificial Intelligence for IT Operations) uses machine learning (ML) to detect problems, predict failures, and automate responses. This is changing how businesses manage their IT environments.
This guide shows how to build a simple monitoring dashboard that uses Prometheus to collect metrics and Grafana to visualize them. We'll also add basic AIOps capabilities to the dashboard in the form of anomaly detection, which helps you spot unusual behavior before it turns into a failure.
Prerequisites
- Docker and Docker Compose installed on your machine
- Basic knowledge of monitoring metrics and alerting
- Prometheus and Grafana Docker images (both are available via Docker Hub)
Step 1: Set Up Prometheus for Metrics Collection
Prometheus is a robust open-source monitoring and alerting tool. It scrapes metrics from configured targets at a regular interval. Here's how to configure it:
Create a Prometheus Docker Container
First, create a Docker Compose file to simplify the deployment. Here’s the docker-compose.yml file for Prometheus:
version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
Configure Prometheus
Next, configure Prometheus to collect metrics. Create a prometheus.yml file with the following configuration to scrape data from a simple target:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'simple_app'
    static_configs:
      - targets: ['host.docker.internal:8080']
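Replace host.docker.internal:8080 with the address of your target application that exposes Prometheus-compatible metrics. The host.docker.internal name resolves out of the box on Docker Desktop; on a plain Linux Docker Engine you typically need to add an extra_hosts entry of "host.docker.internal:host-gateway" to the prometheus service for it to resolve.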
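If you do not yet have a target to scrape, a minimal sketch like the following can stand in for one. It uses only the Python standard library to serve a single counter in the Prometheus text exposition format. The metric name demo_requests_total and the helper names are made up for illustration, and a real application would more likely use an official Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counter; a real app would track its own metrics.
REQUEST_COUNT = {"value": 0}


def render_metrics(count: int) -> str:
    """Render one counter in the Prometheus text exposition format."""
    return (
        "# HELP demo_requests_total Total requests served by the demo app.\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {count}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Count every incoming request, including scrapes.
        REQUEST_COUNT["value"] += 1
        if self.path != "/metrics":
            self.send_response(404)
            self.end_headers()
            return
        body = render_metrics(REQUEST_COUNT["value"]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Serve /metrics until interrupted (blocks the calling thread)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() on the host starts the endpoint on port 8080, which matches the target in the configuration above; Prometheus requests the /metrics path by default.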
Start Prometheus
Run the following command to start Prometheus:
docker-compose up -d
You can access Prometheus at http://localhost:9090. To ensure Prometheus is collecting data, try querying a metric like up in the web UI.
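The same check can be scripted against the Prometheus HTTP API, which serves instant queries at /api/v1/query. The following is a rough sketch using only the standard library; the function names are illustrative, and it assumes Prometheus is reachable at http://localhost:9090 as configured above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, expr: str) -> str:
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def parse_instant_vector(payload: dict) -> dict:
    """Map each series' `instance` label to its current sample value."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def query_up(base_url: str = "http://localhost:9090") -> dict:
    """Fetch `up` for every target (assumes Prometheus is running locally)."""
    with urlopen(build_query_url(base_url, "up"), timeout=5) as resp:
        return parse_instant_vector(json.load(resp))
```

A value of 1.0 for an instance means the last scrape of that target succeeded; 0.0 means it failed.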
Step 2: Set Up Grafana for Visualization
Now, let’s set up Grafana to visualize the data collected by Prometheus.
Create a Grafana Docker Container
Add the following service to your docker-compose.yml file to deploy Grafana:
  grafana:
    image: grafana/grafana
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
This will start Grafana on port 3000.
Start Grafana
Run the following command to start Grafana:
docker-compose up -d grafana
You can access Grafana at http://localhost:3000. Log in with the default username admin and the password admin (or the password you specified in the docker-compose.yml file).
Add Prometheus as a Data Source
- Go to Configuration > Data Sources.
- Select Prometheus and enter the URL http://prometheus:9090 (or the appropriate URL for your Prometheus container).
- Click Save & Test to verify the connection.
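If you would rather script this step than click through the UI, Grafana also accepts data sources through its HTTP API at /api/datasources. The sketch below only builds the request; the helper names are illustrative, and it assumes the admin/admin credentials and the container addresses used in this tutorial.

```python
import base64
import json
from urllib.request import Request


def datasource_payload(name: str = "Prometheus",
                       url: str = "http://prometheus:9090") -> dict:
    """Describe a Prometheus data source that Grafana reaches server-side."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": "proxy",
        "isDefault": True,
    }


def build_request(grafana_url: str, user: str, password: str,
                  payload: dict) -> Request:
    """Build (but do not send) the POST request with HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Passing the result of build_request("http://localhost:3000", "admin", "admin", datasource_payload()) to urllib.request.urlopen would send it. The http://prometheus:9090 address works because both containers share the default Compose network, where services resolve each other by name.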
Create a Simple Dashboard
- Go to Create > Dashboard and add a new panel.
- Select Prometheus as the data source and query a metric like up or http_requests_total.
- Visualize the data using graphs or tables.
Step 3: Implement Basic Alerting
Prometheus offers robust alerting capabilities. You can define alert rules that notify you when certain conditions are met.
Define Alert Rules
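Add the following section to your prometheus.yml file. It tells Prometheus where to find Alertmanager and which rule file to load; the rule itself, which triggers if the application is down (i.e., if up is 0), lives in that file. Because Prometheus runs in a container, the rule file must also be mounted next to the configuration, for example by adding ./alert.rules:/etc/prometheus/alert.rules to the volumes list of the prometheus service. Note that localhost:9093 refers to the Prometheus container itself, so replace it with the address of your Alertmanager once you run one: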
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
rule_files:
  - "alert.rules"
Create an Alert Rule File
Create an alert.rules file with the following content:
groups:
  - name: example
    rules:
      - alert: ApplicationDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Application is down"
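The for: 5m clause means the alert does not fire on the first failed evaluation. It first becomes pending and only fires once the condition has held for the full five minutes. The following is a simplified model of that behavior, not Prometheus's actual implementation; the function name and the one-sample-per-evaluation input format are assumptions made for illustration.

```python
def alert_states(samples, hold_seconds=300):
    """Return the alert state after each rule evaluation.

    samples: list of (timestamp_seconds, up_value), one per evaluation.
    hold_seconds: the `for` duration (300 s corresponds to `for: 5m`).
    """
    states = []
    active_since = None  # time at which `up == 0` first became true
    for ts, up in samples:
        if up != 0:
            # Condition no longer holds: reset and go back to inactive.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        if ts - active_since >= hold_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states
```

With evaluations every 60 seconds, a target that goes down at t=60 stays pending through t=300 and fires at t=360. A brief outage that recovers before then never fires at all.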
Configure Alert Notifications
To receive notifications, you can set up Alertmanager, which sends alerts to email, Slack, or other channels. However, for this simple tutorial, we’ll just observe alerts in the Prometheus web UI.
Step 4: Integrating Basic AIOps for Anomaly Detection
To enhance the dashboard with AIOps features, we can introduce simple anomaly detection using machine learning. Grafana’s Machine Learning plugin allows you to detect outliers in your metrics.
Install Grafana Machine Learning Plugin
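In Grafana, go to Configuration > Plugins and search for Machine Learning. Install the plugin to use basic ML features like anomaly detection. Availability depends on your Grafana edition: Grafana Machine Learning is offered primarily through Grafana Cloud, so it may not appear in the plugin catalog of the self-hosted grafana/grafana image used above.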
Use ML for Anomaly Detection
Once the plugin is installed, create a panel with Anomaly Detection as the visualization. This will help you automatically identify when a metric deviates from its expected range.
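To build some intuition for what "deviates from its expected range" means, here is a minimal sketch of one common baseline technique, a rolling z-score. This is not how the Grafana plugin works internally; it is a swapped-in illustration, and the function name, window size, and threshold are arbitrary choices.

```python
from statistics import mean, pstdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indexes of values far from the mean of the preceding window.

    A value is flagged when it lies more than `threshold` standard
    deviations from the mean of the `window` values just before it.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

For a series such as [10, 11, 10, 12, 11, 10, 50, 11, 10, 12], only index 6 (the value 50) is flagged. Note that the spike then inflates the window's standard deviation for a few samples, which is one reason production systems tend to use more robust models.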
Step 5: Tips for Expanding the Dashboard
Add Node Exporter for System Metrics
Once you’ve successfully created your basic dashboard with Prometheus and Grafana, you can take it further by incorporating additional exporters and services.
For example, the Node Exporter is a useful tool that exposes system-level metrics like CPU usage, memory consumption, and disk I/O. Simply add the Node Exporter service to your docker-compose.yml file and update your prometheus.yml configuration to scrape metrics from it.
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node-exporter:9100']
This allows you to visualize the health of your infrastructure alongside application metrics.
Using Alertmanager for Notifications
To make alerts actionable, configure Alertmanager to send notifications. You can integrate Alertmanager with various tools like Slack, PagerDuty, or even email. For instance, to set up Slack integration, define a slack_configs section in the alertmanager.yml configuration file.
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        send_resolved: true
        username: 'alertmanager'
        api_url: 'https://hooks.slack.com/services/your/webhook/url'
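For notifications to be delivered, the route section of alertmanager.yml must also name slack-notifications as its receiver. This setup helps critical issues reach the right teams quickly, enabling faster incident response and closing the loop on your AIOps-driven observability stack.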
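Before pointing Alertmanager at a webhook, it can help to confirm that the webhook itself accepts messages. The sketch below builds a test request for a Slack incoming webhook, which takes a JSON body with a text field. The helper name is illustrative, and the URL is the same placeholder used in the configuration above, so substitute your own.

```python
import json
from urllib.request import Request


def slack_test_request(webhook_url: str, text: str) -> Request:
    """Build (but do not send) a test message for a Slack incoming webhook."""
    return Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen should post the message to the channel tied to the webhook. If that works, any remaining delivery problems are more likely in the Alertmanager configuration than in Slack.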
Conclusion
In this tutorial, we created a simple AIOps monitoring dashboard with Prometheus for metrics collection and Grafana for visualization. We extended it with basic alerting and ML-based anomaly detection, so that unusual metric behavior can be flagged automatically. This dashboard can serve as a foundation for proactive IT operations, allowing teams to identify issues early and automate their responses.
Opinions expressed by DZone contributors are their own.