DORA Metrics: Tracking and Observability With Jenkins, Prometheus, and Observe
DORA metrics give you insight into software delivery performance, and the right observability tools make those metrics easier to track and act on.
DORA (DevOps Research and Assessment) metrics, developed by the DORA team, have become a standard for measuring the efficiency and effectiveness of DevOps implementations. As organizations adopt DevOps practices to accelerate software delivery, tracking performance and reliability becomes critical. DORA metrics address this need by providing a framework for understanding how well teams deliver software and how quickly they recover from failures. This article explains DORA metrics, demonstrates how to track them using Jenkins, and explores how to use Prometheus to collect these metrics and display them in Observe.
What Are DORA Metrics?
DORA metrics are a set of four key performance indicators (KPIs) that help organizations evaluate their software delivery performance. These metrics are:
- Deployment Frequency (DF): How often code is deployed to production
- Lead Time for Changes (LT): The time taken from code commit to production deployment
- Change Failure Rate (CFR): The percentage of changes that fail in production
- Mean Time to Restore (MTTR): The average time it takes to recover from a failure in production
These metrics are valuable because they provide actionable insights into software development and deployment practices. High-performing teams tend to deploy more frequently and have shorter lead times, lower failure rates, and quicker recovery times, leading to more resilient and robust applications.
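To make the four definitions concrete, here is a minimal Python sketch that computes them from a list of deployment records. The `Deployment` record shape, the field names, and the choice to measure restore time from the failed deployment to the moment service is restored are assumptions made for illustration; they are not part of any DORA tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    commit_time: datetime                     # when the change was committed
    deploy_time: datetime                     # when it reached production
    failed: bool = False                      # did it fail in production?
    restored_time: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics over a window of period_days days."""
    total = len(deployments)
    if total == 0:
        return {"deployment_frequency_per_day": 0.0, "lead_time_hours": None,
                "change_failure_rate_pct": None, "mttr_minutes": None}

    # Lead time: commit to production deployment, in hours
    lead_times = [(d.deploy_time - d.commit_time).total_seconds() / 3600
                  for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Restore time: failed deployment to restoration, in minutes (assumption)
    restores = [(d.restored_time - d.deploy_time).total_seconds() / 60
                for d in failures if d.restored_time is not None]
    return {
        "deployment_frequency_per_day": total / period_days,
        "lead_time_hours": sum(lead_times) / total,
        "change_failure_rate_pct": 100 * len(failures) / total,
        "mttr_minutes": sum(restores) / len(restores) if restores else None,
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    sample = [
        Deployment(t0, t0 + timedelta(hours=2)),
        Deployment(t0, t0 + timedelta(hours=4), failed=True,
                   restored_time=t0 + timedelta(hours=4, minutes=30)),
        Deployment(t0, t0 + timedelta(hours=6)),
        Deployment(t0, t0 + timedelta(hours=8)),
    ]
    print(dora_metrics(sample, period_days=2))
```

In a real setup, the records would come from your CI/CD system and incident tracker rather than from hand-written sample data.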
Tracking DORA Metrics in Jenkins
Jenkins is a widely used automation server that enables continuous integration and continuous delivery (CI/CD). Below is an example Jenkins pipeline that uses shell commands and scripts to log deployment frequency, calculate lead time for changes, and monitor change failure rate. Mean time to restore is not computed in this example, because it depends on incident and recovery timestamps that the pipeline does not record.
pipeline {
    agent any
    environment {
        DEPLOY_LOG = 'deploy.log'
        FAIL_LOG = 'fail.log'
    }
    stages {
        // Build the application
        stage('Build') {
            steps {
                echo 'Building the application...'
                // Run the required build commands
                sh 'make build'
            }
        }
        // Test the application
        stage('Test') {
            steps {
                echo 'Running tests...'
                // Run the required test commands
                sh 'make test'
            }
        }
        // Deploy the application
        stage('Deploy') {
            steps {
                echo 'Deploying the application...'
                // Run the deployment steps
                sh 'make deploy'
                // Append a timestamp to the deploy log to compute deployment frequency.
                // Single quotes let the shell, not Groovy, expand $(...) and $DEPLOY_LOG.
                sh 'echo "$(date +%F_%T)" >> "$DEPLOY_LOG"'
            }
        }
    }
    post {
        always {
            script {
                // Make sure both log files exist so that wc does not fail on the first run
                sh 'touch "$DEPLOY_LOG" "$FAIL_LOG"'
                // Compute deployment frequency (DF)
                def deploymentCount = sh(script: 'wc -l < "$DEPLOY_LOG"', returnStdout: true).trim().toInteger()
                echo "# of Deployments: ${deploymentCount}"
                // Record build failures in the fail log for computing CFR
                if (currentBuild.currentResult == 'FAILURE') {
                    sh 'echo "$(date +%F_%T)" >> "$FAIL_LOG"'
                }
                // Compute change failure rate (CFR), guarding against division by zero
                def failureCount = sh(script: 'wc -l < "$FAIL_LOG"', returnStdout: true).trim().toInteger()
                def CFR = deploymentCount > 0 ? (failureCount * 100) / deploymentCount : 0
                echo "Change Failure Rate: ${CFR}%"
                // Compute lead time for changes (LT) from the last commit time and the current time
                def commitTime = sh(script: "git log -1 --pretty=format:'%ct'", returnStdout: true).trim()
                def currentTime = sh(script: 'date +%s', returnStdout: true).trim()
                def leadTime = (currentTime.toLong() - commitTime.toLong()) / 3600
                echo "Lead Time for Changes: ${leadTime} hours"
            }
        }
        success {
            echo 'Deployment successful!'
        }
        failure {
            echo 'Deployment failed!'
            // Failure handling
        }
    }
}
In the above script, each deployment is logged as a timestamp in the deploy log file, which can be used to determine the deployment frequency. Similarly, failures are logged as timestamps in the fail log file, and both counts are used to compute the change failure rate. Note that this treats a failed pipeline run as a proxy for a change that failed in production. Additionally, the time difference between the last commit and the current time provides the lead time for changes.
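The pipeline only reports raw counts, so turning the logged timestamps into a per-day deployment frequency needs a little post-processing. The following Python sketch shows one way to do that, assuming the log files contain one `date '+%F_%T'` timestamp per line as written by the pipeline. The function names and the sample log contents are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List

# Matches the output of `date '+%F_%T'`, e.g. 2024-05-01_10:15:00
TIMESTAMP_FORMAT = "%Y-%m-%d_%H:%M:%S"


def parse_log(lines: Iterable[str]) -> List[datetime]:
    """Parse one timestamp per line, skipping blank lines."""
    return [datetime.strptime(line.strip(), TIMESTAMP_FORMAT)
            for line in lines if line.strip()]


def deployments_per_day(deploys: List[datetime]) -> Dict[str, int]:
    """Count deployments for each calendar day (ISO date -> count)."""
    return dict(Counter(ts.date().isoformat() for ts in deploys))


def change_failure_rate(deploys: List[datetime], failures: List[datetime]) -> float:
    """Failures as a percentage of deployments; 0.0 when nothing was deployed."""
    if not deploys:
        return 0.0
    return 100 * len(failures) / len(deploys)


if __name__ == "__main__":
    # Hypothetical log contents; in practice, iterate over open("deploy.log")
    deploy_lines = ["2024-05-01_10:15:00\n", "2024-05-01_16:40:12\n",
                    "2024-05-02_09:05:30\n"]
    fail_lines = ["2024-05-01_16:45:00\n"]

    deploys = parse_log(deploy_lines)
    failures = parse_log(fail_lines)
    print(deployments_per_day(deploys))
    print(f"Change failure rate: {change_failure_rate(deploys, failures):.2f}%")
```

Grouping by calendar day is only one possible choice; the same parsed timestamps could be bucketed by week or by release window.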
Monitoring DORA Metrics With Prometheus and Observe
Prometheus is an open-source monitoring and alerting toolkit commonly used for collecting metrics from applications. Combined with Observe, a modern observability platform, Prometheus can be used to visualize and monitor DORA metrics in real time.
- Install Prometheus on the server: Download and install Prometheus from the official Prometheus download page.
- Configure Prometheus: Set up the prometheus.yml configuration file to define the metrics to be collected and the time intervals. Example configuration:
# Set the interval at which metrics are collected
global:
  scrape_interval: 30s
# Configure Prometheus to collect metrics from Jenkins on a specific port
scrape_configs:
  - job_name: 'jenkins'
    static_configs:
      - targets: ['<JENKINS_SERVER>:<PORT>']
- Expose metrics in Jenkins: You can use either the Prometheus plugin for Jenkins or a custom script to expose metrics in a format that Prometheus can scrape. Example Python script:
from prometheus_client import start_http_server, Gauge
import random
import time

# Create Prometheus gauges for the four DORA KPIs.
# Metric names must not contain spaces, so use snake_case names.
DF = Gauge('deployment_frequency', 'No. of deployments in a day')
LT = Gauge('lead_time_for_changes', 'Average lead time for changes in hours')
CFR = Gauge('change_failure_rate', 'Percentage of changes that fail in production')
MTTR = Gauge('mean_time_to_restore', 'Mean time to restore service after failure in minutes')

# Start the HTTP server that exposes the metrics on port 8000
start_http_server(8000)

# Send random values to generate sample metrics for testing
while True:
    DF.set(random.randint(1, 9))
    LT.set(random.uniform(1, 18))
    CFR.set(random.uniform(0, 27))
    MTTR.set(random.uniform(1, 45))
    # Sleep for 30s
    time.sleep(30)
Save this script on the server where Jenkins is running and run it to expose the metrics on port 8000. The target in prometheus.yml must then point to this port.
- Add Prometheus Data Source to Observe: Observe is a monitoring and observability tool that provides advanced features for monitoring, analyzing, and visualizing observability data. In Observe, you can add Prometheus as a data source by navigating to the integrations section and configuring Prometheus with the appropriate endpoint URL.
- Set up dashboards in Observe: Create dashboards with widgets to display graphs for each of these metrics.
- Set up monitoring: Configure alerts on set thresholds, and analyze trends and patterns by drilling down into specific metrics.
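For the metric-exposure step above, it helps to see what Prometheus actually reads when it scrapes a target. The sketch below uses only the Python standard library to render gauges in the Prometheus text exposition format and to check names against the classic Prometheus metric-name pattern, which does not allow spaces. The function names and the sample values are hypothetical, and in practice a client library such as prometheus_client produces this output for you.

```python
import re
from typing import Dict, Tuple

# Classic Prometheus metric-name pattern: letters, digits, underscores, and
# colons, not starting with a digit. Spaces are not allowed.
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def is_valid_metric_name(name: str) -> bool:
    """Return True if name matches the classic metric-name pattern."""
    return METRIC_NAME_RE.match(name) is not None


def render_gauges(metrics: Dict[str, Tuple[str, float]]) -> str:
    """Render {name: (help text, value)} as text-exposition-format gauges."""
    lines = []
    for name, (help_text, value) in metrics.items():
        if not is_valid_metric_name(name):
            raise ValueError(f"invalid metric name: {name!r}")
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample values for the four DORA gauges
    print(render_gauges({
        "deployment_frequency": ("No. of deployments in a day", 4.0),
        "lead_time_for_changes": ("Average lead time for changes in hours", 6.5),
        "change_failure_rate": ("Percentage of changes that fail in production", 12.5),
        "mean_time_to_restore": ("Mean time to restore service in minutes", 30.0),
    }), end="")
```

Validating names up front is a deliberate choice here: a name such as "Deployment Frequency" would be rejected before anything is exposed, rather than producing output that a scraper cannot parse.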
Conclusion
DORA metrics are essential for assessing the performance and efficiency of DevOps practices. By implementing tracking in Jenkins pipelines and leveraging monitoring tools like Prometheus and Observe, organizations can gain deep insights into their software delivery processes. These metrics help teams continuously improve, making data-driven decisions that enhance deployment frequency, reduce lead time, minimize failures, and accelerate recovery. Adopting a robust observability strategy ensures that these metrics are visible to stakeholders, fostering a culture of transparency and continuous improvement in software development and delivery.