Advanced Argo Rollouts With Datadog Metrics for Progressive Delivery

This article explores how progressive delivery in Kubernetes environments can be enhanced by combining Argo Rollouts with Datadog metrics.

By Karthik Bojja · Jun. 27, 25 · Tutorial

In modern DevOps environments, delivering software quickly and reliably is essential. Progressive delivery strategies such as canary deployments have emerged as effective methods to reduce risk during application updates. Argo Rollouts is a Kubernetes-native controller that enables progressive delivery using deployment strategies like canary and blue-green. When integrated with Datadog, a powerful monitoring and observability platform, Argo Rollouts can automatically make deployment decisions based on real-time metrics. This paper explores how Argo Rollouts and Datadog work together to automate analysis, reduce manual intervention, and ensure safe, data-driven deployments in Kubernetes environments.

Introduction

As organizations adopt microservices and cloud-native architectures, the complexity of application deployments has increased significantly. Traditional deployment methods often lead to downtime, user disruption, or production issues due to the lack of real-time feedback. Progressive delivery offers a solution by incrementally rolling out changes and continuously validating them.

Argo Rollouts is an open-source Kubernetes controller that supports progressive delivery through strategies like canary and blue-green deployments. It allows teams to control how new versions are gradually introduced to users while monitoring their impact. By integrating with Datadog, teams gain the ability to evaluate deployment success or failure using live metrics such as error rates, response times, and system health indicators.

This paper discusses the architecture, configuration, and benefits of integrating Argo Rollouts with Datadog to enable intelligent deployment automation. It also highlights a real-world use case where this integration supports automatic rollbacks and promotions, ensuring deployments are both resilient and efficient.

Modern Deployment Patterns: Canary, Blue-Green, and Beyond

Argo Rollouts is a Kubernetes controller and a set of Custom Resource Definitions (CRDs) that support advanced deployment strategies, including:

  • Canary Releases: Gradually shifting traffic from the old version to the new one through defined steps.
  • Blue-Green Deployments: Redirecting traffic between two environments with minimal downtime.
  • Experiments: Deploying multiple versions simultaneously to compare performance using key metrics.
  • Automated Rollbacks: Reverting to a stable version when health checks fail.
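As a rough illustration of how one of these strategies is declared, here is a minimal sketch of a blue-green Rollout. The Rollout name, the two Service names, and the error-rate-once template are hypothetical placeholders that do not appear elsewhere in this article. The template is assumed to define a finite count so that the pre-promotion analysis can complete.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: demo-bluegreen                  # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-bluegreen
  template:
    metadata:
      labels:
        app: demo-bluegreen
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        ports:
        - containerPort: 8080
  strategy:
    blueGreen:
      activeService: demo-active        # hypothetical Service receiving production traffic
      previewService: demo-preview      # hypothetical Service pointing at the new version
      prePromotionAnalysis:             # must succeed before traffic is switched
        templates:
        - templateName: error-rate-once # hypothetical AnalysisTemplate with a finite count
        args:
        - name: service-name
          value: demo-preview
```

Both Services must already exist in the same namespace as the Rollout; the controller only rewrites their selectors.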

Argo Rollouts complements Argo CD and is compatible with GitOps practices. It also supports several types of analysis to enable progressive delivery. In this example, we use Datadog metrics to drive rollout decisions.

By leveraging real-time metrics as gating conditions, rollouts proceed only when performance remains within healthy thresholds. This approach avoids blind deployments and enables smarter, data-informed decisions:

  • Rollouts pause if latency increases or error rates spike.
  • Rollbacks trigger automatically when service-level indicators (SLIs) degrade.
  • Real-time analysis provides confidence in each stage of the rollout.

In contrast, traditional deployments often push changes to production without sufficient safeguards. With Datadog integrated into Argo Rollouts:

  • Regressions are detected earlier in the release cycle.
  • Automated rollbacks occur before faulty versions reach full traffic.
  • Production outages and customer-facing errors are minimized.

Datadog for Metrics-Based Analysis

Datadog offers comprehensive observability across infrastructure, applications, and services. It supports custom metric ingestion, Kubernetes monitoring, Application Performance Monitoring (APM), and alerting.

When integrated with Argo Rollouts, Datadog metrics can act as automated gates, helping determine whether a rollout continues, pauses, or rolls back based on real-time performance indicators such as:

  • Latency (e.g., http.request.latency)
  • Error rate (e.g., http.error.rate)
  • CPU and memory utilization
  • Custom business KPIs (e.g., payment.success.count)

Datadog provides a unified view across:

  • Infrastructure metrics like CPU, memory, and disk I/O
  • Application telemetry including APM, logs, and traces
  • Business KPIs such as conversion rates and transaction counts

This holistic observability allows teams to base deployment decisions not only on system health, but also on user experience and business impact.

By tying deployments to real-time business indicators, rollouts can adapt dynamically. For example, if a new feature negatively affects payment success rates or slows down checkouts, the rollout is automatically halted. This gives SREs and business stakeholders confidence that changes won’t compromise the user experience.
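One way such a business gate might be expressed is an AnalysisTemplate over the payment.success.count metric mentioned earlier. This is a sketch under assumptions: the template name, the service tag, and the threshold of 100 are illustrative placeholders, and the threshold in particular has to be tuned to what the query actually returns for real traffic.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: payment-success-check         # hypothetical name
spec:
  args:
  - name: service-name
  metrics:
  - name: payment-success
    interval: 5m                      # take a measurement every 5 minutes
    # Placeholder threshold (assumption): tune it to your own traffic.
    successCondition: result >= 100
    failureLimit: 1                   # one failed measurement is tolerated; a second fails the analysis
    provider:
      datadog:
        apiVersion: v2
        interval: 5m                  # time window of data covered by each query
        query: |
          sum:payment.success.count{service:{{args.service-name}}}
```

A raw count depends on how much traffic the canary receives, so a ratio of successful payments to attempts is often a more stable gate when such a metric is available.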

Argo Rollouts Architecture

Prerequisites

To get started, make sure the following components are installed within your CaaS environment:

  • Argo Rollouts
  • Datadog Agent
  • Ingress Controller

Benefits of the Integration

  • Automated Decision Making: Deployment progression or rollback is driven by real-time metrics.
  • Reduced Risk: Early detection of anomalies halts faulty rollouts before reaching full production.
  • Business KPI-Driven Rollouts: Tie deployment decisions to user-facing metrics, not just infrastructure stats.
  • Improved Observability: Datadog dashboards and alerts offer full visibility into rollout performance.

Best Practices

  • Set conservative thresholds at the start, especially for latency and error rates.
  • Define rollback hooks to minimize service disruption during failure scenarios.
  • Use pause durations to allow metric collection and stabilization between rollout steps.
  • Consider A/B testing via the Experiment strategy to compare multiple versions in parallel.
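The pause-duration practice above could look like the following fragment, which belongs under spec in a Rollout. The weights and durations are illustrative assumptions rather than recommendations.

```yaml
strategy:
  canary:
    steps:
    - setWeight: 10              # start with a small share of traffic
    - pause: {duration: 10m}     # let metrics accumulate before widening exposure
    - setWeight: 25
    - pause: {duration: 10m}
    - setWeight: 50
    - pause: {duration: 10m}
    # After the final step, the rollout is promoted to 100%.
```

Without a traffic router, Argo Rollouts approximates these weights through replica counts, so small percentages are only meaningful when the Rollout runs enough replicas.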

Templates to Be Created 

  • Create one or more analysis templates based on health check requirements—these may focus on infrastructure health, application health, or both.
  • Your rollout templates should reference the appropriate analysis template, either as background analysis on the strategy or as an inline analysis step.
  • Based on the success conditions of Datadog metrics, the rollout will either automatically promote or roll back.

For more details on configuring Datadog-based analysis templates, refer to the official documentation.
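Before any Datadog-backed template can run, the Argo Rollouts controller needs Datadog credentials. The following is a minimal sketch, assuming the controller runs in the argo-rollouts namespace and reads a Secret named datadog, as described in the Argo Rollouts documentation. The key values are placeholders to be replaced with your own.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: datadog
  namespace: argo-rollouts
type: Opaque
stringData:
  address: https://api.datadoghq.com   # use the API endpoint of your Datadog site
  api-key: <datadog-api-key>           # placeholder
  app-key: <datadog-app-key>           # placeholder
```

In a GitOps setup, the real keys would normally come from a secret manager rather than being committed alongside the templates.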

Sample Templates

YAML
 
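# Error-rate gate: a measurement fails when the Datadog query returns more than 0.01.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: loq-error-rate
spec:
  args:
  - name: service-name          # supplied by the Rollout that references this template
  metrics:
  - name: error-rate
    interval: 5m                # take a measurement every 5 minutes
    # No count is set, so measurements repeat until the rollout update finishes.
    successCondition: result <= 0.01
    failureLimit: 3             # tolerate up to 3 failed measurements
    provider:
      datadog:
        apiVersion: v2
        interval: 5m            # time window of data covered by each query
        query: |
          sum:requests.error.rate{service:{{args.service-name}}}
---
# CPU check: queries average user CPU across all hosts once a minute.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: datadog-cpu-check
spec:
  metrics:
  - name: cpu-check
    interval: 1m
    # Succeeds while the value is above 0.5. Confirm that this direction and
    # scale match the behavior you want to gate on.
    successCondition: result > 0.5
    provider:
      datadog:
        query: "avg:system.cpu.user{*}"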
YAML
 
# This example demonstrates a Rollout which performs background analysis while the Rollout is updating.
#
# Prerequisites:
# * kubectl apply -f analysis-templates.yaml
#
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-background-analysis
spec:
  replicas: 4
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollout-background-analysis
  template:
    metadata:
      labels:
        app: rollout-background-analysis
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
  strategy:
    canary:
      # An AnalysisTemplate is referenced here, which starts an AnalysisRun as soon as the update
      # begins. The run is terminated when the update completes. A failure/error of the analysis
      # will cause the rollout's update to abort, and set the canary weight to zero.
      analysis:
        templates:
        - templateName: loq-error-rate
        args:
        - name: service-name
          value: analysis-service
      steps:
      - setWeight: 20
      - pause: {duration: 5m}


Rollout background analysis in Argo


Suggested Service Architecture 

Note: The technology choices are not yet finalized; this diagram is only a reference for the request flow.

Suggested Service Architecture diagram


Challenges and Considerations

  • Metrics Lag: Metrics can reach Datadog with a slight delay. Size the analysis interval and the rollout's pause durations so that each measurement covers data that has already been ingested.
  • Cost Overhead: Frequent metric queries in Datadog can incur cost; be efficient with scope and frequency.
  • Debugging: Failed analysis runs can be hard to trace—use verbose logging and alerts to aid troubleshooting.
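To make the metrics-lag and cost points more concrete, the fragment below sketches how timing fields might be set on a single metric inside the spec.metrics list of an AnalysisTemplate. The values are assumptions to adapt, not tuned recommendations, and the service-name argument is assumed to be declared under spec.args.

```yaml
metrics:
- name: error-rate
  initialDelay: 2m          # wait before the first measurement so fresh data can be ingested
  interval: 5m              # time between measurements; longer intervals mean fewer Datadog queries
  count: 3                  # stop after three measurements instead of running indefinitely
  failureLimit: 1           # tolerate one failed measurement
  successCondition: result <= 0.01
  provider:
    datadog:
      apiVersion: v2
      interval: 5m          # time window of data covered by each query
      query: |
        sum:requests.error.rate{service:{{args.service-name}}}
```

When a run fails, kubectl get analysisrun and kubectl describe analysisrun <name> show the measurements recorded for each metric, which is usually the quickest place to start troubleshooting.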

Conclusion

The integration of Argo Rollouts with Datadog metrics marks a significant step forward in modern deployment practices. By combining Kubernetes-native progressive delivery with real-time observability, development and operations teams can deploy with greater confidence and precision.

This synergy enables automated, intelligent decision-making, promoting or rolling back releases based on live system health and performance data. It minimizes the risk of faulty deployments reaching production and reduces the need for manual oversight during critical rollout phases.

As reliability, agility, and automation become core to modern DevOps pipelines, tools like Argo Rollouts and Datadog are becoming essential. Together, they offer a scalable, metrics-driven approach to continuous delivery, making deployments safer, faster, and more resilient.
