Advanced Traffic Management in Canary Deployments Using Istio, Argo Rollouts, and HPA

Learn how to set up Istio, Argo Rollouts, and HPA to achieve advanced traffic management and automatic pod scaling in canary deployments.

Debasree Panda

Aug. 14, 24 · Tutorial


As enterprises mature in their CI/CD journey, they aim to ship code faster, more safely, and more securely. One essential strategy DevOps teams apply is releasing code progressively to production, also known as canary deployment. Canary deployment is a robust mechanism for safely releasing application changes, and it provides flexibility for business experiments. It can be implemented using open-source software like Argo Rollouts and Flagger. However, advanced DevOps teams want granular control over traffic and pod scaling during a canary deployment to reduce overall costs. Many enterprises achieve advanced traffic management of canary deployments at scale using the open-source Istio service mesh. We want to share our knowledge with the DevOps community through this blog.

Before we get started, let us discuss the canary architecture implemented by Argo Rollouts and Istio.

Recap of Canary Implementation Architecture With Argo Rollouts and Istio

If you use the Istio service mesh, each meshed workload has an Envoy proxy sidecar running alongside the application container in the pod. You can use an API gateway or the Istio ingress gateway to receive incoming traffic from outside the cluster. In such a setup, you can use Argo Rollouts to handle canary deployment. Argo Rollouts provides a CRD called Rollout to implement canary deployment; it is similar to a Deployment object and is responsible for creating, scaling, and deleting ReplicaSets in K8s.

The canary deployment strategy starts by redirecting a small amount of traffic (for example, 5%) to the newly deployed app. Based on specific criteria, such as the resource utilization of the new canary pods, you can gradually increase the traffic to 100%. The Istio sidecars route traffic between the baseline and canary versions according to the rules defined in the VirtualService resource. Because Argo Rollouts integrates natively with Istio, it updates the weights in the VirtualService resource to increase the traffic to the canary pods.
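To make the idea of a weight-based split concrete, here is a minimal Python sketch of how a given canary weight translates into a share of requests. It is an illustration under assumptions: it uses a plain weighted random choice rather than Envoy's real load-balancing algorithm, and the helpers route_request and simulate are hypothetical.

```python
import random
from collections import Counter


def route_request(canary_weight: int, rng: random.Random) -> str:
    """Choose "canary" or "stable" for a single request.

    Hypothetical helper: a plain weighted random choice that mimics the
    effect of VirtualService route weights, not Envoy's actual algorithm.
    """
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    # rng.random() is uniform in [0, 1), so this is true canary_weight% of the time.
    return "canary" if rng.random() * 100 < canary_weight else "stable"


def simulate(canary_weight: int, requests: int, seed: int = 42) -> Counter:
    """Route `requests` requests and count how many reach each version."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    return Counter(route_request(canary_weight, rng) for _ in range(requests))


if __name__ == "__main__":
    counts = simulate(canary_weight=5, requests=10_000)
    print(counts["canary"] / 10_000)  # close to 0.05
```

With a 5% weight, roughly 500 of 10,000 requests reach the canary; raising the weight step by step is all that "gradually increasing the traffic" means at the routing layer.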

A canary can be implemented in two ways: by deploying new changes as a service or by deploying new changes as a subset.

1. Deploying New Changes as a Service

In this method, we can create a new service (called canary) and split the traffic from the Istio ingress gateway between the stable and canary services. Refer to the image below.

You can refer to the YAML file for a sample implementation of deploying a canary with multiple services here. We have created two services called rollouts-demo-stable and rollouts-demo-canary. Each service routes HTTP traffic to pods managed by the Argo Rollout resource called rollouts-demo. In the rollouts-demo YAML, we have specified the Istio VirtualService resource and the steps to gradually increase the canary traffic weight from 20% to 40%, 60%, 80%, and eventually 100%.

2. Deploying New Changes as a Subset

In this method, you keep a single service and define a canary subset (the new version) alongside the stable subset, both backed by the same service. Traffic is split between the stable and canary subsets using Istio VirtualService and DestinationRule resources.

This blog discusses the second method in detail.

Implementing Canary Using Istio and Argo Rollouts Without Changing the Deployment Resource

A common misconception among DevOps professionals is that Argo Rollouts replaces the Deployment resource, so every service considered for canary deployment must have its Deployment configuration rewritten as an Argo Rollout.

Well, that’s not true.

The Argo Rollout resource provides a field called workloadRef, which lets it reference an existing Deployment without significant changes to the Deployment or Service YAML.

If a service in Kubernetes is backed by a Deployment, you can reference that Deployment in the Rollout CRD, after which Argo Rollouts manages the ReplicaSets for that service. Refer to the image below.

We will use the same concept to deploy a canary version using the second method: deploying new changes as a subset.

Argo Rollouts Configuration for Deploying New Changes Using a Subset

Let's say you have a Kubernetes service called rollouts-demo-svc and a Deployment resource called rollouts-demo-deployment (code below). Follow the steps below to configure the canary deployment.

Code for Service.yaml:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: rollouts-demo-svc
  namespace: istio-argo-rollouts
spec:
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http
  selector:
    app: rollouts-demo
```
  

Code for deployment.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rollouts-demo-deployment
  namespace: istio-argo-rollouts
spec:
  replicas: 0 # set this to 0 once the Argo Rollout is active and functional
  selector:
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        ports:
        - name: http
          containerPort: 8080
        resources:
          requests:
            memory: 32Mi
            cpu: 5m
```
  

Step 1: Set Up the VirtualService and DestinationRule in Istio

Set up the VirtualService by specifying the back-end destination for HTTP traffic from the Istio gateway. In our VirtualService, rollouts-demo-vs2, the back-end service is rollouts-demo-svc, and we created two subsets (stable and canary) for the respective versions. The weight rule initially sends 100% of the traffic to the stable version and 0% to the canary version.

Because Istio performs the traffic split, Argo Rollouts only needs to update the weights in this VirtualService; later, we will see how it does so according to the canary specification.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: rollouts-demo-vs2
  namespace: istio-argo-rollouts
spec:
  gateways:
  - istio-system/rollouts-demo-gateway
  hosts:
  - "*"
  http:
  - name: route-one
    route:
    - destination:
        host: rollouts-demo-svc
        port:
          number: 80
        subset: stable
      weight: 100
    - destination:
        host: rollouts-demo-svc
        port:
          number: 80
        subset: canary
      weight: 0
```
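In this setup, the stable and canary weights in route-one are meant to add up to 100 at every step (here, 100 + 0). A quick check helps when editing these weights by hand. The Python sketch below reads route-one as plain data and verifies that invariant; the helper name route_weights is hypothetical.

```python
from typing import Dict, List


def route_weights(route: List[dict]) -> Dict[str, int]:
    """Return {subset: weight} for one VirtualService HTTP route.

    Raises ValueError when the weights do not add up to 100, which the
    stable/canary split in this example is expected to satisfy.
    """
    weights: Dict[str, int] = {}
    for destination in route:
        subset = destination["destination"]["subset"]
        weights[subset] = weights.get(subset, 0) + destination["weight"]
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"route weights add up to {total}, expected 100")
    return weights


# route-one from rollouts-demo-vs2, written as Python data.
ROUTE_ONE = [
    {"destination": {"host": "rollouts-demo-svc", "port": {"number": 80}, "subset": "stable"},
     "weight": 100},
    {"destination": {"host": "rollouts-demo-svc", "port": {"number": 80}, "subset": "canary"},
     "weight": 0},
]

if __name__ == "__main__":
    print(route_weights(ROUTE_ONE))  # {'stable': 100, 'canary': 0}
```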
  

Next, we define the subsets in a DestinationRule. In rollout-destrule below, we define the canary and stable subsets; their names are referenced from the Argo Rollout resource called rollouts-demo.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: rollout-destrule
  namespace: istio-argo-rollouts
spec:
  host: rollouts-demo-svc
  subsets:
  - name: canary   # referenced in canary.trafficRouting.istio.destinationRule.canarySubsetName
    labels:        # labels will be injected with canary rollouts-pod-template-hash value
      app: rollouts-demo
  - name: stable   # referenced in canary.trafficRouting.istio.destinationRule.stableSubsetName
    labels:        # labels will be injected with stable rollouts-pod-template-hash value
      app: rollouts-demo
```
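The comments in rollout-destrule note that Argo Rollouts injects a rollouts-pod-template-hash label into each subset. The Python sketch below shows why that matters, using hypothetical pod names and hash values: as written, both subsets select the same pods, and only after the hash labels are injected do stable and canary select disjoint pods.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matches(selector: Labels, pod_labels: Labels) -> bool:
    """A pod belongs to a subset when it carries every label the subset lists."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def pods_in_subset(selector: Labels, pods: Dict[str, Labels]) -> List[str]:
    """Names of the pods selected by a subset, sorted for stable output."""
    return sorted(name for name, labels in pods.items() if matches(selector, labels))


# Hypothetical pods and hash values, for illustration only.
PODS = {
    "rollouts-demo-stable-pod": {"app": "rollouts-demo", "rollouts-pod-template-hash": "stable-hash"},
    "rollouts-demo-canary-pod": {"app": "rollouts-demo", "rollouts-pod-template-hash": "canary-hash"},
}

# Subsets as written in rollout-destrule: both select every app=rollouts-demo pod.
STABLE = {"app": "rollouts-demo"}
CANARY = {"app": "rollouts-demo"}

# Subsets after the hash labels are injected: the selections no longer overlap.
STABLE_INJECTED = {**STABLE, "rollouts-pod-template-hash": "stable-hash"}
CANARY_INJECTED = {**CANARY, "rollouts-pod-template-hash": "canary-hash"}

if __name__ == "__main__":
    print(pods_in_subset(CANARY, PODS))           # both pods
    print(pods_in_subset(STABLE_INJECTED, PODS))  # ['rollouts-demo-stable-pod']
    print(pods_in_subset(CANARY_INJECTED, PODS))  # ['rollouts-demo-canary-pod']
```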
  

In the next step, we will set up the Argo Rollout resource.

Step 2: Set Up the Argo Rollout Resource

The canary strategy in the Rollout spec must cover two important items: it declares the Istio VirtualService and DestinationRule, and it defines the traffic increment strategy.

You can learn more about the Argo Rollout spec.

In our Argo Rollout resource, rollouts-demo, we reference the Deployment (rollouts-demo-deployment) in the workloadRef spec. In the canary spec, we reference the VirtualService (rollouts-demo-vs2) and the DestinationRule (rollout-destrule) created in the earlier step.

We have also specified traffic steps that send 20% of the traffic to the canary pods and then pause until the release is manually promoted.

We added this manual pause so that, in the production environment, the Ops team can verify that the vital metrics and KPIs of the canary pods, such as CPU, memory, latency, and throughput, are within an acceptable range.
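The check the Ops team performs during this pause can be written down as explicit bounds. The Python sketch below is hypothetical throughout: the KPI names, the bounds, and the helper kpi_violations are illustrative, and in practice the observed values would come from your monitoring system. If no violations are reported, the team would promote the rollout, for example with the Argo Rollouts kubectl plugin (kubectl argo rollouts promote rollouts-demo -n istio-argo-rollouts).

```python
from typing import Dict, List, Optional, Tuple

# (lower, upper) bounds for a KPI; None means that side is unbounded.
Bounds = Tuple[Optional[float], Optional[float]]


def kpi_violations(observed: Dict[str, float], bounds: Dict[str, Bounds]) -> List[str]:
    """List every KPI that is missing or outside its acceptable range.

    A missing observation counts as a violation, so an incomplete check
    never passes silently.
    """
    problems: List[str] = []
    for kpi, (low, high) in bounds.items():
        value = observed.get(kpi)
        if value is None:
            problems.append(f"{kpi}: no data")
        elif low is not None and value < low:
            problems.append(f"{kpi}: {value} below {low}")
        elif high is not None and value > high:
            problems.append(f"{kpi}: {value} above {high}")
    return problems


# Hypothetical acceptable ranges for the canary pods.
CANARY_BOUNDS: Dict[str, Bounds] = {
    "cpu_millicores": (None, 50.0),
    "memory_mib": (None, 30.0),
    "p99_latency_ms": (None, 250.0),
    "throughput_rps": (20.0, None),  # throughput needs a floor, not a ceiling
}

if __name__ == "__main__":
    observed = {"cpu_millicores": 12.0, "memory_mib": 21.5,
                "p99_latency_ms": 310.0, "throughput_rps": 48.0}
    problems = kpi_violations(observed, CANARY_BOUNDS)
    print("promote" if not problems else f"hold: {problems}")
```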

Once we manually promote the release, the canary traffic increases to 40%. Argo Rollouts then waits 10 seconds before increasing the traffic to 60%. The process continues until the canary pods receive 100% of the traffic and the stable pods are deleted.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
  namespace: istio-argo-rollouts
spec:
  replicas: 5
  strategy:
    canary:
      trafficRouting:
        istio:
          virtualService:
            name: rollouts-demo-vs2        # required
            routes:
            - route-one                 # optional if there is a single route in VirtualService, required otherwise
          destinationRule:
            name: rollout-destrule    # required
            canarySubsetName: canary  # required
            stableSubsetName: stable  # required
      steps:
      - setWeight: 20
      - pause: {}               # indefinite pause: wait for manual promotion
      - setWeight: 40
      - pause: {duration: 10}   # 10 seconds
      - setWeight: 60
      - pause: {duration: 10}
      - setWeight: 80
      - pause: {duration: 10}
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollouts-demo
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rollouts-demo-deployment
```
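The steps list reads as a timeline of weights and waits. The Python sketch below turns the steps above into that timeline; the function name plan is hypothetical, and it assumes integer pause durations in seconds (Argo Rollouts also accepts strings such as "1m", which this sketch does not parse).

```python
from typing import List, Optional, Tuple

# The canary steps from the rollouts-demo spec, written as Python data.
STEPS = [
    {"setWeight": 20}, {"pause": {}},
    {"setWeight": 40}, {"pause": {"duration": 10}},
    {"setWeight": 60}, {"pause": {"duration": 10}},
    {"setWeight": 80}, {"pause": {"duration": 10}},
]


def plan(steps: List[dict]) -> List[Tuple[int, Optional[int]]]:
    """Turn canary steps into (canary_weight, wait) pairs.

    wait is the pause in seconds after reaching that weight: None means an
    indefinite pause that needs manual promotion, 0 means no pause.
    """
    timeline: List[Tuple[int, Optional[int]]] = []
    for step in steps:
        if "setWeight" in step:
            timeline.append((step["setWeight"], 0))
        elif "pause" in step and timeline:
            weight, _ = timeline[-1]
            timeline[-1] = (weight, step["pause"].get("duration"))
    # After the last step the rollout completes and the new version takes 100%.
    timeline.append((100, 0))
    return timeline


if __name__ == "__main__":
    for weight, wait in plan(STEPS):
        if wait is None:
            print(f"canary at {weight}%: wait for manual promotion")
        else:
            print(f"canary at {weight}%: wait {wait}s")
```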
  

Once you have deployed all the resources in steps 1 and 2 and accessed them through the Istio ingress IP from the browser, you will see an output like the one below.

You can run the command below to see how Argo Rollouts manages the pods.

```shell
kubectl get pods -n <<namespace>>
```

Validating Canary Deployment

Let’s say developers have made new changes and created a new image that needs to be tested. In our case, we update the Deployment manifest (rollouts-demo-deployment) by changing the image tag from blue to red (refer to the image below).

```yaml
spec:
  template:
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:red   # changed from argoproj/rollouts-demo:blue
```

Once you apply the updated rollouts-demo-deployment, Argo Rollouts detects that new changes have been introduced to the environment. It then creates new canary pods and routes 20% of the traffic to them. Refer to the image below:

If you now inspect the VirtualService by running the following command, you will see that Argo Rollouts has updated the canary weight from 0% to 20% (as per the Rollout spec).

```shell
kubectl get vs rollouts-demo-vs2 -n <<namespace>> -o yaml
```
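The change visible in that output amounts to rewriting two numbers in route-one. The Python sketch below reproduces that effect on a trimmed-down, dictionary version of rollouts-demo-vs2. It illustrates the result only and is not how Argo Rollouts is implemented; the function name set_canary_weight is hypothetical.

```python
import copy


def set_canary_weight(virtual_service: dict, route_name: str, weight: int,
                      canary_subset: str = "canary", stable_subset: str = "stable") -> dict:
    """Return a copy of `virtual_service` with the canary subset at `weight`.

    The stable subset receives the remaining 100 - weight. Routes with a
    different name are left unchanged.
    """
    if not 0 <= weight <= 100:
        raise ValueError("weight must be between 0 and 100")
    updated = copy.deepcopy(virtual_service)  # leave the input untouched
    for http_route in updated["spec"]["http"]:
        if http_route.get("name") != route_name:
            continue
        for entry in http_route["route"]:
            subset = entry["destination"]["subset"]
            if subset == canary_subset:
                entry["weight"] = weight
            elif subset == stable_subset:
                entry["weight"] = 100 - weight
    return updated


# A trimmed-down rollouts-demo-vs2 as Python data (gateways, hosts, ports omitted).
VS = {
    "metadata": {"name": "rollouts-demo-vs2", "namespace": "istio-argo-rollouts"},
    "spec": {"http": [{
        "name": "route-one",
        "route": [
            {"destination": {"host": "rollouts-demo-svc", "subset": "stable"}, "weight": 100},
            {"destination": {"host": "rollouts-demo-svc", "subset": "canary"}, "weight": 0},
        ],
    }]},
}

if __name__ == "__main__":
    after_first_step = set_canary_weight(VS, "route-one", 20)
    for entry in after_first_step["spec"]["http"][0]["route"]:
        print(entry["destination"]["subset"], entry["weight"])  # stable 80, then canary 20
```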

Gradually, 100% of the traffic will be shifted to the new version, and older/stable pods will be terminated.

In advanced cases, the DevOps team needs to control how canary pods scale. Rather than creating pods strictly according to the replica count at each traffic step, the goal is to create pods based on specific criteria, such as load. In those cases, we need a HorizontalPodAutoscaler (HPA) to handle the scaling of canary pods.
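For context, without an HPA the number of canary pods at each step is tied to the traffic weight and the Rollout's replica count. As we understand the Argo Rollouts defaults with traffic routing, the canary ReplicaSet is scaled to match the current traffic weight (the setCanaryScale step can change this), while the stable ReplicaSet stays at full size until the rollout completes. The Python sketch below assumes that proportional rule with rounding up; the function name canary_replicas is hypothetical.

```python
def canary_replicas(total_replicas: int, canary_weight: int) -> int:
    """Canary pods needed to match a traffic weight, rounded up.

    Assumes proportional scaling: ceil(total_replicas * canary_weight / 100),
    computed with integer arithmetic to avoid floating-point surprises.
    """
    if total_replicas < 0 or not 0 <= canary_weight <= 100:
        raise ValueError("invalid replica count or weight")
    return (total_replicas * canary_weight + 99) // 100


if __name__ == "__main__":
    # replicas: 5 and the setWeight steps from the rollouts-demo spec.
    for weight in (20, 40, 60, 80, 100):
        print(f"{weight}% -> {canary_replicas(5, weight)} canary pod(s)")
```

Under these assumptions, a 5-replica Rollout runs 1, 2, 3, 4, and then 5 canary pods as the weight climbs, regardless of the actual load. That fixed schedule is what HPA replaces with load-driven scaling.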

Scaling of Pods During Canary Deployment Using HPA

Kubernetes HPA increases or decreases the number of pods based on load. HPA can also control the scaling of pods during a canary deployment: when it targets a Rollout, the HorizontalPodAutoscaler overrides the Rollout's own scaling behavior.
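The Kubernetes documentation describes the core HPA calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when that ratio is within a tolerance of 1.0 (0.1 by default). The Python sketch below encodes just that rule; the function name is hypothetical, and min/max clamping, pod readiness handling, and stabilization windows are deliberately left out.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_value: float,
                         target_value: float, tolerance: float = 0.1) -> int:
    """Core HPA ratio rule.

    desired = ceil(current_replicas * current_value / target_value), unless
    current_value / target_value is within `tolerance` of 1.0, in which case
    the replica count is left unchanged.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * current_value / target_value)


if __name__ == "__main__":
    print(hpa_desired_replicas(2, 14, 10))    # CPU 40% over target: scale up
    print(hpa_desired_replicas(4, 5, 10))     # half the target: scale down
    print(hpa_desired_replicas(3, 10.5, 10))  # within tolerance: no change
```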

We have created and deployed the following HPA resource: hpa-rollout-example.

Note: The number of pods the HPA creates = max(minReplicas in the HPA resource, replicas in the Rollout).

This means that if minReplicas in the HPA resource is 2 but the Rollout specifies 5 replicas, a total of 5 pods will be created.

Similarly, if we set replicas in the rollouts-demo resource to 1, the HPA will create 2 pods. (We updated the replicas to 1 to test this scenario.)
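The rule in the note fits in a one-line function. The Python sketch below encodes the behavior described above plus one assumption of ours, that the count is also capped at the HPA's maxReplicas; the function name is hypothetical. Over time, the HPA keeps recomputing the count from the observed metrics, so this only describes the starting point.

```python
def initial_pod_count(hpa_min_replicas: int, hpa_max_replicas: int,
                      rollout_replicas: int) -> int:
    """Starting pod count: max(HPA minReplicas, Rollout replicas).

    The cap at hpa_max_replicas is an added assumption, since an HPA does
    not keep more pods than its maxReplicas.
    """
    return min(hpa_max_replicas, max(hpa_min_replicas, rollout_replicas))


if __name__ == "__main__":
    print(initial_pod_count(2, 6, 5))  # Rollout replicas win
    print(initial_pod_count(2, 6, 1))  # HPA minReplicas wins
```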

In the HPA resource, we reference the Argo Rollout resource rollouts-demo. This means the HPA is responsible for creating two replicas at the start. If the average CPU utilization exceeds 10% of the requested CPU, more pods are created, up to a maximum of six replicas.

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-rollout-example
  namespace: istio-argo-rollouts
spec:
  maxReplicas: 6
  minReplicas: 2
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    name: rollouts-demo
  targetCPUUtilizationPercentage: 10
```
  

When we deployed a canary, only two replicas were created at first (instead of the five originally specified in the Rollout).

Validating HPA Pod Scaling With Synthetic Load

Run the following command to generate load against the service.

```shell
kubectl run -i --tty load-generator-1 --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://<<service name>>.<<namespace>>; done;"
```

Use the following command to watch the CPU utilization of the pods managed by the HPA.

```shell
kubectl get hpa hpa-rollout-example -n <<namespace>> --watch
```

Once CPU utilization exceeds the 10% target (in our case, it rose to 14%; refer to the image below), new pods are created.
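Applying the HPA formula from the Kubernetes documentation to the numbers in this test gives a sense of the expected result. Starting from 2 pods with a 10% target, an observed 14% gives ceil(2 * 14 / 10) = 3 pods, and the result is always clamped to the HPA range [minReplicas, maxReplicas] = [2, 6]. The Python sketch below assumes the reported utilization is the average across the pods and ignores stabilization windows and sync delays; the function name is hypothetical.

```python
import math


def next_replica_count(current_replicas: int, cpu_percent: float, target_percent: float,
                       min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """One HPA evaluation: ratio rule with tolerance, then min/max clamping."""
    ratio = cpu_percent / target_percent
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * cpu_percent / target_percent)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Values from hpa-rollout-example: minReplicas 2, maxReplicas 6, target 10%.
    print(next_replica_count(2, 14, 10, 2, 6))  # the 14% observed in this test
    print(next_replica_count(2, 80, 10, 2, 6))  # heavy load, capped at maxReplicas
    print(next_replica_count(3, 2, 10, 2, 6))   # idle, floored at minReplicas
```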

Besides CPU utilization, HPA can use other metrics, such as latency or throughput, as criteria for scaling pods up or down.
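When more than one metric is configured, the Kubernetes documentation says the HPA computes a desired replica count for each metric and uses the largest. The Python sketch below shows that rule with hypothetical readings; metrics other than CPU and memory need a custom or external metrics source in the cluster, which is not shown here, and tolerance and min/max clamping are omitted for brevity.

```python
import math
from typing import Dict, Tuple


def desired_for_metrics(current_replicas: int,
                        metrics: Dict[str, Tuple[float, float]]) -> int:
    """Largest desired replica count across several metrics.

    `metrics` maps a metric name to (current_value, target_value); each one
    is evaluated with the basic HPA ratio rule and the maximum wins.
    """
    if not metrics:
        return current_replicas
    return max(math.ceil(current_replicas * current / target)
               for current, target in metrics.values())


if __name__ == "__main__":
    # Hypothetical readings: CPU is under target, but per-pod throughput is high.
    readings = {
        "cpu_percent": (8.0, 10.0),            # alone, this would keep 2 pods
        "requests_per_second": (150.0, 50.0),  # alone, this would ask for 6 pods
    }
    print(desired_for_metrics(2, readings))
```

Taking the maximum means a single overloaded dimension, such as throughput, is enough to trigger a scale-up even when CPU looks healthy.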

Video

Below is the video by Ravi Verma, CTO of IMESH, giving a walkthrough of advanced traffic management in canary deployments for enterprises at scale using Istio and Argo Rollouts.

Final Thought

As the pace of releasing software increases with the maturity of the CI/CD process, new complications emerge, and so do new requirements for the DevOps team to tackle them. Similarly, when a DevOps team adopts the canary deployment strategy, new scaling and traffic management challenges emerge as the team tries to gain granular control over the rapid release process and infrastructure costs.


Published at DZone with permission of Debasree Panda. See the original article here.

Opinions expressed by DZone contributors are their own.
