Understanding Canary Analysis
Canary analysis is a two step process where we assess the canary based on metrics and logs to infer if we are going to promote or roll back the new version
Join the DZone community and get the full member experience.Join For Free
We discussed the basics of canary deployment In the blog “What is Canary Deployment?“. It is one of the most widely used strategies because it lowers the risk of moving updates into production, while simultaneously reducing the need for additional infrastructure.
Today, many applications run in a dynamic environment of microservices, which makes the software integration and delivery process complex. The only way to successfully deploy updates consistently is to use solid automation with tools like Spinnaker.
What Is Canary Analysis?
Canary analysis is a two-step process where we will assess the canary based on a selected list of metrics and logs to infer if we are going to promote or roll back the new version. So we need to be sure that we are gathering the right information (metrics and logs) during the test process and conducting a solid analysis. (From here forward in this blog we will refer to metric and logs simply as metrics)
The Two Steps To Perform Canary Analysis
- Metric selection and evaluation :
This step involves selecting the right metrics for monitoring the application and canary health. We need to be sure to create a balanced set of metrics to assess the canary. Lastly if necessary the metrics need to be grouped based on correlation with each other. Luckily, we perform this step just once for each pipeline.
- Selection and evaluation of metrics :
The monitoring systems in microservice-based applications typically generate mountains of data, so we can’t simply analyze all metrics. Selecting all of the metrics simply isn’t required. Once we have selected the high-impact metrics, adding any more metrics will just increase our effort. But which ones to choose? Read on!
Selection and Evaluation Criteria
- Most Important Business Metrics
The most important metrics are the typical ones that are most important to the purpose of the application. For example, in an online shopping checkout application, be sure to measure the number of transactions per minute, the failure rate, and so on. If any of these metrics are outside the baseline, you will probably decide to roll back the canary. Most of these metrics are probably already monitored in staging – be sure to also check them in the canary process as well.
- Balanced Set of Fast and Slow Metrics
A single metric is not sufficient for a meaningful canary process. Some of the important metrics to monitor are generated instantaneously and some take time to generate as they have a dependency on load, network traffic, high memory usage, or other factors. It is important to balance the metrics you choose between fast and slow metrics. For example, server query time and latency checks may be examples of slow and fast metrics respectively. Choose a balanced set of metrics that will provide you with a strong overall view of the health of the canary.
- Metric Smoke Test
From the balanced pool of metrics, we have to make sure that we have metrics that directly indicate a problem in the canary. Though our goal is finding the health of a canary, it is equally important to finding if there is an underlying problem in the canary with such metrics. For example, a response of 404 or another unexpected HTTP return code (200s, 300s) might mean that your test should be stopped instantly and be debugged. CPU usage, memory footprint, HTTP return codes (200s, 300s, etc.), the latency of response, correctness, are a good set of metrics but the HTTP return codes and latency of response indicate an actual problem that impacts users and services. This is sometimes referred to as a metric smoke test. For the smoke test, metrics for point-in-time usage (for example, point-in-time CPU usage or network bandwidth) aren’t as useful as a change in those metrics, and they could just add noise to our analysis.
- Metric Standard Ranging
These metric selections need to have an acceptable range of play and we should not be too strict or too lenient. With an agreed acceptable behavior of a metric, we will be able to eliminate bad canaries that were incorrectly assessed as good ones. Choosing these ranges is a difficult task that is usually completed incrementally over time. When in doubt, err on the side of being conservative – in other words, choose a small range of acceptable values. We would not want a canary to be deployed in production that was assessed correctly but was actually problematic.
- Metric Correlation
With a balanced set of metrics, we now have our basic tool kit ready for canary analysis. Most of these metrics will be interrelated and finding the correlation and then grouping them is very crucial. We will need to group the metrics in such a way that these groups are isolated. For example, a dramatic increase in CPU usage of the system as a whole would make for a poor metric because other processes such as DB and batch queries might be causing that increase. A better metric would be CPU time spent per process served.
Canary Hypothesis Testing
We need to perform the last step of canary analysis evaluating the values of the metrics. By doing this we will be able to assess if the canary instance should be promoted to production or not.
Before every hypothesis, we will define the two hypotheses, and based on our data we will prove one wrong.
H0: The canary is bad: Rollback
H1: The canary is good: Roll forward
A simple analysis might start by comparing the canary results with the larger baseline version. However, this evaluation typically leads to incorrect decisions.
A Canary at 10% of the traffic may not behave abnormally but there is no way to be confident that it will behave the same at 30%(or 60% or 90%) of the traffic. This may be because of the following reasons.
- A canary is just released and is only minutes into the production, whereas the metrics from the baseline version are probably weeks or months old. Comparing them will be misleading.
- We have channeled a small section of the traffic to the Canary. The pressure on the systems is probably out of proportion.
This issue is resolved with the use of the A|B Testing strategy. Instead of comparing the canary with a baseline from production, we split the canary infrastructure into two pieces, and deploy the baseline system into one half and the new canary the other half. This will allow for a level playing field where we can accurately gauge if the new canary compares favorably to the baseline system. Because the instance is small, the cost of a separate A|B Test inside a canary deployment will be minimal.
As software deployments move at scale and velocity it becomes highly crucial to have a proper system in place that assesses these releases before they damage the production environment leading to losses for an organization and irate customers. canary deployments provide an excellent approach to reducing the risk of introducing a defect into production, are relatively low cost, and do not slow the process much as long as appropriate automation is involved.
Published at DZone with permission of Jyoti Sahoo. See the original article here.
Opinions expressed by DZone contributors are their own.