Using Continuous Integration to Detect Performance Degradation
While APM tools are one way to monitor an application's performance, what about using CI tools to watch for similar metrics? Let's take a look at how you might use a CI tool to find degradations in performance.
Typically, when performance testing in a Continuous Integration (CI) scheme, we try to simulate the expected load and run the test in an environment as similar to production as we can. I would like to suggest a complementary CI/CD approach. With it, we can detect performance degradations immediately and get quick feedback about them. This can be achieved by performing daily executions of small test cases with relatively small loads, together with strong assertions.
The goal is to be alerted of the impact of any newly committed code on the performance of our system. If that impact is negative, we can immediately evaluate the change and solve the problem on the spot, before promoting these changes. Then, we can arrive at load testing with higher loads better prepared and more relaxed.
As in any Continuous Integration flow, our daily, smaller performance tests should be executed in the pipeline of our continuous integration tool. This can be done on every build or, if that takes too much time, on a schedule that we choose. The scripts should be executed with accurate assertions in order to generate alerts if the performance goes down.
It's important to set your assertions according to your business decisions, which determine what is considered acceptable performance and what isn't. In the picture below we can see the requests per second (blue) alongside the expected RPS result (green) over time. The performance goes down but the test doesn't fail, because the assertion is not tight enough. In other words, the assertion was not able to detect the degradation in the request that we are testing.
To ensure the success of this approach, we should keep in mind the following considerations:
- Use an exclusive environment to ensure that our tests are repeatable. We need an exclusive environment so that the results we get are really caused by our test and not by another system that could be running on the same infrastructure. If we can, we should reset our system as many times as we need, which also helps keep the test repeatable.
- Set up different scenarios and loads in order to find out the breakpoint of the service that you are testing in your testing environment.
- Measure and log the error rate, the response times, and the number of requests per second that our server handled during the test. If your test doesn't have any assertions, your build will never report any problem.
- Set up the test with a load 5 or 10 percent below the system breakpoint load, so that a small degradation will be detected more easily. In our experience, the following numbers work for setting the assertions in a pipeline with the strategy we are describing:
  - Error rate < 1% (how many users are you willing to let receive an error message?)
  - 95th percentile of response time < response time when the system breaks + 10%. Using the 95th percentile means that you accept that 5% of the users may get a worse response time. Is 5% OK for you?
  - Requests per second > requests per second when the system breaks - 10%.
In the last two assertions (response time and requests per second) we are considering a margin of 10%. You can adjust this number to the one that makes sense for you. The important thing to consider here is that even if you run the same test twice under the same conditions, the results are not going to be identical. So, if your margin is too small, your test will fail frequently even when there is no degradation (false positives). If your margin is too large, you will miss the opportunity to detect degradations (false negatives). Something between 10 and 20 percent is a decent margin.
- In this approach, it is convenient if the test environment is smaller than the production one because it will be easier to find the breaking point of the system.
- We will need several tests to find the breakpoint of the system. On each execution, we have to watch how the requests per second (RPS) increase in proportion to the number of users. If you see that the RPS doesn't change, or even decreases, that means that something is saturated and the system is not scaling anymore, so we have found the breakpoint. We can see this in the following graph:
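As a rough sketch of the considerations above, the Python below looks for the breakpoint in a series of step-load runs and derives the three assertions from it. Everything here is an assumption made for illustration: the names (`StepResult`, `find_breakpoint`, `thresholds_from_breakpoint`, `check`), the `min_gain` tolerance used to decide that RPS has stopped growing, and the choice to treat the last step that still scaled as the breakpoint. It does not call any load testing tool; it only works on numbers you have already measured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepResult:
    users: int         # concurrent virtual users in this run
    rps: float         # requests per second the server handled
    p95_ms: float      # 95th percentile response time, in milliseconds
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


@dataclass
class Thresholds:
    max_error_rate: float
    max_p95_ms: float
    min_rps: float


def find_breakpoint(steps: List[StepResult], min_gain: float = 0.05) -> Optional[StepResult]:
    """Return the last step that still scaled before RPS stopped growing.

    min_gain is a hypothetical tolerance: RPS must grow by at least this
    fraction over the previous step to count as "still scaling".
    Returns None if RPS kept growing across all steps.
    """
    ordered = sorted(steps, key=lambda s: s.users)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.rps < prev.rps * (1 + min_gain):
            return prev
    return None


def thresholds_from_breakpoint(bp: StepResult, margin: float = 0.10,
                               max_error_rate: float = 0.01) -> Thresholds:
    """Build the assertions: error rate < 1%, p95 < breakpoint p95 + margin,
    RPS > breakpoint RPS - margin."""
    return Thresholds(
        max_error_rate=max_error_rate,
        max_p95_ms=bp.p95_ms * (1 + margin),
        min_rps=bp.rps * (1 - margin),
    )


def check(result: StepResult, t: Thresholds) -> List[str]:
    """Return a list of failed assertions; an empty list means the run passed."""
    failures = []
    if not result.error_rate < t.max_error_rate:
        failures.append(f"error rate {result.error_rate:.2%} is not below {t.max_error_rate:.2%}")
    if not result.p95_ms < t.max_p95_ms:
        failures.append(f"p95 {result.p95_ms:.0f} ms is not below {t.max_p95_ms:.0f} ms")
    if not result.rps > t.min_rps:
        failures.append(f"RPS {result.rps:.1f} is not above {t.min_rps:.1f}")
    return failures
```

In a pipeline step, one option is to run the daily test at a load slightly below the breakpoint, pass its measurements to `check`, and exit with a non-zero status when the returned list is not empty, so that the build is marked as failed.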
After setting up the assertions and the scenario, we have to review the test over the following days to double-check whether it is still stable or whether something needs further adjustment. Continuous Integration means that we will be checking the test results constantly, but what we mean here is that we have to review our assertions (test our tests).
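One way to "test our tests" is to compare the chosen margin with how much the results vary between runs of an unchanged build. The following is only a heuristic sketch: the function names and the idea of using the largest relative deviation from the mean are assumptions for illustration, not something prescribed by the approach itself. If unchanged builds already vary by more than the margin, the assertion is likely to produce false positives.

```python
from statistics import mean
from typing import Sequence


def max_relative_deviation(samples: Sequence[float]) -> float:
    """Largest deviation from the mean, as a fraction of the mean."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variation")
    avg = mean(samples)
    if avg == 0:
        raise ValueError("mean is zero, relative deviation is undefined")
    return max(abs(x - avg) / avg for x in samples)


def margin_looks_reasonable(samples: Sequence[float], margin: float = 0.10) -> bool:
    """True if run-to-run variation of an unchanged build stays inside the margin.

    samples: the same metric (for example RPS) from repeated runs of a
    build with no code changes, under the same conditions.
    """
    return max_relative_deviation(samples) < margin
```

For example, you could feed it the RPS values from the last few builds that contained no relevant changes. If it returns `False`, either the margin is too tight or the environment is not as exclusive as it should be.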
The benefits of this approach are the same ones we know from CI/CD:
- We get timely feedback
- We gain confidence in our build
- We learn more, and in time, about the aspects that make our system fail, so that we can keep the same error from happening in other areas of our application.
The first try might seem a little difficult to set up and maintain because we have to review the status of each build: whether the failures are real failures, whether there are issues with the test environment, and so on. That's why it's important to involve the development team in these tasks from the beginning and, why not, include this in our Definition of Done.
Another advantage is that we have the log of changes for each build, so if something negatively affects our test results, we can easily find and review the latest changes.
In addition, in our CI tool, Jenkins for example, we will keep a history of the trend of different performance metrics for each service, how many times they failed, when and why.
Finally, but just as important, I want to talk about Monitoring. If we have the possibility to use an APM tool, we can also keep a history of the server's resources in order to correlate error times with the server's resources at that time. This way we have more information to solve our issues with.
BlazeMeter's integration with Continuous Integration tools like Jenkins enables you to add your BlazeMeter performance test to your build. With BlazeMeter you can scale your open source testing script to as many users as you need, have it run from all over the world, and run it daily, after every build, or whenever you choose. Then, you can analyze your assertions with insightful graphs, both in real time and looking back. Finally, you can share your tests and results with team members and managers.
We have seen how this continuous integration approach complements the traditional performance testing that we know, and what its advantages are. We can detect the exact moment at which a change makes the system work 10% worse. We don't need a big environment, and the solution is cheap, repeatable, and easy to maintain.
But, as I have said, this approach does not replace large-scale performance testing. It just prepares us better for that test. I hope this post shows you a new way to use performance testing in a CI/CD scheme!
Published at DZone with permission of Federico Toledo, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.