Performance Management in Continuous Integration
I recently gave presentations on Performance Management as part of Continuous Integration at QCon London and JAX in Wiesbaden. While I got the feedback that this definitely makes sense, a lot of people said they do not know how to put it into practice. Therefore I will provide a short implementation guide on how to integrate performance management into development - or more specifically Continuous Integration.
The Basic Idea
Before going into the details, I will explain the basic idea behind the approach. Let us start with how a software system works from a performance perspective. The model we use here is the queuing network. A queuing network is a collection of resources (nodes), each with a queue for processes waiting to get hold of that resource. The resources are connected to each other, forming a network. The image below shows a simplified queuing network for a single-tier web application.
We see all our major resources modeled: CPU, memory, disk and other limited resources like threads in thread pools, database connections or the network. How we use these resources affects performance and scalability. If we, for example, execute too many database statements, we will run into performance problems when load increases, as more requests have to wait for a database connection to become available. Tuning approaches like wait-based tuning rely on these increasing wait times to identify performance problems.
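To make the effect of growing wait times concrete, here is a minimal sketch. It assumes the simplest textbook model, a single resource behaving like an M/M/1 queue; the model choice, the function name and the numbers are assumptions for illustration, not measurements of a real connection pool.

```python
def mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time (waiting plus service) a request spends at an M/M/1 resource.

    Both rates are given in requests per second.
    """
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrivals meet or exceed capacity")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    service_rate = 100.0  # hypothetical: the resource can serve 100 requests/s
    for arrival_rate in (50.0, 90.0, 99.0):
        seconds = mean_response_time(arrival_rate, service_rate)
        print(f"{arrival_rate:5.0f} req/s -> {seconds * 1000:7.1f} ms")
```

Under these assumptions the mean response time grows from 20 ms at 50 percent utilization to a full second at 99 percent, although the resource itself has not become any slower.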
If we run short of resources, we try to increase the amount of resources, i.e. we add more CPUs or memory. This is referred to as vertical scaling if it is done within one system, or horizontal scaling if it is done by adding more nodes. See my post on Performance vs. Scalability for more information and links on this topic. From a scalability perspective, CPU and memory are scalable by hardware. Network and sequential shared resource access can only be scaled by changing the application behavior and architecture. Architectural changes often involve a lot of work and cost time and money. So our primary goal must be to discover architectural problems as early as possible. Based on my own experience and that of other people I have talked to, about 50 percent of later performance problems can already be found in development.
In addition to architectural problems, the detection of performance regressions is our second major concern. Especially in large applications which are continuously improved - so all serious enterprise applications - implementing a new feature often introduces changes which affect the performance of other features. This is then very often discovered late, during final load tests. This is, however, not the developer's fault. Very often a developer cannot know all interdependencies of a complex application, or the side effects of a change are not obvious. Performance regression analysis should discover exactly these problems. A regression can be introduced as easily as by adding a field to a data transfer object, which eventually results in additional megabytes being transferred over the wire in certain usage scenarios.
Bottom line: Continuous Integration Performance Management helps to avoid introducing architectural problems and performance regressions. This process must also be automated, as it has to be part of the Continuous Integration process.
Implementing Performance Management in Continuous Integration
Now that we are clear about what we want to achieve, we look at how to implement performance testing and management and integrate it into our existing processes.
Test Case Design
The first step in implementing any form of automated testing is writing the proper test cases. For writing performance tests we have a number of options. We can write tests for specific performance characteristics of our application, like processing 3 GB of data or processing 300 login requests. Very often functional integration tests can also be reused. They test certain application use cases and involve all major application components, which provides a solid basis for analyzing the dynamic behavior of our application. Small functional unit tests are not well suited for performance testing. Test cases should also have an execution time of at least one second.
We define Architecture Validation as the process of detecting architectural flaws in the runtime behavior of our application. In my experience the actual problems are very similar even though the applications differ. The reasons are obvious. As discussed above, the problem areas regarding scalability and performance are shared resource usage, database interaction and network usage. The following list provides the main criteria to check during architecture validation:
- Number and efficiency of database queries
- Number of remote calls and transferred data volume
- Memory consumption of a specific transaction
- Usage efficiency of shared resources
These are the characteristics which should be checked when reviewing the dynamic behavior of an application architecture. While this should already be part of a code review, we will also integrate it as part of architecture validation in our Continuous Integration environment.
How can we do that? First of all, we need to capture a transactional trace of each executed test case. dynaTrace’s PurePath technology provides exactly this functionality, including capturing of all relevant context information like database calls, memory allocations and remote communication details. If you do not use dynaTrace you can either buy it or try to implement the functionality to capture this data on your own. In the next step we define rules - or assertions - which have to hold for our test cases. If they are not satisfied, we define a test case as having failed from a performance perspective. Examples of such rules are:
- For each web page all data has to be fetched within one database statement.
- The number of remote calls must not exceed a certain threshold.
- All calls in a SOA application have to be performed asynchronously.
- A transaction must not execute the same database statement twice.
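Rules like these can be written as plain assertions over a captured trace. The following is a minimal sketch only: the `Trace` record, its fields and the default threshold are hypothetical stand-ins for whatever your tracing tool actually delivers, and only two of the rules above are covered.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trace:
    """Hypothetical, simplified record of one executed test case."""
    test_name: str
    sql_statements: List[str] = field(default_factory=list)
    remote_calls: int = 0


def validate(trace: Trace, max_remote_calls: int = 5) -> List[str]:
    """Return the list of rule violations; an empty list means the test passed."""
    violations = []
    # Rule: the number of remote calls must not exceed a threshold.
    if trace.remote_calls > max_remote_calls:
        violations.append(
            f"{trace.remote_calls} remote calls exceed the limit of {max_remote_calls}"
        )
    # Rule: a transaction must not execute the same statement twice.
    for statement, count in Counter(trace.sql_statements).items():
        if count > 1:
            violations.append(f"statement executed {count} times: {statement}")
    return violations


if __name__ == "__main__":
    trace = Trace(
        test_name="show_order_page",
        sql_statements=["SELECT * FROM orders WHERE id = ?"] * 2,
        remote_calls=7,
    )
    for violation in validate(trace):
        print(f"{trace.test_name}: {violation}")
```

A test runner would then mark the test case as failed from a performance perspective whenever `validate` returns a non-empty list.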
These are just a few examples of what you might want to check for. Which rules you define depends on your application and also on the history of problems you ran into in the past. Below is an example result of an architecture validation report from a Continuous Integration run. We see a number of architectural issues coming up and also the respective test case traces which are used to analyze the problem in detail.
The key point in Architecture Validation is to look not at response times but at dynamic application behavior. As I explained in an earlier post, the dynamic behavior of the application unveils potential performance problems before they manifest in bad response times. Very often people argue that performance management in development makes no sense, as the timing values are not representative of production environments. It is correct that timing values are not easily usable. However, the problems are already visible. For sure, we do not find all problems using this approach. There are problems which will only manifest under heavy load. My experience in real-world consulting and firefighting engagements, however, has shown me that a significant number of problems can be found without load testing.
The second part of Continuous Integration Performance Management is regression analysis. Regression analysis points out performance degradations caused by changing parts of the application code. All of us hate regression problems. For functional regressions we use unit and integration tests with assertions on application behavior. In performance analysis we also use test cases which we can execute automatically. Depending on the type of application these can be Selenium or WebTest - or other - test cases for browser-based applications, functional testing tools or custom test drivers. This is a matter of taste. The only criteria that have to be fulfilled are that the tests cover real-world use cases and that they run long enough to allow statements about runtime characteristics. The good thing is that very often existing functional test cases can be reused. Alternatively, you can also use a small-scale load test for regression testing.
In order to have a basis for regression analysis, we need to collect performance diagnosis data for each automated test run - or at regular intervals, like at the end of each development iteration. These test runs are then automatically compared, analyzing differences in the execution behavior of test cases. The figure below shows a sample regression report, indicating a performance decrease in the database and Web Service layer of a specific test case.
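The comparison of two test runs can be sketched in a few lines. This is an illustration under stated assumptions: the per-layer metric names, the values and the ten percent tolerance are hypothetical, and real diagnosis data will be richer than a flat dictionary.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.10,
) -> List[str]:
    """Return the metrics whose value grew by more than `tolerance`.

    Metrics that are missing from the current run are skipped here.
    """
    regressions = []
    for metric, old_value in baseline.items():
        new_value = current.get(metric)
        if new_value is None:
            continue
        if new_value > old_value * (1.0 + tolerance):
            regressions.append(metric)
    return sorted(regressions)


if __name__ == "__main__":
    # Hypothetical per-layer timings in milliseconds for one test case.
    last_build = {"database_ms": 120.0, "web_service_ms": 80.0, "rendering_ms": 40.0}
    this_build = {"database_ms": 180.0, "web_service_ms": 100.0, "rendering_ms": 41.0}
    for metric in find_regressions(last_build, this_build):
        print(f"RED: {metric} {last_build[metric]} -> {this_build[metric]}")
```

The same comparison works for counts such as the number of database statements or remote calls, which are usually more stable than timings.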
This automated detection of regressions is really cool, as you no longer have to worry about them. If everything is green - no problem. If some parts are red, you might have a problem. This is just as handy as unit tests for functional testing. Some might now argue that it is normal that code changes over time. This is correct, and not all changes are real problems. How can you find out? Take the report to your daily stand-up meeting, present the issues and then discuss them with the responsible developer. Additionally, you will also send them out as part of your Continuous Integration reports.
How to work with execution time
Wasn’t there an issue with execution times? Yes, there was. As we need execution time metrics for regression analysis, we have to invest some work to make them more meaningful. The first thing is that we need dedicated hardware to run our performance tests on. As these are only small test cases, a standard desktop machine will do fine. For large scenarios you might consider switching to bigger machines. The next problem is that Garbage Collection often makes timing information unusable. Therefore you have to subtract garbage collection times from the test execution time. This is not so easy; however, dynaTrace supports this feature out of the box.
Finally, you might get highly varying test results if your test case uses disk or network access. Exclusive access to disk and network should be a no-brainer for performance testing. However, even then test results still vary. Therefore I take the approach of subtracting network access and disk access times from the test execution time. In order to still discover regression problems, you can switch to alternative measurements like the number of file access operations or remote calls, or the data volume read or transferred. If you have the possibility, you can also run tests only locally and load the necessary file content in the setup phase of your test.
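The arithmetic behind this adjustment is simple. The sketch below assumes that your measurement tooling reports total, garbage collection, disk and network times separately; the `Measurement` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Hypothetical timing breakdown of one test execution, in milliseconds."""
    total_ms: float
    gc_ms: float
    disk_ms: float
    network_ms: float
    file_operations: int  # alternative, more stable regression indicator
    remote_calls: int     # alternative, more stable regression indicator


def adjusted_time_ms(m: Measurement) -> float:
    """Execution time with GC, disk and network time subtracted."""
    return m.total_ms - m.gc_ms - m.disk_ms - m.network_ms


if __name__ == "__main__":
    run = Measurement(total_ms=1500.0, gc_ms=120.0, disk_ms=200.0,
                      network_ms=180.0, file_operations=12, remote_calls=3)
    print(f"adjusted: {adjusted_time_ms(run):.0f} ms, "
          f"file operations: {run.file_operations}, remote calls: {run.remote_calls}")
```

The adjusted time is compared between builds, while the operation counts keep disk and network regressions visible even though their times were removed.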
Following this approach will help to get reasonable timing results. However, a variance of five to ten percent should still be considered acceptable. I also recommend using a baseline test which does some simple calculation to see how stable your timing values are.
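Such a baseline test could look like the following sketch. The workload, the number of runs and the use of the coefficient of variation as the stability measure are assumptions chosen for illustration.

```python
import statistics
import time
from typing import List


def baseline_workload(n: int = 200_000) -> int:
    """A simple CPU-only calculation without disk or network access."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure_runs(runs: int = 10) -> List[float]:
    """Time the baseline workload several times, in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        baseline_workload()
        timings.append(time.perf_counter() - start)
    return timings


def coefficient_of_variation(timings: List[float]) -> float:
    """Sample standard deviation relative to the mean (0.05 means 5 percent)."""
    return statistics.stdev(timings) / statistics.mean(timings)


if __name__ == "__main__":
    variation = coefficient_of_variation(measure_runs())
    print(f"baseline variation: {variation:.1%}")
```

If the variation of this trivial workload is already well above ten percent, the machine is too noisy for timing-based regression analysis.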
Integration into build environments
Now that we know what to do, we have to look at how we do it. The figure below shows a sample process flow for Continuous Integration Performance Testing. First the developer implements the new feature. Then he ideally already runs an architecture validation test locally. Then he checks in the code to the CI environment. This triggers an automated build and then an automated run of all unit tests. We run them first, as it makes no sense to run longer-lasting performance tests if the functionality is broken. Then we run functional integration and performance tests, either together - by integrated collection of performance diagnosis data - or separately.
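The ordering described above, cheap checks first and expensive performance tests only when everything before them passed, can be sketched as a small staged runner. The stage names and the placeholder callables are hypothetical; in a real setup each stage would invoke your build tool or test runner.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], bool]:
    """Run stages in order and stop at the first failing one.

    Returns the names of the stages that were executed and the overall result.
    """
    executed = []
    for name, stage in stages:
        executed.append(name)
        if not stage():
            return executed, False
    return executed, True


if __name__ == "__main__":
    # Placeholder stages; each lambda stands in for a real build or test step.
    stages = [
        ("build", lambda: True),
        ("unit tests", lambda: False),  # simulate a broken unit test
        ("integration and performance tests", lambda: True),
    ]
    executed, passed = run_pipeline(stages)
    print(f"executed: {executed}, passed: {passed}")
```

With the simulated unit test failure, the longer-running performance stage is never started.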
The developer as well as the testing team and development management - actually whoever is or should be interested - receives a report of discovered architectural problems and identified performance regressions. This easy integration can be achieved by automatically starting the data collection using Ant, Maven, NAnt or MSBuild. As part of the build script, performance testing can be easily integrated into almost any build system. dynaTrace, for example, provides a REST/SOAP interface to control recording and also test result generation.
Performance Testing in Continuous Integration helps to identify a lot of potential performance problems. Architecture Validation ensures that the dynamic behavior of the application will not result in performance and scalability problems. Regression analysis ensures that hidden performance regressions are unveiled automatically as part of the CI process. While some investment has to be made, assets like test cases can often be reused. The feedback from real-world implementations I have done was that development teams were able to discover more bugs at an early development stage. Additionally, developers get more insight into the performance characteristics of their implementations - especially when it comes to bug fixes. Architects as well as developers develop a better “feeling” for the dynamic behavior of their applications, as they see the effects of changes immediately. Load tests and roll-out also get more efficient, as a lot of bugs have already been resolved.
So if you have a well-running CI system and are worried about or interested in the performance of your application, you should seriously consider making performance management part of your development process.