Performance or load tests can be challenging to analyze since they provide large amounts of data. This series of blog posts will share methodical best practices that can assist performance engineers in their work. These best practices are based on my 17 years of experience in the performance engineering industry.
Last time, we covered the difference between performance engineering and performance reporting, why there is no replacement for human performance engineers, and three best practices: identify tier-based engineering transactions, monitor KPIs cleverly, and reduce the number of transactions you analyze.
This time, we will go over some more best practices.
1. Don’t Jump the Gun: Wait for the Test to Complete Before Analyzing
It’s quite funny to watch business stakeholders during a load test. It’s like they are watching a sports game. Performance engineers start out as the quarterbacks, initiating the test or the play, but they soon become the referee breaking up fights. It's actually quite comical and predictable.
It usually goes down something like this: The stakeholders are concentrating on that orange response time line, the test starts ramping up slowly, methodically... and they exclaim, “Woah, look at those lightning-fast response times! I told you we had planned for excess capacity; we didn’t even need to pay for all this hardware. Such a waste.”
Then, response times start to deviate and they start getting nervous. Stakeholders now start speculating as to the root cause of bottlenecks without any real evidence other than an orange line. They start pointing fingers at groups who are responsible for certain tiers of the deployment. You can tell them those values are in fact milliseconds and they quiet down, but only for a bit.
Then, response times start to exceed three seconds and they get worried again, but they try not to blame their peers because they just learned a valuable lesson. So now, they just sigh loudly and look like they are praying.
Then, response times spike, and they jump up, insisting that the app has crashed, that someone needs to be fired, and furiously demanding to know why they are paying for an elastic cloud deployment that was supposed to solve all their scalability limitations. Ah, yes, that magical cloud.
Now, the performance engineer is more psychologist than technologist, trying to calmly explain the value the business receives by running performance tests before going live.
Let me drive this point home: There’s no need to analyze during a test and drive yourself crazy watching every new data point appear while trying to anticipate the trend. Watching a test in flight is intended only to verify that it’s executing as planned. It’s not the time or the place to analyze!
Design a methodical load test to answer a specific engineering question, kick it off, and make sure it’s behaving as expected. Then go get lunch, go for a walk, go socialize: let the load and monitoring tools do their automated job.
Trust me on this: The results and the trends will be far clearer and easier to interpret after the test has completed. Watching each data point as it arrives won’t change anything. So relax.
2. Reproducible Results: The Magic Number Is 3
For every test scenario, run the same load test to completion three times. Across these three executions, do not tweak or change anything within your performance test harness: not the run-time settings, not the code in the load scripts, not the test duration or the ramp schedule, and absolutely nothing in the target web application environment. The only permitted interventions are data resets or server recycles, and only when they are required to bring the environment back to a baseline between test runs.
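One way to enforce the “change nothing” rule is to fingerprint the harness before each run. This is a minimal sketch, assuming you can export your run-time settings (ramp schedule, duration, think times) as a dictionary and read your load scripts as text; `harness_fingerprint` and `runs_are_comparable` are hypothetical helpers, not part of any load testing tool.

```python
import hashlib
import json


def harness_fingerprint(settings: dict, scripts: dict[str, str]) -> str:
    """Return a SHA-256 digest of the run-time settings and load-script sources.

    Hypothetical inputs: `settings` holds whatever your tool exposes (ramp
    schedule, duration, think times); `scripts` maps script names to source text.
    """
    # sort_keys makes the digest independent of dictionary insertion order.
    payload = json.dumps({"settings": settings, "scripts": scripts}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def runs_are_comparable(fingerprints: list[str]) -> bool:
    """True only if there are exactly three runs and all used an identical harness."""
    return len(fingerprints) == 3 and len(set(fingerprints)) == 1
```

Record the fingerprint alongside each run’s results; if the three digests differ, something changed and the runs are not a valid set.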
I promise you this: Using the “Magic of 3” will make you extremely efficient by saving you a ton of wasted hours chasing red herrings. It will reduce the data you need to analyze by removing unreproducible results.
Yes, the Magic of 3 requires you to run more tests. But these are automated tests; you simply press start. Running automated tests is a far better use of time than spending your valuable time and brainpower analyzing unreproducible results.
So, for every test scenario, run it three times. Then on each set of results, conduct a preliminary analysis to validate that the results or the TPS plateaued at the same elapsed time.
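A rough sketch of that preliminary check, assuming you can export each run’s TPS as (elapsed seconds, TPS) pairs. It treats “plateaued” as the first sample that reaches 95% of the run’s peak TPS and allows a 60-second spread between runs; both thresholds are illustrative assumptions to tune against your own sampling interval.

```python
def plateau_time(samples: list[tuple[float, float]], fraction: float = 0.95) -> float:
    """Elapsed seconds at which TPS first reaches `fraction` of the run's peak TPS.

    `samples` is a non-empty list of (elapsed_seconds, tps) pairs in time order.
    """
    peak = max(tps for _, tps in samples)
    # For fraction <= 1 and non-negative TPS, the peak sample always qualifies,
    # so next() always finds a match.
    return next(elapsed for elapsed, tps in samples if tps >= fraction * peak)


def plateaus_agree(
    runs: list[list[tuple[float, float]]], tolerance_s: float = 60.0
) -> tuple[bool, list[float]]:
    """Check that every run plateaued within `tolerance_s` seconds of the others."""
    times = [plateau_time(run) for run in runs]
    return max(times) - min(times) <= tolerance_s, times
```

If the spread is large, treat that as the “erratic results” signal and fix the harness or the build before analyzing further.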
If your results are erratic, stop right here! Are you sure you built a rock solid performance test harness? Is the target application code half-baked and throwing errors? You need a pristine QA-ed build in order to conduct efficient load testing.
Once your results are reproducible across all three runs, you will have the confidence you need to invest your valuable time in analysis.
3. Ramp Up Your Load
Start With Ghost Tests
Start your tests by running ghost tests. Ghost tests check the system without executing load scripts. The ghost test has no real user activity, but it is important: the system is left alone to do housekeeping and process scheduled or triggered jobs, heartbeats, communications, etc. The only requirement is that the monitored KPIs are collecting metrics.
You might be surprised by the amount of resources your deployment uses without any user load, and it’s better to know that now than to try to separate user load from system load later in the project. Use this test to calibrate your monitored KPIs and establish baseline resource usage patterns.
I recommend running the ghost test at three different times during the day. If you find that a job that crunches the DB server kicks off every half hour, you will want that activity isolated and understood before executing your realistic load tests.
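To spot that kind of scheduled job, here is a minimal sketch that summarizes one KPI from a ghost test. It assumes the monitor exports (elapsed seconds, value) pairs, uses the median as the idle baseline, and flags anything above twice the baseline as a spike; the factor of 2 is an arbitrary starting point, and a metric that idles at zero would need an absolute threshold instead.

```python
from statistics import median


def idle_baseline(
    samples: list[tuple[int, float]], spike_factor: float = 2.0
) -> tuple[float, list[int], list[int]]:
    """Summarize one ghost-test KPI series, e.g. DB server CPU %.

    Returns the median idle value, the start time of each spike (a run of
    samples above spike_factor * median), and the gaps between spike starts.
    """
    base = median(v for _, v in samples)
    starts: list[int] = []
    in_spike = False
    for t, v in samples:
        if v > spike_factor * base:
            # Record only the first sample of each consecutive spike.
            if not in_spike:
                starts.append(t)
            in_spike = True
        else:
            in_spike = False
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return base, starts, gaps
```

Evenly spaced gaps (for example, 1,800 seconds apart) point to a scheduled job worth tracking down before you add user load.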
Move on to Single User Load Tests
Assign a single virtual user to each user script and engineering script, and start all the tests at once. If you have a total of 23 scripts, you will have 23 users executing. As mentioned earlier: run it three times and confirm the results are reproducible.
This test is a benchmark that shows the minimum response time achievable under a single-user load, which is your best-case scenario. Each transaction’s minimum response time is its response time “floor.” You also use the results of this test to identify the business transactions with the highest and lowest response times.
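Pulling the floors out of the single-user results is a simple aggregation. This sketch assumes the results from all three runs are exported as (transaction name, response time in seconds) pairs, and it ranks transactions by their floor; the function name is hypothetical.

```python
def response_time_floors(
    results: list[tuple[str, float]]
) -> tuple[dict[str, float], str, str]:
    """Return each transaction's minimum response time, plus the fastest and slowest floors."""
    floors: dict[str, float] = {}
    for name, seconds in results:
        # Keep the lowest response time seen for this transaction across all runs.
        floors[name] = min(seconds, floors.get(name, seconds))
    fastest = min(floors, key=floors.get)
    slowest = max(floors, key=floors.get)
    return floors, fastest, slowest
```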
Ramp Up Your Load to the Desired Number of Users
Move on to your concurrent test scenarios: Create a slow-ramping staircase scenario that captures three monitored KPI values for each set load. In other words, configure the slow ramp so that each load level is sustained for a fixed duration before the next set of users is added. Your goal is to capture at least three KPI metric values during that sustained load duration.
For example, if you are ramping by 10 or 100 users at a time and collecting KPIs at 15-second intervals, each set load needs to run for a minimum of 45 seconds before ramping to the next load. Yes, this lengthens the test (by slowing the ramp), but the results are much easier to interpret. I am using the magic number three here to exclude anomalies: a spiking KPI metric that isn’t sustained isn’t a trend.
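The arithmetic behind the staircase is hold time = samples per step × KPI interval (3 × 15 s = 45 s in the example above). A small sketch that turns that into a schedule, assuming your tool accepts steps as (start second, users, hold seconds) and that the ramp between steps is effectively instantaneous; `staircase` is a hypothetical helper.

```python
def staircase(target_users: int, step_users: int, kpi_interval_s: int,
              samples_per_step: int = 3) -> list[tuple[int, int, int]]:
    """Build (start_second, users, hold_seconds) steps for a slow staircase ramp.

    Each load level is held long enough to collect `samples_per_step` KPI samples.
    Assumes the ramp between steps is effectively instantaneous.
    """
    hold_s = samples_per_step * kpi_interval_s
    steps = []
    users, start_s = 0, 0
    while users < target_users:
        # Add the next set of users without overshooting the target.
        users = min(users + step_users, target_users)
        steps.append((start_s, users, hold_s))
        start_s += hold_s
    return steps
```

For example, ramping to 1,000 users in steps of 100 with 15-second sampling gives ten 45-second steps, or 7.5 minutes of sustained load.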
Living by the law of halves and doubles when performance testing greatly simplifies your performance engineering approach. Start with a goal of achieving half the target load or peak users. If the application scales to half the load, you can then double it to the target load. If it does not, reduce the load by half again, and again, if need be. You definitely need to keep halving the load until you get a scalable test, even if that’s just 10 users and your goal was 10,000!
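One interpretation of halves and doubles, sketched in code: halve from half the target until a load scales, then double back up toward the target while it keeps scaling. `scales` is a hypothetical callback that runs the test (three times, per the Magic of 3) at a given user count and reports whether it scaled; in practice you would investigate and tune between steps rather than loop automatically.

```python
from typing import Callable, Optional


def halves_and_doubles(
    target_users: int, scales: Callable[[int], bool], min_users: int = 1
) -> tuple[Optional[int], list[tuple[int, bool]]]:
    """Return the highest load that scaled (or None) and every (load, scaled) tried."""
    tried: list[tuple[int, bool]] = []
    load = max(target_users // 2, min_users)
    # Halve until a load scales, or give up at the minimum user count.
    while not scales(load):
        tried.append((load, False))
        if load == min_users:
            return None, tried
        load = max(load // 2, min_users)
    tried.append((load, True))
    best = load
    # Double back up toward the target while the load keeps scaling.
    while best < target_users:
        load = min(best * 2, target_users)
        ok = scales(load)
        tried.append((load, ok))
        if not ok:
            break
        best = load
    return best, tried
```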
That’s it for now. Join us next week for part 3 (the magic number!), the final part of this blog series.