“The app is slow, can you make sure it is fast.”
The quote above should send shivers down the spine of any experienced engineer. In our previous posts we have repeatedly stressed the importance of “measuring, not guessing” when dealing with performance tuning. Even more important, though, is having a definition of “fast”.
Unless you have defined what “fast” means, you can spend forever in the optimization cycle, as every non-trivial application can always be made “faster” in some regard. In the real world, performance is unfortunately not the only requirement we need to fulfill, so in order to provide the most value we should know when to stop optimizing – or, more importantly, towards which goals the performance tuning activities should lead us.
Badly defined performance requirements
Business owners have become better and better at expressing the functional requirements for software. But when thinking outside the functional requirements – be it usability, compatibility or performance – the mind of a business owner often draws a blank. This blank can take the form of “make sure it is fast”, or, in a somewhat better case, you will have something similar to the following to work with:
- 95% of the operations carried out in the system must respond within 5 seconds
- The system has to support 100 concurrent users
“Not so bad,” you might think at first glance. Instead of speaking in terms of “fast”, you now have a clear target to steer towards, don’t you? As a matter of fact, the above is even worse than “fast”: because it contains numbers, it looks like something you could use as the ultimate goal.
In reality, the two requirements above are at best a foundation from which you can start asking additional questions. Let me unpack what is wrong with them.
“95% of the operations must respond within 5 seconds”.
What is supposed to happen with the remaining 5%? Is the goal to keep their response times under 10 seconds, or is it OK to time out those requests? Instead of fixing a single number, you should define an acceptable latency distribution.
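One way to make such a distribution concrete is to check observed response times against several percentile targets at once. Here is a minimal sketch in Python; the percentile limits below are illustrative assumptions, not numbers from any real SLA:

```python
# Check observed latencies against a latency distribution rather than
# a single threshold. The targets below are hypothetical examples.
def percentile(sorted_samples, p):
    """Nearest-rank percentile of an ascending list of samples."""
    index = max(0, int(round(p / 100.0 * len(sorted_samples))) - 1)
    return sorted_samples[index]

def meets_latency_distribution(samples_ms, targets):
    """targets maps percentile -> maximum allowed latency in ms."""
    ordered = sorted(samples_ms)
    return all(percentile(ordered, p) <= limit
               for p, limit in targets.items())

# Example: 95% under 5 s, 99% under 8 s, even the worst case under 10 s.
targets = {95: 5000, 99: 8000, 100: 10000}
```

A requirement expressed this way also answers the “remaining 5%” question explicitly, because the tail is covered by its own limits.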
Should you really treat all operations as equivalent? For example, is it OK in both of the following cases for 95% of the requests to return in 4.9 seconds?
- “show my current account balance” – potentially executed millions of times a day, and the first question each and every retail client has when interacting with their bank.
- “show all transactions either debiting or crediting my account during 2013” – needed only a couple of times a day, during more exotic use cases.
I would assume you need to treat the first operation differently, with more demanding targets, while possibly relaxing the requirements on the second. So, instead of treating all operations as equivalent, you should set an acceptable latency distribution per operation type (or operation category).
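Such per-category targets can be expressed directly as data. A minimal sketch, where the category names and thresholds are hypothetical examples in the spirit of the two operations above:

```python
# Per-operation-category latency targets in milliseconds.
# Names and numbers are illustrative assumptions, not a real SLA.
CATEGORY_TARGETS_MS = {
    "account_balance":  {"p95": 500,  "p99": 1000},   # hot path, strict
    "yearly_statement": {"p95": 5000, "p99": 10000},  # rare, relaxed
}

def check_category(category, measured):
    """measured maps percentile name -> observed latency in ms;
    returns a pass/fail verdict per percentile."""
    targets = CATEGORY_TARGETS_MS[category]
    return {p: measured[p] <= limit for p, limit in targets.items()}
```

The benefit of keeping the targets as plain data is that the same table can drive both the requirements document and the automated checks in a load-test suite.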
What is the load on the system while the measurements are being made? How many other operations could be taking place simultaneously? This is where you should link your latency-related requirements to your load/throughput requirements.
Is the response time measured in the end-user environment (such as the browser rendering the response, or an Android app updating the results), or when the last byte leaves the server? Instead of leaving the measurement criteria ambiguous, be precise about the layer in which latency is measured.
What about batch jobs and asynchronous processes? Is the monthly batch job, running for 2 hours to calculate final credit card balances, considered a violation of the 5-second threshold? And what about the full account statements compiled asynchronously into CSV for large business accounts and emailed 10 minutes later? So, also be clear about the operations for which latency is not relevant.
“The system has to support 100 concurrent users”.
Well, 100 users on your site, each clicking on static images served via CDN every 10 seconds – I bet you can build a system like that with your eyes closed. 100 users simultaneously encoding 4K video files on your site – you had better be scared. Really scared.
Things turn from ambiguous to meaningless when you interpret the requirement as real concurrency, translating “100 concurrent users” into “100 operations concurrently being processed by 100 threads”. If each such operation takes 10 seconds to process, the throughput of your system is 10 operations per second. Now reduce the operation duration tenfold: with each operation taking just one second, the same 100 threads could sustain 100 operations per second. But lo and behold, at the same incoming load of 10 operations per second, only 10 operations are now in flight at any moment – the system got faster, yet you are no longer “fulfilling” the 100-concurrent-users requirement.
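The arithmetic above is an instance of Little’s law: average concurrency equals throughput multiplied by latency. A minimal sketch of the relationship:

```python
# Little's law: concurrency = throughput * latency.
# Rearranged both ways to make the trade-off in the text explicit.
def throughput_ops_per_sec(concurrency, latency_sec):
    return concurrency / latency_sec

def concurrency_in_flight(throughput_ops_per_sec, latency_sec):
    return throughput_ops_per_sec * latency_sec

# 100 threads, each operation taking 10 s -> 10 ops/sec.
# After a tenfold latency improvement, the same incoming load of
# 10 ops/sec keeps only 10 operations in flight at once.
```

This is why “N concurrent users” on its own says nothing about capacity: the same concurrency number can describe a trivially light load or a crushing one, depending on per-operation latency.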
Instead of “concurrent users” or any similar term, the requirements should describe the behaviour of actual users, so that these descriptions can be turned into load tests that emulate the necessary load.
Note that I am not recommending measuring throughput here in general – real-world applications are often multifunctional and used in very dynamic situations, which makes it hard to express performance goals in throughput (operations per hour).
But if a particular application is designed to do just one thing, for example processing invoice payments, then a throughput goal such as “1,000 invoices/minute” is an excellent way to make the goals measurable and specific.
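A throughput goal like that can be verified directly by timing a batch run. A minimal sketch, where `process_invoice` is a hypothetical stand-in for the real operation:

```python
import time

def measure_throughput_per_minute(operation, payloads):
    """Run `operation` over all payloads and return achieved ops/minute."""
    start = time.perf_counter()
    for payload in payloads:
        operation(payload)
    elapsed = time.perf_counter() - start
    return len(payloads) / elapsed * 60

def process_invoice(invoice):
    pass  # stand-in stub; a real system would persist the payment here

# A goal of "1,000 invoices/minute" then becomes a simple assertion
# against measure_throughput_per_minute(process_invoice, invoices).
```

In practice you would run such a measurement against a realistically sized dataset and a production-like environment, for the reasons discussed in the following paragraphs.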
With what amount of data should your application fulfill the performance requirements? Are you expected to achieve the goals with 10,000 accounts and 10,000,000 transactions in the database, or with 1,000,000 accounts and 1,000,000,000 transactions? Be clear about the volume of data present in the system.
What are your constraints regarding infrastructure? Are you expected to achieve the goals within a $500/month AWS bill, or can you go berserk and deploy the solution on high-end hardware with 32 cores and several TB of memory? Knowing this helps you understand the limitations imposed by the infrastructure. So, you should specify the infrastructure constraints.
Can you rely on the network being present? Will the bandwidth be sufficient to send several MBs back and forth during each operation? With the widespread adoption of mobile apps, you cannot count on the almighty 4G always being available, and you might need to support offline operation and squeeze the traffic into kilobytes instead of megabytes. So, you need to understand the environment in which your application will be deployed.
The list of aspects described in this post is by no means complete. For example, when you add concepts such as scalability or availability to the mix, you face a whole new set of requirements. But I hope the post fulfilled its intention, and that the next time you meet vaguely defined performance requirements, you have a set of questions with which to start drilling into the actual need.
Work with the business owner in a dialogue to help discover measurable and specific goals. Without such goals, you have no real target to achieve, nor anything to measure your results against.
Walking through the process also lets you explain the related costs. Recall that everything can always be made faster; the question is whether it is economically viable. From the business owner’s perspective, it is only natural to wish for all operations to be ultra fast. Only when the costs of achieving this are understood can more realistic expectations be set.