Taming the Performance Beast – a Practitioner’s Way (Part 1)
Performance tuning can be a complex and time-consuming process, and it is easy to get frustrated without abundant patience, especially for products and applications that demand high performance but were built on a poor performance foundation. It is an ongoing, iterative process that needs to be treated and managed differently from the normal development process. There is no silver bullet; what works is concerted, honest and persevering effort. When tuning goes wrong, it can go horribly wrong, leaving the application performing worse than it did initially. This article draws on first-hand, real-world experience with tuning and offers suggestions and guidelines on how to approach it successfully.
Causes of Performance Failure
Experience shows that most performance failures are due to fundamental architectural and design faults or shortcomings rather than coding issues. This does not mean that inefficient code is never the culprit. In fact, sometimes a very small change, whether in Java code (such as changing a data structure or the logic behind a loop) or in database SQL queries (such as tweaking the WHERE clause or introducing indexes), yields a good performance gain. But the extent to which performance can be gained through code changes is limited by the architecture and design of the system.
This also means that performance issues are introduced into the system very early in the development lifecycle. However, they are ignored until fixing them becomes an absolute necessity, when the organization is at risk of losing business, customers or competitiveness, or of damaging customer relationships. Most of the time the organization has a reason to follow this path: the pressure to deliver the functional aspects as early as possible. This gives rise to the “Fix-It-Later” attitude, that is, “Deliver the finished software at the earliest; if there is a performance problem later, we will fix it then”. This attitude also originates from the following misconceptions:
“Performance activity will delay the functional delivery”
Well, it will impact the deadline to some extent. But doing performance work at a later stage incurs a huge cost in time and money. It also has the potential to make the system functionally unstable if there is a substantial amount of change in architecture and code, so there will be additional cost for functional testing and, if required, for fixing functional issues. This becomes more complex when there is parallel development on the same software, one stream for functionality and the other for performance. Then you need to merge them back, test both ways, fix issues both ways and so on. Sometimes this goes around in circles.
“You really should not worry about performance until you have code working”
This refers to spending too much time on low-level optimization of code in the initial period of development, which generally turns out to be a waste of time. In the beginning, it is often unclear how often any single piece of code will be executed, and the larger the program, the more this is true. However, this does not mean that one should not think about the overall performance requirements of the system from the very start. A team that has no idea what the performance requirements for the system are will most likely produce an application that does not meet the goal.
As with most things, a balanced approach is more prudent. If you wait until after the design and code are done to think about performance, you will quite likely end up with code that has major problems. On the other hand, if you spend all your time optimizing every last line of code, you will never ship a product.
“Performance can only be done at later stage when you have sufficient code to analyze and enough feature to test and measure”
In reality, most performance problems occur in inter-component communication and can also be attributed to the design of the components themselves. These things can be predicted in the architecture and design phase, so performance activity can start from the initial phase of development.
The steps involved in getting software to perform generally break down into three phases:
During Design – The easiest performance problems to fix are the ones that are never there in the first place. Typically this is the period of architecture and design in the development cycle. A good, clean and simple architecture and design is very important. In a large number of cases, systems are either over-designed or under-designed, and neither is good for performance. There are many good architectural and design principles, methodologies and patterns to choose from, but not all of them are suitable for all situations. Even a normalized relational database (RDBMS) is not always suitable for high performance, as it can become a bottleneck when the data size becomes huge. That is why systems are embracing denormalized and unstructured storage wherever suitable. Generally, an unwieldy architecture leaves a system with a tale of woe to tell. Some useful reality checks should be done at this stage:
- Ensure relevance by grading usefulness and conformance to future system roadmap
- Start small, deliver the solution and check the relevance in real-time before becoming too ambitious
- Define a roadmap for the core components and services in the system
During Coding – "We just want to get the code to function and then we'll fix the performance problem. So wait until we have finished writing it." This is right to an extent: if code does not function correctly, it does not matter how quickly it runs. However, the statement exposes a misconception about the nature of well-performing code, namely that “it is by nature harder to write well-performing code”. On the contrary, it is often easier. In Java, you need to follow standard coding best practices: avoid creating unnecessary objects, be careful with Strings, prefer lazy initialization, minimize the mutability of classes, use the standard library instead of writing the same thing from scratch, use primitive types instead of wrapper classes wherever possible, use the right kind of collection to hold objects, be careful with loops, and so on. As with design, simple and readable code is important. Luckily, in many cases there is no conflict between code simplicity and performance; the simplest code usually performs best. In some cases it will be quite clear at design time that a piece of code will be used very often, for example core components (like security) or common classes (like logging). If this is the case, it is worth spending some time thinking about how this piece of code will perform and how it interacts with the rest of the program.
After Functional Coding – After the code has been written and produces correct functional results, performance tests can be run to check whether it meets the performance requirements. Most of the time it won’t, and it is worthwhile to look at other ways to improve it. Unfortunately, this is the phase in which most systems end up doing all of their performance work. More on this in the next section.
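Two of the coding practices listed under During Coding (primitives instead of wrappers, and care with Strings) can be illustrated with a minimal sketch. The class and method names below are hypothetical and chosen only for illustration; each pair of methods returns the same result, and only the amount of temporary object creation differs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; the class and method names are not from the article.
final class CodingPractices {

    // Wrapper accumulator: each += unboxes, adds, and boxes the result again,
    // which can allocate a new Long on every iteration.
    static long sumBoxed(List<Integer> values) {
        Long total = 0L;
        for (Integer v : values) {
            total += v;
        }
        return total;
    }

    // Primitive accumulator: same result without the per-iteration boxing.
    static long sumPrimitive(List<Integer> values) {
        long total = 0L;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Concatenation in a loop builds a new String on every pass.
    static String joinWithConcat(List<String> parts) {
        String out = "";
        for (String p : parts) {
            out += p + ",";
        }
        return out;
    }

    // StringBuilder appends into a single growing buffer.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder out = new StringBuilder();
        for (String p : parts) {
            out.append(p).append(',');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        List<String> words = Arrays.asList("a", "b", "c");
        System.out.println(sumBoxed(numbers) + " " + sumPrimitive(numbers));
        System.out.println(joinWithConcat(words).equals(joinWithBuilder(words)));
    }
}
```

Whether the difference matters in a given application is something only measurement can tell; the point is that the cheaper variant is no harder to write or read.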
As mentioned in the earlier section, most of the time performance activities start only when the application has been built functionally and delivered. Obviously this is not the best way to approach performance improvement. But given that it is the often-trodden path, we must look at how the performance and scalability work can still be done most efficiently.
Generally, there are two scenarios to deal with, depending on how well the system currently performs.
· First, if the system performs really badly, that is, if it is not even testable at the targeted load, start tuning the system at a lower volume. Select a volume/load at which the system performs reasonably well. Tune the system at this load to bring it to a level where it can perform reasonably at the next higher load, and repeat these steps until the system reaches the target load. Take care not to over-indulge in tuning at the intermediate stages in an attempt to make the system perform optimally there. That is not the end goal: some of the work done at intermediate stages may be overridden or replaced by work done at later stages, so it only increases the effort. Every time the workload increases, another bottleneck may appear. This is normal; every piece of hardware and software has a finite physical limit. Once the system is at the target load, further performance activities can be carried out to bring it within the required performance parameters.
· Second, for a system that functions correctly at the target load, tuning can start at the target load itself and continue until the performance requirements are met.
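The two scenarios above can be sketched as a plan of load levels to tune at. This is only an illustration under stated assumptions: the class name, the method name and the growth factor between levels are hypothetical, and the article does not prescribe how large each step should be.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the growth factor between load levels is an assumption.
final class LoadRamp {

    // Returns the load levels to tune at, from a load the system already
    // handles reasonably up to the target load.
    static List<Integer> plan(int startLoad, int targetLoad, double factor) {
        if (startLoad <= 0 || targetLoad < startLoad || factor <= 1.0) {
            throw new IllegalArgumentException("need 0 < start <= target and factor > 1");
        }
        List<Integer> levels = new ArrayList<>();
        int load = startLoad;
        while (load < targetLoad) {
            levels.add(load);
            int next = (int) Math.ceil(load * factor);
            // Always move forward by at least one, and never overshoot the target.
            load = Math.min(targetLoad, Math.max(next, load + 1));
        }
        levels.add(targetLoad);
        return levels;
    }

    public static void main(String[] args) {
        // First scenario: the system is only testable well below the target.
        System.out.println(plan(100, 1000, 2.0));
        // Second scenario: the system already works at the target load.
        System.out.println(plan(500, 500, 2.0));
    }
}
```

At each intermediate level the aim is only to make the next level testable, not to reach optimal performance there.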
Performance tuning is an iterative and repetitive process; it requires a disciplined method of testing, analysis, and targeted improvements and of course loads of patience. After the system is tuned once, the process is repeated until the performance goal of the system is met.
Figure 1: The iterative, data-driven process for performance tuning and optimization
The steps in this iterative process, as illustrated in Figure 1, are:
· Performance Test: Start with a performance test to establish baseline data. Then repeat it after each applied change or solution to evaluate its effect on performance.
· Collect Data: Capture and measure performance data after each test. If the collected data falls within the required performance metrics, benchmark it.
· Identify Bottlenecks: Analyze the captured performance data to identify any bottleneck(s) holding the system back from performing better.
· Identify Changes and Solutions: Based on the identified bottleneck(s) above, explore and identify changes and/or solutions to address those bottleneck(s).
· Apply Changes/Solutions: Implement and deploy the solution identified in the previous step.
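The five steps above form a loop, which the following minimal sketch makes explicit. All names here (`TuningLoop`, `Change`, `Analyzer`, the single response-time measure in milliseconds) are hypothetical; in practice the analysis step is done by people with profilers and monitoring tools, not by a one-method interface.

```java
import java.util.function.LongSupplier;

// Hypothetical skeleton of the iterative process; not a real tuning framework.
final class TuningLoop {

    /** A candidate fix for one bottleneck. */
    interface Change {
        void apply();
    }

    /** Stand-in for the analysis that turns measured data into the next change. */
    interface Analyzer {
        Change nextChange(long measuredMillis);
    }

    // Returns the number of changes applied before the goal was met,
    // or -1 if the goal was still not met after maxChanges changes.
    static int tune(LongSupplier performanceTest, Analyzer analyzer,
                    long goalMillis, int maxChanges) {
        int applied = 0;
        while (true) {
            long measured = performanceTest.getAsLong(); // Performance Test + Collect Data
            if (measured <= goalMillis) {
                return applied;                          // goal met: benchmark this result
            }
            if (applied == maxChanges) {
                return -1;                               // budget exhausted
            }
            analyzer.nextChange(measured).apply();       // Identify + Apply
            applied++;
        }
    }

    public static void main(String[] args) {
        long[] latency = {900};                          // simulated response time
        int changes = tune(() -> latency[0], m -> () -> latency[0] -= 200, 500, 10);
        System.out.println("changes applied: " + changes);
    }
}
```

Note that every change is followed by a fresh test before anything else is decided, which is what makes the process data-driven.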
How does it differ from normal (functional) development process?
The most important and critical differences are:
Testing is an integral part of the performance development team. In the functional development process, the development team builds the functionality in isolation from the testing team, and the testing team tests the system in isolation from the development team. In performance work, testing provides the critical input that drives the next set of changes. Development and testing go hand in hand, and both teams need to be merged into one. For benchmarking, however, you can still have independent performance testing.
The performance tuning environment should be as close to the actual production environment as possible; it is critical to establish a performance test environment that mimics production. In functional development, the hardware and memory of the test environment are not that critical. The performance test environment, by contrast, is used to identify and remove performance and scalability bottlenecks through an iterative, data-driven, top-down process. Any benchmark data captured on an environment dissimilar to production will be invalid, and work done on that environment will not deliver the same result in production.
Types of Performance Problems:
Generally, there are three levels of performance optimization to consider for a J2EE application-server-based application, as depicted in Figure 2.
Figure 2: Three levels of performance problems
These are immediately obvious and the easiest to deal with. Changes of this type normally involve low-effort, simple tweaks to the application and infrastructure that bring considerable performance improvement. The problems are often introduced in the implementation of the application and range from simple language misuse and not following best practices to outright bugs. The solution is often a simple substitution (better logic, a better algorithm, a more appropriate looping structure, a more suitable collection or utility object, etc.). Examples include increasing parallelism by tuning thread counts, optimizing a SQL query, changing looping logic in Java, and changing the collection type used to hold objects.
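As one illustration of such a simple substitution, the sketch below swaps the collection used for membership checks. The class and method names are hypothetical. Both methods return the same count; the first scans the whole list for every lookup, while the second pays once to build a `HashSet` and then does constant-time lookups on average.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example of a "simple substitution" of one collection for another.
final class LookupQuickWin {

    // List.contains is a linear scan, so this is O(known * incoming).
    static int countKnownWithList(List<String> knownIds, List<String> incomingIds) {
        int hits = 0;
        for (String id : incomingIds) {
            if (knownIds.contains(id)) {
                hits++;
            }
        }
        return hits;
    }

    // HashSet.contains is O(1) on average, so this is roughly O(known + incoming).
    static int countKnownWithSet(List<String> knownIds, List<String> incomingIds) {
        Set<String> known = new HashSet<>(knownIds);
        int hits = 0;
        for (String id : incomingIds) {
            if (known.contains(id)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList("a", "b", "c");
        List<String> incoming = Arrays.asList("b", "x", "c", "b");
        System.out.println(countKnownWithList(known, incoming));
        System.out.println(countKnownWithSet(known, incoming));
    }
}
```

For a handful of elements the difference is negligible; it starts to show only when both collections grow with load.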
· Application Design
The design of an application, how it interacts with the different layers in the system, and how its components interact with each other have a huge impact on its performance. This seems obvious, but more often than not this aspect is ignored. What appears to work perfectly well in the initial stages may turn out to be a disaster later. Sometimes a small tweak to the design solves the problem; at other times the base design has to be replaced by a new one, which is both costly and risky.
· The Science of Java
The surrounding environment and the platform on which the application runs also have a huge impact on performance. This type of problem deals with the underlying science, the physics and mathematics, of the application. For a Java application, it ranges from the JVM to the application server to the OS and finally to the database. Factors like processor architecture, memory hierarchy, storage system, workload of the target system, operating system, network conditions, communication factors (bandwidth of I/O channels, data transformation requirements) and program organization all affect the performance model. Additionally, economic constraints (limits on hardware and software) and political factors (for example, a requirement that the application sit behind a firewall in the DMZ) also affect the performance of the application.
A thorough understanding of this Java science is very important for building well-performing enterprise-level applications.
A typical execution of a Java application on a mixed-mode JVM involves class loading, interpretation, profiling, hot-method detection, compilation and garbage collection (GC). All these activities compete for CPU time and cache. Furthermore, there are different implementations of the Java virtual machine, each with its own run-time execution pattern. Different operating systems come with different flavors of the JRE (Java Runtime Environment), which affects the consistency of the application's performance as well as its functional behavior. For example, the GC policies of the JRE on Windows differ from those of the JRE on IBM platforms. So GC policies need to be tuned for the JRE hosted on a particular OS.
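Because behavior varies between JVM implementations, it helps to record exactly which JVM, JIT compiler and collectors a test ran on. The sketch below uses the standard `java.lang.management` API for this; the class name `JvmReport` and the output format are hypothetical.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.CompilationMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

// Hypothetical helper that summarizes the JVM the application is running on.
final class JvmReport {

    static String describe() {
        StringBuilder out = new StringBuilder();

        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        out.append("JVM: ").append(runtime.getVmVendor()).append(' ')
           .append(runtime.getVmName()).append(' ')
           .append(runtime.getVmVersion()).append('\n');

        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        out.append("Loaded classes: ").append(classes.getLoadedClassCount()).append('\n');

        // getCompilationMXBean() returns null when the JVM has no compilation system.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        out.append("JIT compiler: ").append(jit == null ? "none" : jit.getName()).append('\n');

        // Collector names depend on the JVM implementation and the chosen GC policy.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append("Collector: ").append(gc.getName()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Storing this report alongside each set of test results makes it easier to spot when two runs are not comparable because the runtime differed.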
CPU and Memory
Java manages its memory in two main areas: the heap and the stack. The JVM stores all objects created by the application in the heap, whereas method invocations and local variables are stored on the stack. When processing increases because of a larger load on the system, memory, and the heap in particular, fills up fast and GC is triggered frequently to free it. Since GC can suspend application threads until it finishes its work and also consumes CPU time, the overall performance of the application suffers. So it is important that GC does not run too frequently. To reduce GC frequency, one can:
· Increase the heap size – The heap size can be increased to some extent, but this is no remedy for ever-increasing GC frequency. An overly large heap can also hurt performance.
· Choose an appropriate GC policy – The choice of GC policy depends on the kind of performance the application is geared towards. There is no rule of thumb for this; it should be decided case by case. In general, for a response-time-bound application, a parallel young-generation collector is recommended, since it keeps minor GCs short (with shorter suspension), typically together with an incremental or concurrent collector for the old generation. For a throughput-oriented application, a parallel stop-the-world collector is generally the better fit, because concurrent collection gives up some throughput in exchange for shorter pauses.
· Revisit the application/component design and implementation – When more and more objects are created in proportion to the increase in load and parallelism, it is necessary to look at minimizing the creation of new objects and at re-using objects that have already been created. Increasing the heap size and changing the GC policy can solve the problem of frequent GC only to some extent; after some time the system will hit the wall again unless the implementation and design are revisited.
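The third option, creating fewer objects, can be sketched as follows. The workload (filling a byte buffer and summing it) and all names are hypothetical; the only point is that both variants compute the same result while the second allocates one buffer instead of one per record. The `totalGcCount` helper uses the standard `GarbageCollectorMXBean` API to observe how many collections each variant triggers on a given JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical workload comparing per-record allocation with buffer reuse.
final class BufferReuse {

    // Allocates a fresh buffer for every record: garbage grows with the load.
    static long checksumAllocating(int records, int size) {
        long sum = 0;
        for (int r = 0; r < records; r++) {
            byte[] buffer = new byte[size];
            sum += fillAndSum(buffer, r);
        }
        return sum;
    }

    // Reuses one buffer for all records: allocation no longer grows with the load.
    static long checksumReusing(int records, int size) {
        byte[] buffer = new byte[size];
        long sum = 0;
        for (int r = 0; r < records; r++) {
            sum += fillAndSum(buffer, r);
        }
        return sum;
    }

    // Overwrites every byte, so a reused buffer never carries data between records.
    private static long fillAndSum(byte[] buffer, int seed) {
        long sum = 0;
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = (byte) (seed + i);
            sum += buffer[i];
        }
        return sum;
    }

    // Collections so far across all collectors; a count of -1 means "undefined".
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcCount();
        long a = checksumAllocating(200_000, 4096);
        long afterAllocating = totalGcCount();
        long b = checksumReusing(200_000, 4096);
        long afterReusing = totalGcCount();
        System.out.println("same result: " + (a == b));
        System.out.println("GCs while allocating: " + (afterAllocating - before));
        System.out.println("GCs while reusing: " + (afterReusing - afterAllocating));
    }
}
```

The actual GC counts depend on the JVM, heap size and GC policy in use, so they are best read as a relative comparison on one environment rather than as absolute numbers.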
Interactions with Other Environments
Most enterprise Java applications involve middleware products, such as a database, a messaging service, network services and business processes, each with its own environment. The science here concerns the overhead (in terms of communication, transformation and environment switching) of transporting requests, responses and information between environments. This cost needs to be understood and accounted for in any performance analysis so that it does not become a bottleneck in the performance of the application.
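One common way to account for this cost is to count how many times a request crosses the boundary between environments. The sketch below is a simulation under stated assumptions: `FakeStore` is a hypothetical stand-in for a remote database or messaging service that only counts calls and performs no real I/O, and the batch size is an arbitrary parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation: counts boundary crossings instead of doing real I/O.
final class RoundTrips {

    /** Stand-in for a remote environment; every call represents one round trip. */
    static final class FakeStore {
        int roundTrips = 0;

        void save(String record) {
            roundTrips++;
        }

        void saveAll(List<String> batch) {
            roundTrips++;
        }
    }

    // One crossing per record.
    static int saveOneByOne(FakeStore store, List<String> records) {
        for (String record : records) {
            store.save(record);
        }
        return store.roundTrips;
    }

    // One crossing per batch of up to batchSize records.
    static int saveInBatches(FakeStore store, List<String> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int from = 0; from < records.size(); from += batchSize) {
            int to = Math.min(records.size(), from + batchSize);
            store.saveAll(records.subList(from, to));
        }
        return store.roundTrips;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            records.add("record-" + i);
        }
        System.out.println("one by one: " + saveOneByOne(new FakeStore(), records));
        System.out.println("batches of 4: " + saveInBatches(new FakeStore(), records, 4));
    }
}
```

Each round trip carries a fixed communication and transformation overhead regardless of payload, so fewer crossings usually means less overhead, though larger batches bring their own memory and latency trade-offs.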
This article describes a top-down, iterative and data-driven approach to tackling enterprise Java performance problems. The whole system needs to be targeted for a holistic performance gain, including the software/hardware stack at system level, the software applications, the virtual machines and the physical hardware. Keep in mind that no single solution works for all applications. At some point it boils down to the attitude of becoming a NEXT vs becoming the FIRST. Every system has its own unique characteristics and its own unique set of bottlenecks. It requires some degree of innovation and ingenuity to tame these beasts.