Application Performance Review (also known as Application Performance Walkthrough or Application Performance Assessment) is the process of reviewing an existing application (in production) to evaluate its performance and scalability attributes. The performance characteristics of an application are determined by its architecture and design, so applications must be architected and designed with sound principles and best practices. No amount of code fine-tuning can disguise the performance implications of bad architecture or design decisions. Performance reviews let all stakeholders see where they stand and make appropriate decisions.
Performance and Scalability
Performance and scalability are two quality-of-service (QoS) considerations. Other QoS attributes include availability, manageability, integrity, and security. These should be balanced with performance and scalability, which often involves architecture and design tradeoffs.
Define Objective of Review
The first step in the review process for Performance and Scalability is to clearly identify and define the review objectives, which include:
- How fast is fast enough?
  - What are the application response time and throughput constraints?
  - What is the user load the application is expected to support?
- Server capacity
  - How much CPU, memory, disk I/O, and network I/O is it acceptable for the application to consume?
- What is the expected application load?
  - Peak and off-peak traffic
  - How is the load expected to increase or decrease in the near and longer term?
Define Performance SLA
The performance SLAs should be clearly called out and should include:
- How many concurrent users will be connecting to the application
- Expected response time to the end user (in the case of a client-facing application)
- Time to ingest data (in the case of feed-based data loads) and make it available to end users for consumption
- Refinement of the SLAs in alignment with external system dependencies
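A response-time SLA is usually checked against a percentile of measured response times, not the average. The following is a minimal sketch of such a check, assuming a nearest-rank percentile; the sample measurements and the 500 ms threshold are hypothetical values chosen only for illustration.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer from 1 to 100."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n
    return ordered[max(rank, 1) - 1]


def meets_sla(response_times_ms, sla_ms, pct=95):
    """True when the pct-th percentile response time is within the SLA."""
    return percentile(response_times_ms, pct) <= sla_ms


# Hypothetical measurements in milliseconds, with one slow outlier.
samples = [120, 130, 150, 180, 200, 220, 250, 300, 450, 900]

p95 = percentile(samples, 95)
within_sla_p95 = meets_sla(samples, sla_ms=500)
within_sla_p90 = meets_sla(samples, sla_ms=500, pct=90)
print(p95, within_sla_p95, within_sla_p90)
```

Which percentile to use (90th, 95th, 99th) is itself part of the SLA definition and should be agreed with the stakeholders.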
Measure Current Application Performance
Measure and baseline the current application performance. The measurement process should validate and verify that the application
- meets the requirements that guided its design and development,
- works as expected,
- can be implemented with the same characteristics,
- and satisfies the needs of stakeholders.
The testing process includes:
- Define the test plan
- Review the automation scripts (if available)
- Have proper test environment and sufficient test data
- Load Test - how the application behaves under a heavy load. This test yields information and details on the utilization of memory, CPU, etc.
- Stress Test - determine the maximum performance limits of an application.
- Scalability Test - how adaptable the application is to changes in software and hardware.
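The load test described above can be sketched with a small harness that calls an operation from several threads and records durations and throughput. This is only an illustration, not a replacement for a dedicated load testing tool; `fake_request` is a hypothetical stand-in for a real request to the application, and the user and request counts are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(operation):
    """Run operation once and return its duration in seconds."""
    start = time.perf_counter()
    operation()
    return time.perf_counter() - start


def run_load(operation, concurrent_users, requests_per_user):
    """Call operation from several threads; return (durations, requests per second)."""
    total = concurrent_users * requests_per_user
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        durations = list(pool.map(lambda _: timed_call(operation), range(total)))
    elapsed = time.perf_counter() - started
    return durations, total / elapsed


def fake_request():
    """Hypothetical stand-in for a real request; the sleep simulates server time."""
    time.sleep(0.01)


durations, throughput = run_load(fake_request, concurrent_users=5, requests_per_user=4)
print(f"requests={len(durations)} slowest={max(durations):.3f}s throughput={throughput:.1f}/s")
```

Raising `concurrent_users` step by step until response times degrade gives a rough picture of where a stress test would find the limit.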
Understand the existing application in terms of:
- Domain–the application domain and the business goals (broad level)
- Technology stack of the application
- All components that work together to achieve the business functionalities
- External COTS tools leveraged
- External Interfaces
- Other external components consumed
- Functional flow in terms of business functionalities
- Data flow to understand the business entities involved and the data flow at various stages
- NFR–All Non-Functional Requirements pertaining to the project.
Key design considerations for a high-performance application include the following:
- Low latency - low Page Loading times
- Scalability - Application that can serve ever increasing number of users
- Availability - Application that does not go down (highly / continuously available)
Some of the key contributors to application Latency and ways to overcome them include:
One of the key contributors to latency is the application tiering. The hops from WebServer -> Application Server -> Database and back, data serialization/deserialization are some of the biggest contributors to the overall latency.
Bring data close to application
Data needs to be close to the application so that the number of database connection calls and round trips to fetch data from the DB can be reduced.
Another weak link in the application performance chain is disk I/O. One way to overcome disk I/O limitations is to keep data in memory. In-memory or embedded databases (such as VoltDB, solidDB, Oracle TimesTen, or SQLite) and XTP solutions (such as Oracle Coherence, IBM eXtreme Scale, GigaSpaces eXtreme Application Platform, or Red Hat’s JBoss Data Grid) can be used to speed up the application.
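As a small illustration of keeping data in memory, the sketch below uses Python's standard `sqlite3` module with the special `:memory:` database name, so no disk I/O is involved in reads. The `product` table and its rows are hypothetical.

```python
import sqlite3

# ":memory:" asks SQLite for a database that lives only in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product (id, name, price) VALUES (?, ?, ?)",
    [(1, "widget", 9.5), (2, "gadget", 24.0), (3, "gizmo", 3.25)],
)


def price_of(product_id):
    """Look up a price from the in-memory table; None when the id is unknown."""
    row = conn.execute(
        "SELECT price FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    return row[0] if row else None


print(price_of(2), price_of(99))
```

An in-memory store like this loses its contents when the process exits, so it suits reference data that can be reloaded from a durable source.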
The hardware on which the application is hosted can also be tuned to reduce latency. Optimizations such as 10G/20G networking, fiber channels, low-latency switches, SSDs (solid-state drives), and avoiding virtualization can help reduce application latency.
At times, the transport mechanism can also add to the application latency. For example, secure communication (such as HTTPS) adds the overhead of encrypting the data and decrypting it at the receiving end. One way to reduce this overhead is to offload SSL processing to the load balancer or firewall.
Scalability is the ability of an application to handle a growing amount of data and concurrency efficiently without impacting performance. Importantly, scalability should not come at the cost of application performance. Some of the techniques that can help scale the application include:
Stateless Application/Service - The application should store its state in a centralized repository, but the application itself should be stateless. It means no storing of data or state on local file systems. Stateless applications allow one to add any number of application instances to accommodate the increasing growth.
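A minimal sketch of the stateless idea: two application instances hold no per-user data themselves and read and write everything through a shared store, so either instance can serve any request. `SharedStore` is a hypothetical in-process stand-in for a centralized repository such as a database or distributed cache.

```python
class SharedStore:
    """Hypothetical stand-in for a centralized state repository."""

    def __init__(self):
        self._carts = {}

    def get_cart(self, session_id):
        return list(self._carts.get(session_id, []))

    def save_cart(self, session_id, items):
        self._carts[session_id] = list(items)


class AppInstance:
    """Keeps no per-user state; every request loads and saves through the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        cart = self.store.get_cart(session_id)
        cart.append(item)
        self.store.save_cart(session_id, cart)
        return cart


store = SharedStore()
instance_a = AppInstance("a", store)
instance_b = AppInstance("b", store)

# The same session is served by two different instances.
instance_a.add_to_cart("session-1", "book")
cart = instance_b.add_to_cart("session-1", "pen")
print(cart)
```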
Load Balancing - As traffic grows, the application should be designed to handle the additional load by adding server instances to service the requests. The load balancer makes sure that none of the servers works beyond its stated load, and new instances should be added automatically as the load goes up (auto-scaling). The database can also be load balanced with techniques such as a Master-Master topology or Master-Slave (partitioning read and write data) to handle the additional load. If the data runs into the petabyte range, however, data sharding with data replication needs to be used. An in-memory data grid architecture can also be used to scale the data.
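Two of the ideas above, read/write partitioning and sharding, can be sketched in a few lines. This is an illustrative sketch only: the server names are hypothetical, reads are spread round-robin over replicas, and the shard is chosen from a stable hash of the key so that the same key always maps to the same shard.

```python
import hashlib
import itertools


class ReplicatedRouter:
    """Send writes to the primary and spread reads round-robin over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas)

    def route(self, is_write):
        return self.primary if is_write else next(self._replica_cycle)


def shard_for(key, shard_count):
    """Map a key to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


router = ReplicatedRouter("db-primary", ["db-replica-1", "db-replica-2"])
targets = [router.route(is_write=w) for w in (False, False, True, False)]
shard = shard_for("customer-42", shard_count=4)
print(targets, shard)
```

A plain modulo scheme like `shard_for` remaps most keys when `shard_count` changes; schemes such as consistent hashing are commonly used when shards are added often.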
Fault Tolerance/Dynamic Discoverable Elements - When an application runs in large clusters, it is very important to avoid manual intervention. For example, when the application load reaches a defined threshold, the application monitoring should be able to add a new instance, and the load balancer should be able to recognize and use it. Similarly, when data gets sharded, the application should be able to recognize this and look up the new IP to connect to. Also, if the application cannot connect to a particular resource, it should recognize the fault and try an alternate resource. The application will need a central metadata repository that it can consult in all such fault tolerance scenarios.
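Recognizing a fault and moving on to an alternate resource can be sketched as a retry loop with fallback. The resource names and the `fetch` operation below are hypothetical; in a real system the list of resources would come from the central metadata repository mentioned above.

```python
import time


def call_with_retry(resources, operation, attempts_per_resource=2, delay_seconds=0.0):
    """Try operation(resource) on each resource in turn, retrying on ConnectionError."""
    last_error = None
    for resource in resources:
        for attempt in range(attempts_per_resource):
            try:
                return operation(resource)
            except ConnectionError as error:
                last_error = error
                time.sleep(delay_seconds * (2 ** attempt))  # exponential backoff
    raise RuntimeError("all resources failed") from last_error


calls = []


def fetch(resource):
    """Hypothetical operation: the primary endpoint is down, the standby works."""
    calls.append(resource)
    if resource == "primary":
        raise ConnectionError("primary unavailable")
    return f"data from {resource}"


result = call_with_retry(["primary", "standby"], fetch)
print(result, calls)
```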
Availability of an application is a function of scalability. The following factors have an impact on the application availability:
Redundancy. The application needs to be scalable to be able to compensate for the loss of any instance (whether hardware or software). The redundancy needs to be built at all layers, software, hardware, power and even at data center levels, e.g. real-time data mirroring or data sync across data centers that are located geographically apart.
Fault Tolerance. The application needs to be fault tolerant (e.g. retry mechanism) to make sure it can take advantage of dynamically allocated resources to keep functioning.
Monitoring/Testing. Another overlooked factor of application availability is application monitoring. If an application is not properly monitored, outages can go undetected leading to application unavailability. Ability to monitor the entire application stack and take corrective actions is very important.
Configuration Data. Any application that needs to be continuously available must be able to change its behavior through configuration. For example, if the application introduces a new service interface, it should be able to either use the new interface or keep using the old one. This factor becomes very important when rolling out new features or services that cannot all be rolled out at once.
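The configuration-driven choice between an old and a new service interface can be sketched as a simple switch read at call time. The function names and the `use_new_quote_service` key are hypothetical.

```python
def legacy_quote(amount):
    """Old service interface (hypothetical)."""
    return {"interface": "v1", "amount": amount}


def new_quote(amount):
    """New service interface (hypothetical)."""
    return {"interface": "v2", "amount": amount}


def get_quote(amount, config):
    """Pick the interface from configuration at call time, not at build time."""
    use_new = config.get("use_new_quote_service", False)
    handler = new_quote if use_new else legacy_quote
    return handler(amount)


default_result = get_quote(100, {})
switched_result = get_quote(100, {"use_new_quote_service": True})
print(default_result, switched_result)
```

Because the default is the old interface, the new one can be enabled for a subset of servers or users and switched back without a redeployment.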
Performance Design Categories
Performance design categories include:
- Coupling and Cohesion: loose coupling and high cohesion
- Communication: transport mechanism, boundaries, remote interface design, round trips, serialization, bandwidth
- Concurrency: transactions, locks, threading, queuing
- Resource Management: allocating, creating, destroying, pooling
- Caching: per user, application-wide, data volatility
- State Management: per user, application-wide, persistence, location
- Data Structures and Algorithms: choice of algorithm, e.g. arrays versus collections
Design elements and principles describe fundamental ideas about the practice of good design. Ensure that the 3 important characteristics of a bad design are avoided, namely
- Rigidity - It is hard to change because every change affects too many other parts of the system.
- Fragility - When we make a change, unexpected parts of the system break.
- Immobility - It is hard to reuse in another application because it cannot be disentangled from the current application.
Good Design Principles include:
- The application should be easily maintainable, with appropriate error, debug, and trace logs in place to ease maintenance effort.
- The application should be built as independent modules that collaborate to deliver the desired functionality.
- Redundancy is a critical characteristic of scalability, but it should be kept to the minimum level necessary.
- The application should be designed to support globalization and localization.
- The application should be easily extensible so that more functionality can be added.
- The application should be simple and easy to understand.
- The application should use the available resources optimally.
- The application should stand the test of time.
- The application should be backward compatible with at least the two previous releases.
- The application should be able to automatically interpret the information exchanged meaningfully and accurately.
- The application should be stable in terms of functionality and NFRs.
- The application should be robust enough to cope with errors at runtime.
Design guidelines help designers ensure consistency and ease of use by providing a unified programming model that is independent of the programming language used for development. The following design principles are abstracted from architectures that have scaled and performed well over time:
Design coarse-grained services. Coarse-grained services minimize the number of client-service interactions and help design cohesive units of work. Coarse-grained services also help abstract service internals from the client and provide a looser coupling between the client and service. Loose coupling increases the ability to encapsulate change. If fine-grained services are already available, consider wrapping them with a facade layer to help achieve the benefits of a coarse-grained service.
Minimize round trips by batching work. Minimize round trips to reduce call latency. For example, batch calls together and design coarse-grained services that allow performing a single logical operation by using a single round trip. Apply this principle to reduce communication across boundaries such as threads, processes, processors, or servers. This principle is particularly important when making remote server calls across a network.
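The effect of batching can be shown by counting round trips. In this sketch, `RemoteCatalog` is a hypothetical stand-in for a remote service; each method call represents one network round trip.

```python
class RemoteCatalog:
    """Hypothetical remote service that counts the round trips it receives."""

    def __init__(self, prices):
        self._prices = prices
        self.round_trips = 0

    def get_price(self, sku):
        """Fine-grained call: one round trip per item."""
        self.round_trips += 1
        return self._prices[sku]

    def get_prices(self, skus):
        """Coarse-grained call: one round trip for the whole batch."""
        self.round_trips += 1
        return {sku: self._prices[sku] for sku in skus}


catalog = RemoteCatalog({"a": 1.0, "b": 2.5, "c": 4.0})

one_by_one = {sku: catalog.get_price(sku) for sku in ("a", "b", "c")}
chatty_trips = catalog.round_trips

batched = catalog.get_prices(["a", "b", "c"])
batched_trips = catalog.round_trips - chatty_trips

print(chatty_trips, batched_trips)
```

Both approaches return the same data; only the number of boundary crossings differs, which is what dominates latency on a real network.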
Acquire late and release early. Minimize the duration that shared and limited resources such as network and database connections are held. Releasing and re-acquiring such resources from the operating system can be expensive, so consider a recycling plan to support "acquire late and release early." This helps optimize the use of shared resources across requests.
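A minimal sketch of "acquire late and release early", assuming a lock as a stand-in for a scarce shared resource such as a database connection. All names are hypothetical; the point is that preparation happens before the resource is acquired and follow-up work happens after it is released.

```python
import threading

db_lock = threading.Lock()  # stands in for a scarce shared resource (hypothetical)
saved_records = []          # stands in for the data store behind that resource


def build_record(order):
    """Preparation work that does not need the shared resource."""
    return {"id": order["id"], "total": sum(order["lines"])}


def save_order(order):
    record = build_record(order)      # prepare first, without holding the lock
    with db_lock:                     # acquire late
        saved_records.append(record)  # keep the critical section short
    # The lock is released here, before any follow-up work.
    return f"order {record['id']}: {record['total']}"


receipt = save_order({"id": 7, "lines": [2, 3]})
print(receipt)
```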
Evaluate affinity with processing resources. When certain resources are only available from certain servers or processors, there is an affinity between the resource and the server or processor. While affinity can improve performance, it can also impact scalability. Carefully evaluate scalability needs. Should we add more processors or servers? If application requests are bound by affinity to a particular processor or server, we could inhibit the application's ability to scale. As the load on the application increases, the ability to distribute processing across processors or servers influences the potential capacity of the application.
Put the processing closer to the resources it needs. If the processing involves a lot of client-service interaction, we should push the processing closer to the client. If the processing interacts intensively with the data store, we may want to push the processing closer to the data.
Pool shared resources. Pool shared resources that are scarce or expensive to create such as database or network connections. Use pooling to help eliminate performance overhead associated with establishing access to resources and to improve scalability by sharing a limited number of resources among a much larger number of clients.
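A pool can be sketched with a bounded queue of pre-created resources that callers borrow and return. This is a simplified illustration and not a production pool (no health checks or resizing); `make_connection` is a hypothetical factory for an expensive resource.

```python
import queue
from contextlib import contextmanager


class ResourcePool:
    """Fixed-size pool: resources are created once and shared among callers."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def borrow(self, timeout=None):
        resource = self._idle.get(timeout=timeout)  # blocks until one is free
        try:
            yield resource
        finally:
            self._idle.put(resource)  # always hand the resource back


created = []


def make_connection():
    """Hypothetical factory for an expensive resource."""
    connection = object()
    created.append(connection)
    return connection


pool = ResourcePool(make_connection, size=2)
used = []
for _ in range(5):
    with pool.borrow() as connection:
        used.append(connection)

print(len(created), len(used))
```

Five requests are served while only two resources are ever created, which is the saving the principle describes.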
Avoid unnecessary work. Use techniques such as caching, avoiding round trips, and validating input early to reduce unnecessary processing.
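Two of the techniques named above, caching and early input validation, can be combined in a short sketch. The lookup table is hypothetical and stands in for an expensive call; `functools.lru_cache` is the standard-library memoization decorator.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def country_name(code):
    """Stands in for an expensive lookup; results are cached per code."""
    return {"DE": "Germany", "IN": "India", "US": "United States"}[code]


def describe(code):
    """Validate early so malformed input never reaches the expensive lookup."""
    if not (isinstance(code, str) and len(code) == 2 and code.isalpha()):
        raise ValueError(f"invalid country code: {code!r}")
    return country_name(code.upper())


first = describe("de")
second = describe("DE")  # served from the cache
stats = country_name.cache_info()
print(first, second, stats.hits, stats.misses)
```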
Reduce contention. Blocking and hotspots are common sources of contention. Blocking is caused by long-running tasks such as expensive I/O operations. Hotspots result from concentrated access to certain data that everyone needs. Avoid blocking while accessing resources, because resource contention leads to requests being queued. Contention can be subtle. Consider a database scenario. On the one hand, large tables must be indexed very carefully to avoid blocking due to intensive I/O; however, many clients will be able to access different parts of the table with no difficulty. On the other hand, small tables are unlikely to have I/O problems but might be used so frequently by so many clients that they are hotly contested.
Use progressive processing. Use efficient practices for handling data changes. For example, perform incremental updates. When a portion of data changes, process the changed portion and not all of the data. Also, consider rendering output progressively. Do not block on the entire result set when we can give the user an initial portion and some interactivity earlier.
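Both halves of this principle can be sketched briefly: a generator that yields results in chunks so the first portion is available early, and an incremental update that recomputes only what changed. The function names and data are hypothetical.

```python
def in_chunks(rows, chunk_size):
    """Yield successive chunks so a caller can render the first one early."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly shorter, chunk


def apply_changes(totals, changed):
    """Incremental update: recompute only the keys whose data changed."""
    for key, values in changed.items():
        totals[key] = sum(values)
    return totals


chunks = list(in_chunks(range(5), chunk_size=2))
totals = apply_changes({"a": 3, "b": 7}, {"b": [1, 1]})
print(chunks, totals)
```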
Process independent tasks concurrently. When we need to process multiple independent tasks, we can asynchronously execute those tasks to perform them concurrently. Asynchronous processing offers the most benefits to I/O bound tasks but has limited benefits when the tasks are CPU-bound and restricted to a single processor. If we plan to deploy on single-CPU servers, additional threads guarantee context switching, and because there is no real multithreading, there are likely to be only limited gains. Single CPU-bound multithreaded tasks perform relatively slowly due to the overhead of thread switching.
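For I/O-bound work, independent tasks can be run concurrently with a thread pool. In this sketch `fetch_report` is a hypothetical task, and the sleep stands in for a network or disk wait during which other threads can make progress.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_report(name):
    """Hypothetical I/O-bound task; the sleep simulates waiting on I/O."""
    time.sleep(0.05)
    return f"{name}: ok"


names = ["orders", "inventory", "billing", "shipping"]

with ThreadPoolExecutor(max_workers=len(names)) as pool:
    # map() runs the tasks concurrently and returns results in input order.
    results = list(pool.map(fetch_report, names))

print(results)
```

For CPU-bound tasks the picture is different, as the paragraph above notes, and threads on a single processor add switching overhead without real parallelism.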
The main deployment issues to recognize at design time are the following:
- Consider your deployment architecture.
- Identify constraints and assumptions early.
- Evaluate server affinity.
- Use a layered design.
- Stay in the same process.
- Do not remote application logic unless required.
Perform an automated code review, using tools to check the application code against guidelines and best practices. Tools that can support this review include:
- .NET Reflector
- NUnit, NDoc, NCoverage
- Snippet Compiler
- VS Code Analysis
Perform an independent manual analysis of the current application in terms of the following:
- Code review to include
  - External interface points
  - Internal application interaction points, including
    - Web server to app server
    - App server to DB
    - Other interactions, if any
  - Business layer code
  - Database schema and stored procedures
- Performance review to cover
  - Client-side performance
  - Server-side performance
  - Database code performance
  - Number of calls between the application layers (UI layer, business layer, database, etc.)
  - Data transfer between the application layers
  - Server logs
- Review configurations to include
  - Timeouts in servers
  - Port settings
  - Throttling settings
  - Performance counters
  - Cache settings
    - Static data caching
    - Static file caching for web-based front ends
    - Distributed caching
- Review hardware configurations in terms of
  - Hard disks / solid-state drives
  - Network adapters
- Review deployment model
  - Deployment configurations
  - Server topology
  - Load balancer configuration
  - Server scalability constraints (if any)
  - Number of web, application, and DB servers
  - Server configuration and mapping
  - Server communication model
  - Database replication model
  - Production and test environments
Perform a SWOT (Strength, Weakness, Opportunity, Threat) analysis of the current application and highlight it in terms of:
- Current Design
- Technology Stack
- Code base
- Best practices
Arrive at recommendations based on the above reviews. The recommendations should include:
- Design changes in terms of
  - Functional / data flow
  - Reusable components
- Code refactoring
- Integration of third-party COTS products
- Definition of the POC requirements based on the analysis and review process
  - Trace requirements in terms of functionality and performance SLAs
  - Evaluate third-party COTS tools as applicable
Based on the recommendations and the outcome of the POC, start implementing the recommendations in a phased manner.