Why Performance Projects Fail
Projects involving performance testing and engineering fail for a variety of reasons.
Projects involving performance testing and engineering fail for a variety of reasons. Most failures stem from complex problems that accumulate across every phase of the development and performance testing life cycles. Sometimes performance problems are simply beyond the control of the project manager, technical architects, or performance engineers. In my experience, at both the business and personal level, most performance projects fail due to a simple lack of communication between performance engineers, developers, DBAs, business teams, and stakeholders from the very beginning, which causes many downstream problems that directly impact application performance and ROI. The objective of strategic, effective performance testing for any application/product is to achieve a satisfactory return on investment. Performance testing and engineering are risky and always require considerable trial and error, with rigorous testing from the early stages of development.
Failures in performance testing projects must be treated like other business problems. It is essential to understand what went wrong, why it went wrong, and what can be done to prevent it. In most scenarios, performance engineers have to run a one-person show, educating everyone about the performance challenges across the full end-to-end life cycle. Working with Practice and COE teams, we kept seeing the same mistakes repeated across multiple teams and projects, so, based on my personal experience, I have compiled a list of reasons why performance projects fail.
1. Unclear and Incomplete NFRs
Gathering complete non-functional requirements is more complex than gathering functional requirements because NFRs have been treated as a second- or even third-class type of requirement. As a result, they are frequently missed, misunderstood, and neglected, and only a few organizations/stakeholders treat NFRs as first-class requirements. This approach results in severe problems in the system's architecture/design and in the user experience. It frequently leads to project breakdowns and outages, such as the Chicago Board Options Exchange (CBOE) crash, that make systems unavailable. In most unsuccessful performance testing projects, the NFR document was incomplete, inconsistent, or did not exist at all. The first step in the performance testing life cycle is to do a feasibility analysis of the application/system and create a clear set of non-functional requirements for successful performance testing. A solid NFR document will identify all of the criteria that the product/application must meet to succeed in production with the best performance. In addition, we have to:
- Establish clear performance goals and expectations for the product/application and system
- Get everyone (performance engineers, development teams, admin teams, DBAs, stakeholders, and business teams) on the same page
- Build communication between technical teams, project teams, and business teams to understand the real performance problems end-users face in production
- Use site analytics tools like Alexa, Pingdom, Google Analytics, and Omniture to gather production traffic statistics and create suitable workload models
- Understand the architecture, design, issues, and existing performance problems of your application/product while collecting the non-functional requirements
- Discuss non-functional requirements from the start of the software development process and throughout the life cycle; if the application is brand new, baselining and benchmarking are necessary for performance testing
- Get complete information on all the internal and external components involved and understand how they communicate (CDNs, firewalls, DNS, load balancers, servers, networks, caches, etc.)
- Consider breaking down your business objectives into system requirements with respect to performance
- Understand the application footprint, third-party dependencies, and architectural limitations
- Talk to stakeholders and business teams to understand the goals and determine what is essential: performance problems in existing legacy systems, platform constraints, and competitors
- Document everything, and get business and IT stakeholders into a meeting together to determine whether the existing NFRs are appropriate and to agree that the defined SLAs are achievable
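As a concrete illustration of turning production traffic statistics into a workload model, here is a minimal sketch based on Little's Law. The traffic figures are hypothetical; substitute numbers from your own analytics data.

```python
# Sketch: deriving load-test concurrency from production traffic stats.
# Assumption: steady-state arrivals, which Little's Law requires.

def required_virtual_users(sessions_per_hour, avg_session_sec):
    """Little's Law: concurrency = arrival rate x time in system."""
    arrivals_per_sec = sessions_per_hour / 3600.0
    return arrivals_per_sec * avg_session_sec

# Hypothetical peak: 90,000 sessions/hour, average session of 240 seconds.
peak_vusers = required_virtual_users(90_000, 240)
print(round(peak_vusers))  # 6000 concurrent virtual users
```

This gives a first approximation of how many virtual users the load scenario should simulate at peak; a real workload model would also distribute those users across transaction mixes and geographies.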
2. Managers and Stakeholders Who Don't Understand or Listen to End-Users Who've Already Expressed Strong Concerns About Production Performance
Talking to end-users might reveal existing or potential performance issues you were not aware of. To understand the performance problems in production systems, we need continuous feedback from end-users on how the application behaves under different anticipated load conditions. Many users will keep using a feature in production even when it does not meet their performance expectations; they won't question it and will assume it's normal, which can become a serious issue later when the feature is accessed from several locations at the same time. Therefore, if you want to improve the performance of your application, you must involve end-users and receive continuous feedback on the application's or system's performance in production. Interacting with end-users takes time and effort, and it can sometimes seem impossible, but it is well worth it for delivering the best possible performance in your final product/application.
3. Poor Architectural Design
Initially, poor architecture causes only minor problems, but they start to add up. Simple maintenance becomes a challenge, and any change in one area breaks other parts of the application. Your application/system could experience severe performance degradation, excessive network latency, and other issues if inappropriate decisions were made during the architectural design phase. Performance engineers need to understand the application architecture blueprint that organizes and helps to conceptualize the system. Without a clearly defined application/system architecture, there is a high risk of too much uncertainty and complexity during the load test execution phase, which can introduce unintended performance problems for performance testing and engineering teams. Enterprise software releases scheduled monthly or quarterly, with ongoing performance testing for production, can be delayed by performance engineering challenges that surface during the application design and development phases of the software development life cycle.
4. Unforeseen Technology Dependencies
Dependencies are the connections between components that enable additional application features and functionality. Specific OS versions, application servers, database servers, a Java virtual machine, the Common Language Runtime, or the .NET Framework are examples of simple dependencies. However, some dependencies are more complex, such as those composed of various packages in Linux, Java, and scripting languages such as Python and Ruby. Understanding each technology's dependency on each component in terms of design and infrastructure, which technologies are used, and what frameworks and tools were used to develop the application is critical for every performance engineer to accomplish performance testing with the desired results.
5. Not Paying Attention to Performance Goals
Setting goals is one of the most important aspects of a performance test's success. Many performance engineers and team members constantly fall short of their performance improvement goals, spending enormous amounts of time fixing existing and hidden performance problems in the application/system. Proper performance testing goals should be defined, designed, and executed under the most realistic conditions, such as real browsers, real devices, and multiple geo-locations. Determining the right metrics to monitor, what kind of performance we are measuring, and the minimum thresholds for each metric, and then executing the performance tests to establish baseline results: all of these figures are necessary to determine which changes create performance improvements. It is an excellent practice to start performance testing as early as possible in the software development life cycle to eliminate bottlenecks in the first place and ensure that your application is continuously checked for performance under heavy user load.
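To make the idea of metrics, thresholds, and baselines concrete, here is a minimal sketch that checks nearest-rank response-time percentiles from a baseline run against SLA thresholds. The sample data and limits are hypothetical, not from any real system.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of response times (ms)."""
    ordered = sorted(samples)
    k = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[k - 1]

# Hypothetical SLA thresholds in milliseconds.
SLA = {"p50": 400, "p95": 900}

def check_sla(samples):
    """Return pass/fail per percentile against the SLA limits."""
    measured = {"p50": percentile(samples, 50), "p95": percentile(samples, 95)}
    return {name: measured[name] <= limit for name, limit in SLA.items()}

# Baseline response times from a test run (illustrative figures).
baseline = [120, 180, 250, 260, 310, 420, 500, 640, 710, 950]
print(check_sla(baseline))  # {'p50': True, 'p95': False}
```

Here the p95 check fails, which is exactly the kind of early signal that tells the team where tuning effort should go before the next iteration.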
6. Excessive Expansion of New Features and Changes at the Last Minute of Every Release
Excessive feature expansion is a common major roadblock I've seen software developers and performance engineers encounter. The most effective way to tackle it is to hold regular customer development meetings and discussions involving every team member to validate each feature and ensure that it meaningfully addresses the problem you set out to solve. Performance testing teams have to start by planning test environments and release schedules, and should proactively communicate how much time they need to test the application from the performance front when new features are added at the last minute of a release. If your project has a fixed deadline, plan your environment requirements ahead of time so that unexpected environment delays do not affect your performance testing schedule. If developers and teams continue to add features at the last minute, the quality of the deliverables is bound to suffer; ultimately, the customers will reject the final deliverables, creating rework and additional resourcing shortages.
7. Directly Focusing on Big Bang Success
Directly focusing on the target SLAs and hitting the acceptable limits (industry-standard response times) may not be possible in the very first rounds of performance test execution. Performance testing is an iterative process that requires continuous testing to identify and eliminate every performance bottleneck. Performance engineers should spend extra time optimizing each line of code and each component to improve system/application performance. Every SLA and KPI matters in performance testing, and achieving the desired response times, throughput, network latency, and resource utilization is only possible with continuous performance testing, code profiling, memory profiling, performance engineering, monitoring, and tuning from both the client side and the server side, which can sometimes take years. Analyzing all performance results and degradations and collecting data with appropriate metrics at the user, OS, system, network, and server levels is crucial for performance engineers to perform root cause analysis.
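As a small illustration of the code-profiling step, here is a sketch using Python's standard `cProfile` and `pstats` modules to surface a deliberately inefficient hot path. The function is contrived for demonstration; in practice you would profile your actual application code.

```python
import cProfile
import io
import pstats

def hot_path(n):
    # Deliberately naive: repeated string concatenation is O(n^2).
    s = ""
    for i in range(n):
        s += str(i)
    return s

# Profile the suspect function and print the top entries by cumulative time.
profiler = cProfile.Profile()
profiler.enable()
hot_path(10_000)
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

The profiler output points directly at `hot_path` as the dominant cost, which is the kind of evidence that drives each tuning iteration rather than guesswork.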
8. Capacity Planning Failure
In my experience, there are various reasons why many infrastructure teams fail at effective capacity planning. In simple terms, the capacity planning process is not straightforward. We can create a scenario, add traffic, evaluate the results, resolve performance issues, and repeat until we are satisfied, but the real trouble comes with poor capacity planning. Poor capacity planning increases the likelihood of missing project goals, fully exposes all risks, and eventually leads to complete failure. All of this can be avoided through careful capacity planning. System analysts from the infrastructure area, database administrators from the database area, and programmer analysts from the application development area are the three groups who need to be most involved in an effective capacity-planning process. Many of us confuse capacity management with capacity planning, and performance engineers, along with other teams, can't always forecast accurately and end up predicting the wrong future workloads. Performance engineers must identify the accurate resource requirements (CPU, memory, disk space, and network bandwidth) to support current and future increased workloads, meet business demands, and avoid capacity planning failures. Continuous monitoring with the right metrics enables effective capacity planning and also helps to tackle unexpected future workloads with increased traffic.
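A crude sketch of the forecasting step: project CPU utilization linearly with request rate and size the node count against a safety ceiling. The linear-scaling assumption only holds below saturation, and all figures (the 70% ceiling included) are hypothetical.

```python
import math

def projected_utilization(current_util_pct, current_rps, future_rps):
    """Naive linear projection: utilization scales with request rate."""
    return current_util_pct * (future_rps / current_rps)

def servers_needed(projected_util_pct, per_server_ceiling_pct=70):
    """Spread the projected load so each node stays under a safety ceiling."""
    return math.ceil(projected_util_pct / per_server_ceiling_pct)

# Example: one node at 45% CPU serving 500 req/s; plan for 2,000 req/s.
future = projected_utilization(45, 500, 2000)
print(future, servers_needed(future))  # 180.0 3
```

Even this back-of-the-envelope arithmetic forces the conversation the section argues for: what the real current utilization is, what workload growth is expected, and how much headroom the business requires.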
9. No Methodology
Without a proper methodology, it becomes very difficult to get effective performance test results while creating a performance test strategy and defining its coverage. Understanding the performance testing methodology and process helps every performance engineer on the team, especially when performance problems arise, by providing the right fix for each bottleneck that occurs. Performance testing processes should be well planned, well defined, and well documented. Good documentation builds efficient communication between developers, DBAs, and performance engineers. As software applications become more complex and multifaceted, with a growing number of platforms and locations to test, a robust performance testing methodology becomes even more important. It ensures that the applications/systems being developed are fully performance tested, meet their specified business requirements, and can successfully operate under all anticipated load conditions and environments with the required performance and capacity.
10. Performance Issues Are Not Completely Addressed
More performance problems appear as the number of users on the application increases. Hidden performance problems and known performance issues in the application/system are the primary reasons performance continuously degrades over time. Every bottleneck identified has to be discussed with every team member on the project to ensure the application meets customer SLAs. When it comes to performance problems, every minute counts; if you ignore existing performance issues, your applications and systems will become significantly slower, or worse. For example, some services might stop functioning on a heavily overloaded server, leaving the applications inaccessible. You don't want to lose time figuring out what your monitoring data is trying to tell you when a critical service is down, so it's good to be familiar with detection methods and the most common causes of server performance issues.
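Familiarity with detection methods can start as simply as a liveness probe. Here is a minimal sketch of a TCP connect check; the host and port in the example are placeholders, and a real monitoring setup would layer protocol-level health checks on top of this.

```python
import socket

def service_is_up(host, port, timeout=2.0):
    """Cheap liveness probe: can we open a TCP connection at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False

# Example probe against a hypothetical app server:
# print(service_is_up("app.example.internal", 8080))
```

A probe like this answers the cheapest question first: is the service reachable at all? That rules servers-down scenarios in or out before you dig into slower-moving monitoring data.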