A Short History of Performance Engineering
Take a walk through the hallowed—if not entirely complete—halls of performance history from the beginning to the end (so far).
Performance engineering has a rather long and fascinating history, especially if considered in the context of changing computing paradigms. While not everything in the past may be applied to every new technology, the underlying principles often remain the same, and knowledge of history keeps us from reinventing the wheel when it is unnecessary. Unfortunately, statements referring to the past are often not completely correct. The history of performance engineering is not well-known, so here is some information that I find to be quite interesting. The approach was to find the first mature appearance of still-relevant performance concepts (without diving into the in-depth history of each notion). It is not scientific research, and not much information is available overall, so a lot of important information may still be missing.
We will start by listing the following computing paradigms: mainframes, distributed systems, web and mobile, and cloud.
Performance expertise related to a paradigm usually materializes later when the technology is more mature.
Mainframes
Performance went beyond single-user profiling when mainframes started to support multiprogramming. In the early mainframe years, processing was concerned mainly with batch loads. Mainframes, however, had sophisticated scheduling and could ration consumed resources. They also had pretty powerful OS-level instrumentation allowing the engineers to track down performance issues. The cost of mainframe resources was high; therefore, capacity planners and performance analysts were needed to optimize mainframe usage.
We can definitely say that performance engineering became a distinct discipline when instrumentation was introduced with SMF (System Management Facilities), released as part of OS/360 in 1966 (still in use in IBM z/OS mainframes today).
In 1968, Robert Miller (IBM), in his paper "Response Time in Man-Computer Conversational Transactions," described several threshold levels of human attention. The paper has been widely cited by later researchers and remains mostly relevant today.
In 1974, monitoring was introduced with RMF (Resource Measurement Facility) as part of MVS (still in use). OMEGAMON for MVS by Candle (acquired by IBM in 2004), released in 1975, is often claimed to be the first real-time monitor.
A performance community, the Computer Measurement Group (CMG), was created in 1974 and has held annual conferences ever since, now covering a wide spectrum of technologies.
In 1977, BEST/1 was released by BGS Systems (acquired by BMC in 1998), the first commercial package for computer performance analysis and capacity planning to be based on analytic models.
Distributed Systems
When the paradigm changed to client-server and distributed systems, the available operating systems at the time didn't have much instrumentation or workload management capabilities. Load testing and system-level monitoring became the primary ways to handle multi-user performance. Deploying across multiple machines was more difficult, and the cost of rollback was significant, especially for Commercial Off-The-Shelf (COTS) software that might be deployed by hundreds or even thousands of customers. Thus, there was more of a need for performance design to be correct from the beginning.
"Fix-it-later was a viable approach in the 1970s, but today, the original premises no longer hold - and fix-it-later is archaic and dangerous. The original premises were:
Performance problems are rare.
Hardware is fast and inexpensive.
It's too expensive to build responsive software.
You can tune software later, if necessary."
Have you heard something like this recently? That is a quote from Dr. Connie Smith's Performance Engineering of Software Systems, published in 1990. The book presented the foundations of software performance engineering, and already had 15 pages of bibliography on the subject.
The best-known load testing tool, LoadRunner, was released in 1991 by Mercury Interactive (acquired by HP in 2006, now part of Micro Focus). For a while, load testing became the main way to ensure high performance of distributed systems, and performance testing groups became the centers of performance-related activities in many organizations.
The term Application Performance Management (APM) was coined by Programart Corp. (acquired by Compuware in 1999) in 1992 (in the mainframe context, as a combination of their STROBE and APMpower tools). However, STROBE, which they refer to as an application performance measurement tool, had been on the market since the '70s. Still, there is an opinion that the first APM tool—as we know them now—was Introscope by Wily Technology, founded by Lew Cirne in 1998 (acquired by CA in 2006).
The history of End-User Monitoring (EUM)/Real-User Monitoring (RUM) can be traced at least to ETEWatch (End-to-End Watch), an application response time monitor released in 1998 by Candle (acquired by IBM in 2004, then a part of Tivoli). However, EUM/RUM gained popularity later with the development of web and mobile technologies.
Web and Mobile
Most existing expertise was still applicable to the back-end. The first books to apply existing knowledge and techniques to the web were published in 1998—for example, Web Performance Tuning and Capacity Planning for Web Performance.
In 2007, Steve Souders published High Performance Web Sites: Essential Knowledge for Front-End Engineers, stating that 80-90% of user response time is spent in the browser, which started a movement of Web Performance Optimization (WPO) centered on the client-side.
The WPO community was built around the Velocity Conference (first held in 2008) and Web Performance meetups. Velocity was a very popular performance conference, at least until Steve Souders stepped down as an organizer, O'Reilly merged Web Performance into the Fluent conference, and Velocity became more of a DevOps conference. Maybe it was an indication that WPO had become more mature and integrated with other aspects of technology.
Mobile technologies supported the further development of web performance, as client-side performance was even more important on mobile devices.
Cloud
In the last ten years, we saw another paradigm shift to the cloud. While the term "cloud computing" was popularized when Amazon released its Elastic Compute Cloud in 2006, references to "cloud computing" appeared as early as 1996. Technologies mature more quickly nowadays: for example, Amazon's own monitoring solution CloudWatch was released only three years later, in 2009. Of course, many established performance products started to support the cloud, and new products still enter the market.
While the cloud looks very different from mainframes, there are many similarities between them, especially from a performance point of view. They both provide:
Availability of computer resources to be allocated.
An easy way to evaluate the cost associated with these resources and implement chargeback.
Isolation of systems inside a larger pool of resources.
Easier ways to deploy a system and pull it back if needed without impacting other systems.
However, there are notable differences that make managing performance in the cloud more challenging. First, there is no instrumentation at the OS level, and even resource monitoring becomes less reliable because of the virtualization layer, so instrumentation must be at the application level. Second, systems are not completely isolated from a performance point of view and can impact each other. Third, we mostly have multi-user interactive workloads, which are difficult to predict and manage. That means that performance risk mitigation approaches such as APM, performance testing, and capacity management become very important in a cloud environment.
What Lies Ahead?
While performance is the result of all design and implementation details, the performance engineering area remains very siloed, maybe for historical reasons, maybe because of the huge scope of expertise required. People and organizations trying to bring all performance-related activities together are few and far between. Attempts to span different silos (for example, DevOps) often leave many important performance engineering areas out.
The main lesson of this history is that the feeling that we are close to solving performance problems has existed for the last 50+ years, and it will probably stay with us for a while. So instead of hoping for a silver bullet, it is better to understand the different existing approaches to mitigating performance risks and find the optimal combination of them for your particular context.