Have you ever heard someone say “Geez, it looks like I missed all the fun”? Well, that’s how I feel. Although I must admit, after spending close to two weeks in Aruba on vacation, I did have some fun of my own. (Details on that will have to wait for another blog post, or for a piña colada or something that looks like Jimmy Buffett had a hand in mixing it.)
But back to the matter at hand.
Outages. Bad user experience. Negative branding. And why testing in production is not only a best practice — it’s a necessity.
Geek Squad Hits Aruba
Apparently, while I had my toes in the sand 1,736 miles from home base, there were several… er… “outages” that hit social media. United Airlines was the first domino. Then the New York Stock Exchange. And then The Wall Street Journal. And if that wasn’t enough, Amazon. Amazon! On Prime Day, no less! (Though one blog I read suggests that maybe Amazon had this event “planned” as a performance test in production.)
So now that I’m back, it seems like a good time to follow up on all my previous “testing in production” posts.
The Four Most Common Performance Testing Categories
But before we get into best practices, let’s look at the types of testing SOASTA has typically undertaken over the years, and at the most common bottlenecks those tests have uncovered.
Over the past seven years, SOASTA has been involved in more than 10 million performance tests. Customers have come to us for help for various reasons (e.g., lack of expertise, the need to test at scale in production, time pressure, or simply the inability to find a bottleneck with their existing, and most likely dated, toolset).
These millions of tests generally fall into four main categories:
- New site launch
- Ongoing testing
- Marketing programs
- Events
New site launch and ongoing testing are pretty self-explanatory. Marketing programs typically include a promotion or a new product release or launch, while an event could be anything from holiday readiness to the Olympics to a breaking news story.
So What Are the Most Common Problems in Web Performance Testing Today?
A few years ago, application code, database issues, configuration settings (my favorite: thread and connection pool settings), and, especially, load balancers were the top culprits.
So, without further ado, here are SOASTA’s most common findings from testing web and mobile applications with CloudTest since the beginning of 2014:
Some tests are designed to validate that fixes for previously uncovered bottlenecks have had the desired outcome, and not all tests are intended to find a specific stress point in the first place. So it is not surprising that the largest slice is “test goals reached before bottleneck found.” Those validation tests help performance engineers sleep at night; the other 70% of tests are where SOASTA finds the issues whose fixes lead to that peaceful night’s sleep.
Application and web servers are clearly the top two contributors to poor performance, which is no surprise. The cause is often misconfigured settings, simply not having enough infrastructure to support the intended load, or a poorly designed architecture.
That’s followed by the database, which may have issues with locking and contention, missing indexes, inefficient queries, memory management, connection management, or unmanaged data growth. Or, as with the application and web servers, it may simply have insufficient resources.
“Other” includes a range of less commonly found bottlenecks, including issues with third-party services, content delivery networks (CDNs), shared environments, and firewalls.
Some More “Gotchas” That You Just Can’t Test in a QA Lab
These will come into play when you move into production:
- Batch jobs that are not present in the lab (log rotations, backups, etc.), or other online systems that affect performance
- Load balancer performance issues, such as misconfigured algorithm settings
- Bandwidth constraints
- Latency between systems inside and outside of application bubbles
- Network configuration problems, such as switches set to 100 Mb/s instead of 1 Gb/s, and routing problems
- Data pipe not configured as burstable to handle spikes in traffic
- Radically different performance depending on database sizes
- Misconfigured application and web servers (old reliable: thread and connection pool settings)
- CDN not configured to serve up new content
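To make the 100 Mb/s versus 1 Gb/s gotcha concrete, here is a back-of-the-envelope sketch in Python. The 2 MB average page weight is a hypothetical figure chosen only for illustration, and the calculation ignores protocol overhead, so treat the results as rough upper bounds rather than real capacity numbers.

```python
def max_pages_per_second(link_mbps: float, page_weight_mb: float) -> float:
    """Upper bound on pages per second a link can carry, ignoring protocol overhead.

    link_mbps: link speed in megabits per second (e.g. 100 or 1000).
    page_weight_mb: average page weight in megabytes (a hypothetical input).
    """
    link_megabytes_per_second = link_mbps / 8  # 8 bits per byte
    return link_megabytes_per_second / page_weight_mb


if __name__ == "__main__":
    # Hypothetical 2 MB average page weight, used only for illustration.
    for speed in (100, 1000):
        print(f"{speed} Mb/s -> {max_pages_per_second(speed, 2.0):.2f} pages/s")
```

Under those assumptions, a port running at 100 Mb/s tops out at roughly 6 pages per second, a tenth of what the same port would carry at 1 Gb/s. That is exactly the kind of ceiling a lab with different network gear will never reveal.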
That brings us to best practices for testing in production, highlighting things that you just CANNOT DO in a testing or QA lab environment.
Five Key Best Practices for Performance Tests That Include CDN Assets
When a retail company tests in production, it can also fully test the caching and loading capabilities of its content delivery network (CDN) provider. This is vital to understanding the true performance of a production environment.
(In case anyone reading this needs a refresher, the primary purpose of a CDN is to reduce the number of times content has to be requested from the origin servers by delivering certain content from strategically placed servers within the content delivery network.)
SOASTA has worked with Akamai to develop a set of best practices for tests including CDN assets. Here are some of the highlights, with some being more obvious than others:
- If you do not have a good handle on your real user traffic, distribute test load generation across all available load server regions, as close to evenly as possible (depending upon the nature of the test). This makes the load representative of real traffic distribution and measures performance from a variety of locations, which makes those measurements more statistically meaningful. However, if you use a real user measurement solution, for instance mPulse by SOASTA, and you have a firm handle on where your users originate, then you can tailor your test load generation around real user scenarios. For example, say you are a top-100 retailer based in the southeastern USA with no brick-and-mortar stores west of the Mississippi River, and your real user data shows that 90% of your eCommerce traffic comes from the Southeast. In that case, you could design a load generation scenario whose geographic distribution more accurately reflects your user base.
- Load testing should occur between 11 PM and 5 AM ET (assuming a North American customer), when real user traffic is typically at its lowest.
- The User-Agent must include a string that identifies the load test vendor as “SOASTA,” so that Akamai can track the load traffic more efficiently and enable additional logging, which comes in handy when troubleshooting any issues discovered during the load test.
- That requirement points to the most important process element of the load test: whoever is running the test (SOASTA or the customer) must notify Akamai or the CDN vendor that a load test is being scheduled. This may take some time if this is the first time you are testing in production, because the CDN provider typically needs time to set up for the test. For example, Akamai can segment and log SOASTA traffic. (This matters not only for technical reasons, but also so that the customer is not billed by Akamai for the additional content delivered or for any resulting bursting/superbursting.) The set-up and notification process is just as important as the test itself, and SOASTA has a deep set of best practices just for the “process” piece of a load test with Akamai or another CDN.
- Load test ramp-up should be no faster than zero to full load in 15 minutes, assuming linear growth. During ramp-up, use your real-time analytics to keep an eye on system performance; that best practice was discussed in one of my earlier blog posts, but it bears repeating here.
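Here is a minimal Python sketch of how the distribution, ramp-up, and User-Agent guidance above might be turned into a test schedule. It assumes a simple per-minute step model; the region names, the weights, and the User-Agent format are all hypothetical, and this is not CloudTest’s own configuration format. Equal weights give the even split recommended when you lack real user data; weights taken from real user measurements give the tailored split.

```python
from dataclasses import dataclass

MIN_RAMP_MINUTES = 15  # zero-to-full no faster than 15 minutes, per the guidance above


@dataclass
class RampStep:
    minute: int
    users_by_region: dict[str, int]


def distribute(total: int, weights: dict[str, float]) -> dict[str, int]:
    """Split `total` virtual users across regions in proportion to `weights`.

    Uses largest-remainder rounding so the per-region counts always sum to `total`.
    """
    weight_sum = sum(weights.values())
    exact = {region: total * w / weight_sum for region, w in weights.items()}
    counts = {region: int(share) for region, share in exact.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining users to the regions with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda r: exact[r] - counts[r], reverse=True)
    for region in by_remainder[:leftover]:
        counts[region] += 1
    return counts


def linear_ramp(peak_users: int, weights: dict[str, float],
                ramp_minutes: int = MIN_RAMP_MINUTES) -> list[RampStep]:
    """Build a per-minute linear ramp from zero to `peak_users`."""
    if ramp_minutes < MIN_RAMP_MINUTES:
        raise ValueError(f"ramp must take at least {MIN_RAMP_MINUTES} minutes")
    steps = []
    for minute in range(1, ramp_minutes + 1):
        target = round(peak_users * minute / ramp_minutes)
        steps.append(RampStep(minute, distribute(target, weights)))
    return steps


def tag_user_agent(base_user_agent: str, vendor: str = "SOASTA") -> str:
    """Ensure the User-Agent identifies the load test vendor (hypothetical format)."""
    if vendor in base_user_agent:
        return base_user_agent
    return f"{base_user_agent} {vendor}"


if __name__ == "__main__":
    even_weights = {"us-east": 1, "us-west": 1, "eu-west": 1}  # hypothetical regions
    for step in linear_ramp(3000, even_weights)[4::5]:  # minutes 5, 10, 15
        print(step.minute, step.users_by_region)
    print(tag_user_agent("Mozilla/5.0 (compatible; LoadTest)"))
```

For the southeastern retailer example, weights such as `{"us-southeast": 90, "us-northeast": 7, "us-west": 3}` (again hypothetical) would replace the even split. The 15-minute floor on the ramp is enforced in code, so a rushed schedule fails fast instead of surprising your CDN.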
The vast majority of performance test labs do not have a CDN as part of their infrastructure, so you can only test the CDN’s performance impact by testing in production. With CDN caching enabled, both the number of requests that reach the origin (main site) and the response times change, since response times depend on whether content is served from the CDN or from the origin.
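The effect of caching on origin load and response time can be sketched with two simple formulas. This is an illustrative Python model, not a measurement method: the cache hit ratio and latency figures are hypothetical inputs, and the model assumes every cache miss is served from origin.

```python
def origin_requests_per_second(total_rps: float, cdn_hit_ratio: float) -> float:
    """Requests per second that still reach origin when the CDN serves a share from cache."""
    if not 0.0 <= cdn_hit_ratio <= 1.0:
        raise ValueError("cdn_hit_ratio must be between 0 and 1")
    return total_rps * (1.0 - cdn_hit_ratio)


def expected_response_ms(cdn_hit_ratio: float, cdn_ms: float, origin_ms: float) -> float:
    """Average response time when a fraction of responses comes from CDN cache.

    cdn_ms: response time for a cache hit served at the edge.
    origin_ms: response time for a cache miss that must be fetched from origin.
    """
    if not 0.0 <= cdn_hit_ratio <= 1.0:
        raise ValueError("cdn_hit_ratio must be between 0 and 1")
    return cdn_hit_ratio * cdn_ms + (1.0 - cdn_hit_ratio) * origin_ms


if __name__ == "__main__":
    # A lab with no CDN behaves like a hit ratio of 0: every request reaches origin.
    print(origin_requests_per_second(1000, 0.0))
    # Hypothetical production hit ratio of 80%.
    print(origin_requests_per_second(1000, 0.8))
    print(expected_response_ms(0.8, cdn_ms=40, origin_ms=400))
```

Comparing the two origin figures shows why a lab without a CDN can overstate origin load, while the blended response time shows why production measurements depend on where content is served.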
Don’t Forget Your Third-Party Service Providers!
Many ecommerce sites also use third-party providers to enhance their overall site content. Just as with your CDN provider, it is vital to involve any third-party providers that might affect performance while the test strategy is being formulated. They do not want to be surprised by a test, or to have it bring down their service or site with fake transactions.
On the other hand, you would not normally include domains such as Google Analytics or Omniture metrics as part of the test.
Involving third-party providers early, just like the CDN example above, helps ensure their support for your test. After all, SOASTA CloudTest gives customers the ability to analyze the performance of individual third-party providers’ content and share that information with those providers. Talk about WIN-WIN!
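If your test scripts are recorded from real browser sessions, calls to analytics and metrics services are usually captured along with everything else. A minimal Python sketch of stripping them out before a test might look like this. The domain list is an assumption to verify against what your own pages actually call (google-analytics.com for Google Analytics, plus omtrdc.net and 2o7.net, which Adobe Analytics, formerly Omniture, has used for data collection), and `filter_requests` is a hypothetical helper, not a CloudTest feature.

```python
from urllib.parse import urlsplit

# Hypothetical exclusion list; confirm against the hosts your own pages call.
EXCLUDED_DOMAINS = ("google-analytics.com", "omtrdc.net", "2o7.net")


def is_excluded(url: str, excluded: tuple[str, ...] = EXCLUDED_DOMAINS) -> bool:
    """True if the URL's host is an excluded domain or any subdomain of one."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    return any(host == domain or host.endswith("." + domain) for domain in excluded)


def filter_requests(urls: list[str]) -> list[str]:
    """Drop requests to excluded analytics/metrics domains from a recorded script."""
    return [url for url in urls if not is_excluded(url)]


if __name__ == "__main__":
    recorded = [
        "https://www.example.com/",
        "https://www.google-analytics.com/collect?v=1",
        "https://example.sc.omtrdc.net/b/ss/",
        "https://cdn.example.com/app.js",
    ]
    print(filter_requests(recorded))
```

Matching on the domain suffix rather than on the full URL catches subdomains such as www.google-analytics.com without also catching unrelated hosts that merely contain the same text.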
Takeaway: You Need a Solid Testing Process
As I mentioned in my earlier posts on testing in production (such as this one), a performance testing solution must have certain key features that will ensure your success, such as real-time analytics, a good “kill switch,” etc.
But just as important is a good PROCESS for testing in production: one that involves your CDN provider and your third-party providers, so that you and your team can execute the most realistic performance tests possible, from both a technical and an environmental perspective, based on what you have gleaned from your real user measurement data.