2016 is nearly over, so it’s time to finalize those New Year’s resolutions. Yours are probably about family, friends, work, and hobbies, but some companies need resolutions about their website or app performance.
1. June 2016: ASOS Website and App Crash After Brexit
The popular British clothing site ASOS crashed after the Brexit referendum passed and was down for more than a day. While ASOS claimed the crash was due to a power outage at a third-party data center, it’s also possible that a surge of shoppers flocked to the site after Brexit brought the value of the pound down and made shopping in pounds a bargain for anyone who uses other currencies.
ASOS 2017 Resolution
Set up backup servers and locations for quick recovery in case of power outages. By setting up database replication, a database failover cluster, or an application failover cluster, you can switch to the failover location and keep providing service and conducting sales while you fix the errors on your main server. Don’t forget to prepare a procedure in advance so DevOps and developers know what to do in such a case.
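As a rough illustration of the switch-over step, here is a minimal sketch that picks the first healthy location from a priority list. The host names and the simulated health table are hypothetical; a real setup would probe each location (for example with an HTTP health check or a replication-lag query) and would usually rely on a load balancer or cluster manager rather than hand-written code.

```python
from typing import Callable, Optional, Sequence


def pick_active_endpoint(
    endpoints: Sequence[str], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Return the first endpoint that passes its health check, in priority order."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None  # nothing is healthy: time to follow the written procedure


# Hypothetical locations: the main site first, then the failover sites.
ENDPOINTS = [
    "primary.example.internal",
    "failover-a.example.internal",
    "failover-b.example.internal",
]

# Simulated health state standing in for a real probe.
simulated_health = {
    "primary.example.internal": False,  # e.g. the data center lost power
    "failover-a.example.internal": True,
    "failover-b.example.internal": True,
}

active = pick_active_endpoint(ENDPOINTS, lambda e: simulated_health.get(e, False))
print(active)
```

Keeping the priority order explicit in one place also makes the recovery procedure easier to document for the team.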
Stress test, spike test, and soak test. Test your system in advance and bring it to its limit in different ways: raise the load step by step until something breaks, hit it with a sudden burst in a short time, and keep it under heavy load for a long time. Determine your bottlenecks and come up with a plan to deal with them if the system reaches them. That way, unexpected events might catch you by surprise, but you’ll know how to quickly deal with them.
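The difference between the three test types is mostly the shape of the load over time. The sketch below expresses each shape as a function that returns the number of simulated users at a given minute. All user counts and durations are illustrative assumptions, not recommendations; the resulting schedule would be fed into whatever load testing tool you use.

```python
def stress_profile(minute, start_users=100, step_users=100, step_minutes=5):
    """Stress test: keep adding users in steps until something breaks."""
    return start_users + (minute // step_minutes) * step_users


def spike_profile(minute, baseline=100, peak=5000, spike_start=10, spike_minutes=2):
    """Spike test: jump from the baseline to the peak for a short window."""
    in_spike = spike_start <= minute < spike_start + spike_minutes
    return peak if in_spike else baseline


def soak_profile(minute, users=1000, ramp_minutes=10):
    """Soak test: ramp up once, then hold a steady heavy load for hours."""
    if minute < ramp_minutes:
        return users * (minute + 1) // ramp_minutes
    return users


# Print the first 15 minutes of each profile side by side.
for minute in range(15):
    print(minute, stress_profile(minute), spike_profile(minute), soak_profile(minute))
```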
2. July 2016: People Go Crazy for Pokemon GO
The real-life version of Pokemon had people searching for animated creatures on the streets, behind bus stations, and in parks. The surging popularity and the resulting heavy loads caught Niantic, Pokemon GO’s creator, by surprise, and the company had to pause the rollout because users were unable to log into the game or battle Pokemon.
Niantic 2017 Resolution
Track end-user performance. The Pokemon GO craze has died down, but we recommend that the next time Niantic releases an amazing app, they track end-user actions and performance and integrate them into backend testing. We recommend using JMeter with Selenium WebDriver for that.
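Once a browser-level tool has recorded what users actually experience, the timings are usually summarized as percentiles rather than averages, because a few very slow page loads are exactly what frustrated users notice. Here is a minimal sketch of that summary step. The sample timings are made up for illustration, and the nearest-rank method is just one common way to compute a percentile.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical page-load times in milliseconds, as a browser test might record them.
page_load_ms = [820, 910, 870, 1500, 940, 880, 3200, 900, 860, 950]

p50 = percentile(page_load_ms, 50)
p90 = percentile(page_load_ms, 90)
print(p50, p90)
```

Tracking the 90th or 95th percentile alongside backend metrics makes it easier to see when a server-side slowdown actually reaches the end user.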
3. October 2016: Glastonbury Music Festival Ticket Website Crashes
Glastonbury’s music festival ticket website crashed shortly after ticket sales opened, leaving music lovers unable to connect to the site or complete their transactions. Fans were not happy, to say the least, and took to social media to express their frustration.
Glastonbury 2017 Resolution
Learn from similar use cases. Ticketfly, a ticketing company with its own data center and system, prepared for the summer concert season by simulating the traffic spikes expected when users constantly hit the refresh button. They measured and tuned the underlying system and fixed whatever needed fixing in advance.
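To see why the refresh button matters so much, it helps to do the arithmetic before configuring a simulation. The sketch below estimates the request rate generated by waiting users who reload the page at a fixed interval. The numbers are hypothetical and are not taken from Ticketfly or Glastonbury.

```python
def refresh_requests_per_second(waiting_users, seconds_between_refreshes):
    """Estimate the page requests per second caused by users who keep refreshing."""
    if seconds_between_refreshes <= 0:
        raise ValueError("refresh interval must be positive")
    return waiting_users / seconds_between_refreshes


# Hypothetical on-sale moment: 100,000 fans, each refreshing every 5 seconds.
rate = refresh_requests_per_second(100_000, 5)
print(rate)
```

Under these assumptions the site has to absorb 20,000 page requests per second before a single ticket is sold, which is the kind of target a spike simulation should aim for.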
4. November 2016: Canadian Immigration Website Crashes After U.S. Elections
On the night of the U.S. general election results, the Canadian Immigration website crashed due to high interest from disappointed Americans in moving to Canada.
This wasn’t the only election-related crash. In October, the Virginia Department of Elections website crashed due to a high volume of voter registrations. The website broke its record for the number of online registrations in one day.
Elections 2017 Resolution
Expect the unexpected. Even when things seem to be certain, they might turn out differently. Be prepared: investigate your system to find out what your weak points are and why, and set up alerts and monitoring dashboards for these critical issues.
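An alert is, at its core, a threshold on a metric you already know is a weak point. Here is a minimal sketch of that check. The metric names and limits are assumptions chosen for illustration; in practice the values would come from your monitoring system, and the limits from your own test results.

```python
# Hypothetical thresholds for known weak points.
THRESHOLDS = {
    "p95_response_ms": 2000,
    "error_rate_pct": 1.0,
    "cpu_pct": 85,
}


def breached_alerts(metrics, thresholds=THRESHOLDS):
    """Return the sorted names of metrics whose current value exceeds its threshold."""
    return sorted(
        name for name, limit in thresholds.items() if metrics.get(name, 0) > limit
    )


# Simulated snapshot of current metrics.
current = {"p95_response_ms": 2500, "error_rate_pct": 0.4, "cpu_pct": 91}
alerts = breached_alerts(current)
print(alerts)
```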
5. Black Friday 2016: Macy’s Website and Mobile App Crash Under Heavy Loads
Macy’s, the largest U.S. department store chain, which is moving its business online, crashed under Black Friday loads. The website and app were both unavailable, resulting in dissatisfied customers ranting on social media.
Macy’s wasn’t the only website that crashed on Black Friday. Old Navy, GAME, Quidco, and more weren’t prepared for the traffic surges, and they lost sales opportunities in the short and long run as a result.
Black Friday 2017 Resolution
Plan and implement Continuous Integration methodologies. Run load and performance tests earlier in the process and year-round, and integrate them into the Continuous Integration process. This ensures code changes don’t degrade the user experience and that you have enough time and resources to discover bottlenecks and fix errors and bugs. Using open-source tools like Jenkins, integrated with JMeter or other performance testing tools, can help Black Friday pass smoothly for everyone.
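One way to make performance part of the build is a small gate that reads the load test results and fails the job when a budget is exceeded. The sketch below assumes a CSV results file with `elapsed` (milliseconds) and `success` columns, which is how JMeter CSV result files are commonly laid out. The budget values and sample rows are hypothetical.

```python
import csv
import io


def evaluate_results(csv_text, max_avg_ms, max_error_pct):
    """Check average response time and error rate against a performance budget."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return False, "no samples recorded"
    avg_ms = sum(int(row["elapsed"]) for row in rows) / len(rows)
    failures = sum(row["success"].strip().lower() != "true" for row in rows)
    error_pct = 100 * failures / len(rows)
    passed = avg_ms <= max_avg_ms and error_pct <= max_error_pct
    return passed, f"avg={avg_ms:.0f}ms errors={error_pct:.1f}%"


# Made-up results standing in for the file a load test run would produce.
SAMPLE = """elapsed,label,success
120,home,true
340,search,true
95,home,true
2050,checkout,false
"""

passed, summary = evaluate_results(SAMPLE, max_avg_ms=500, max_error_pct=1.0)
print(passed, summary)
# In a CI job, a script like this would exit with a non-zero status when
# passed is False, so that the build server marks the build as failed.
```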
6. November 2016: Fandango Crashes Following High Demand for "Rogue One: A Star Wars Story"
Star Wars fans who had been biting their nails in expectation of the new movie had to wait a bit longer to buy tickets. Fandango, a popular movie ticket selling website, crashed under heavy traffic. Buyers were directed to a waiting room where they had to wait all night if they wanted to secure tickets.
Fandango 2017 Resolution
Performance test for expected heavy loads. Prepare in advance for expected traffic surges by deciding on business goals, creating user scenarios simulating real-world user experience, running load tests against the production environment, and analyzing the results.
This is exactly what Movietickets.com did in preparation for "Star Wars: The Force Awakens" in October 2015. They tested in advance for 300,000+ concurrent users after starting with tests for 30,000 users and were able to cope with the ticket purchasing traffic in real time with no crashes.
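Going from 30,000 to 300,000+ users is rarely done in a single jump. Here is a minimal sketch of a ramp plan that scales the load in stages between those two figures. The doubling factor and the intermediate steps are assumptions for illustration, not a description of the schedule Movietickets.com actually used.

```python
def ramp_plan(start_users, target_users, factor=2):
    """List the user counts for each test run, growing by `factor` up to the target."""
    if start_users <= 0 or factor <= 1:
        raise ValueError("start_users must be positive and factor greater than 1")
    steps = []
    users = start_users
    while users < target_users:
        steps.append(users)
        users *= factor
    steps.append(target_users)  # always finish with a run at the full target
    return steps


plan = ramp_plan(30_000, 300_000)
print(plan)
```

Running each stage separately and analyzing the results before moving on makes it easier to tell which load level exposed a given bottleneck.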
By learning from 2016 and deciding what should be different in 2017, we can all prepare better for bugs and bottlenecks and avoid crashes, whether personal or website and app related.