How DevOps Teams Prepared for Cyber Monday
In this article, the author describes a couple of mindsets and practices that ideal DevOps teams had in place during Cyber Monday.
As we headed into the Thanksgiving weekend with thoughts of relaxing with family and friends, there was a group of folks who would still be working or on call the whole time. The Dev and Operations teams of major online stores had been preparing for this period for many months. Cyber Monday is the largest shopping day of not only the Thanksgiving weekend but the entire year. What’s more, according to Adobe Digital Insights (ADI), it was anticipated to be the largest online shopping day in history. So, no pressure then.
It’s a time when every minute of downtime costs money. According to the Aberdeen Group, large companies lose an average of $686,250 per hour of downtime. While most if not all major online retailers were unlikely to experience widespread outages, the more likely scenario was a slowly responding site. A response in under 100ms is perceived as instant, while a 100ms to 300ms delay is perceptible. 40% of mobile visitors will abandon a site after a three-second delay, so actual and perceived speed of response are both critical in determining whether an online purchase takes place or is abandoned out of frustration. Perceived speed can be addressed by the web team using techniques such as progress bars or content sliding in and out to distract the visitor for the second or so needed for a site to update.
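As a rough illustration of the figures above, the following Python sketch converts the quoted hourly downtime cost into a cost for an outage of a given length and sorts a response time into the perception bands mentioned. The function names and the "slow" label for the 300ms to three-second range are assumptions made for this example; the article itself only names the "instant" and "perceptible" bands and the three-second abandonment point.

```python
# Average hourly downtime cost for large companies, as quoted from the Aberdeen Group.
HOURLY_DOWNTIME_COST_USD = 686_250


def downtime_cost(minutes: float, hourly_cost: float = HOURLY_DOWNTIME_COST_USD) -> float:
    """Estimate the cost in USD of an outage lasting `minutes`."""
    return hourly_cost / 60 * minutes


def classify_latency(ms: float) -> str:
    """Map a response time in milliseconds to a perception band."""
    if ms < 100:
        return "instant"            # under 100ms feels instantaneous
    if ms <= 300:
        return "perceptible"        # 100ms to 300ms is noticeable
    if ms < 3000:
        return "slow"               # assumed label; not named in the article
    return "abandonment risk"       # three seconds or more


if __name__ == "__main__":
    print(downtime_cost(1))         # cost of a single minute of downtime
    print(classify_latency(250))
```

At the quoted average, a single minute of downtime works out to roughly $11,400, which is why even short incidents matter on a day like this.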
Actual speed of response is much harder to address. In the period leading up to Cyber Monday, enterprises that adopted DevOps (ideally) had both a mindset and a set of practices that drove their preparation for such an important period in the retail calendar. These were likely to include the following.
Collaboration Already in Place
The Dev, Ops, and test teams had already been engaged with each other for some time and shared objectives that put optimizing the customer experience above individual and departmental goals.
A Continuous Delivery Model
Small, incremental releases were deployed at high velocity with little if any negative impact, supported by automated configuration, deployment, and release management technologies and processes.
Knowledge Captured From the Same Period in 2015
Estimates suggested that 2016 would see an 11% increase over the previous year’s trading, but there would undoubtedly be regional, device, and time variations. Metrics captured during the previous year’s Cyber Monday were not a foolproof indicator of likely site demands this year, but they still provided a good starting point.
End-to-End Visibility of Business Transactions
Teams had a full understanding of the software functions and components that make up the purchase process, from the initial page view all the way through to database calls and shipping confirmation notifications.
Synthetic and Real User Monitoring
By combining an understanding of actual user engagement with the site and how it will likely behave under heavy loads at different times from different locations, potential vulnerabilities and bottlenecks could be identified and remediated ahead of time.
Understanding of Third-Party Dependencies
When online stores have a major external dependency such as a payment platform, fulfillment agent, or loyalty card provider, latency originating from these services must also be identified and addressed.
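One simple way to see how much latency each external dependency contributes is to time every call to it and compare the worst case against a budget. The sketch below is a minimal, hypothetical example in Python: `timed_dependency`, `slow_dependencies`, the dependency name "payment", and the budget value are all illustrative assumptions, and a production setup would normally rely on a monitoring product rather than an in-process dictionary.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Recorded call durations in milliseconds, keyed by dependency name.
latencies_ms = defaultdict(list)


@contextmanager
def timed_dependency(name: str):
    """Time the enclosed block and record the duration under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the call raised an exception.
        latencies_ms[name].append((time.perf_counter() - start) * 1000)


def slow_dependencies(budget_ms: float) -> list:
    """Return the dependencies whose slowest recorded call exceeded the budget."""
    return sorted(
        name for name, samples in latencies_ms.items() if max(samples) > budget_ms
    )


if __name__ == "__main__":
    with timed_dependency("payment"):
        time.sleep(0.02)  # stand-in for a real call to a payment platform
    print(slow_dependencies(budget_ms=5))
```

Wrapping each outbound call this way keeps the measurement next to the call site, so a slow provider shows up by name rather than as an unexplained slowdown in the checkout flow.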
Scaling Up Beforehand, With Fail-Overs Available
Performance engineering teams and site reliability engineers took particular responsibility for ensuring that the site was robust enough to withstand vast traffic volumes from multiple logins. This included topics such as net new account creation and database access speeds, with peak traffic rather than average traffic as the primary consideration.
Full View of the User Experience
Shoppers accessed the site from notebooks, tablets, and smartphones from a variety of manufacturers, in different locations, and using a number of network providers, each with its own bandwidth. DevOps teams had data on how the site would appear to each of these groups and on any variances that needed to be investigated.
Recent Technology Adoptions Are Not a Black Box
It’s been an amazing last 18 months for concepts such as microservices and technologies (like Docker) as they move up the maturity curve and become a staple part of many an enterprise’s stack. However, it’s essential that granular insights into how microservices are performing be available, such as automatic discovery of the entry and exit points of each microservice as service endpoints. Equally, DevOps teams should be able to correlate Docker metrics with the metrics from the applications running in the container.
So, once this intensive period from Black Friday started, what were optimized DevOps teams doing?
Should any health alert have triggered a status switch from green to yellow, there was a plan of pre-agreed corrective measures to address delays wherever they existed. These delays should not have kicked off a debate about which team was or was not responsible and how to address the pain.
Laser Focus on Where an Issue Resides
Sometimes, the cause of response time delays sits in one tiny part of the overall stack. Using the right monitoring solutions, the best DevOps teams knew exactly how to pinpoint the bottleneck and fix it ahead of the customer sensing any slowdown in site responsiveness. If the full end-to-end business transaction view was obtained, enterprises could identify where online visitors were at any moment in time and if they were at risk of abandoning a site due to poor responsiveness.
Dynamically Reviewed Performance
While Cyber Monday was most likely to see the peak volume of consumer traffic over the Thanksgiving weekend, Black Friday and the day afterward also witnessed high volumes, giving DevOps teams insights into performance and potential issues ahead of time. Perhaps it’s better to think of it as a particularly heavy traffic volume period with a spike at the end of it than a big bang launch. Rather than setting fixed parameters, dynamic baselining of how servers, networks, databases, and so forth are performing provides a more insightful picture of what is working well, and what isn’t.
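To make the contrast with fixed parameters concrete, here is a minimal Python sketch of one common form of dynamic baselining: a rolling window of recent samples, with a new value flagged when it falls more than a chosen number of standard deviations from the window's mean. The class name, window size, and three-sigma threshold are assumptions for illustration, not a description of how any particular monitoring product computes its baselines.

```python
from collections import deque
from statistics import mean, stdev


class DynamicBaseline:
    """Rolling baseline that flags samples far from recent behavior."""

    def __init__(self, window: int = 60, sigmas: float = 3.0):
        self.samples = deque(maxlen=window)  # oldest samples drop off automatically
        self.sigmas = sigmas

    def observe(self, value: float) -> bool:
        """Record `value`; return True if it deviates from the current baseline."""
        anomalous = False
        if len(self.samples) >= 2:  # stdev needs at least two samples
            mu = mean(self.samples)
            sd = stdev(self.samples)
            anomalous = sd > 0 and abs(value - mu) > self.sigmas * sd
        self.samples.append(value)
        return anomalous


if __name__ == "__main__":
    baseline = DynamicBaseline(window=10)
    for response_ms in [100, 102, 98, 101, 99, 250]:
        print(response_ms, baseline.observe(response_ms))
```

Because the baseline moves with the traffic, a response time that would be alarming on a quiet Tuesday can be treated as normal during the Black Friday build-up, while a genuine spike still stands out.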
IT-related metrics are great, but in an ideal world, the DevOps team should also be able to share KPIs that answer questions such as:
- What is the ratio of sales between existing and new customers?
- Are existing customer details being populated when they log in or is there a database access bottleneck?
- Are new customers onboarded without delay?
- Which parts of the site are generating the greatest revenue (e.g., electrical vs. kitchenware)?
- Is there a delay in the final stage of the purchase cycle?
- Where do visitors sit in the purchase cycle at any given time?
These questions tie back to what we at AppDynamics call Mean Time to Business Awareness (MTBA) — how quickly can essential business-relevant site performance data reach those who need to know and can make key investment and strategic decisions with this information?
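A few of the KPI questions listed above can be answered with very little code once order and session data are available in one place. The Python sketch below is purely illustrative: the `Order` record, its fields, and the stage names are hypothetical, and real data would come from the retailer's own order and analytics systems.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Order:
    customer_is_new: bool   # hypothetical flag: first purchase or returning customer
    category: str           # e.g. "electrical" or "kitchenware"
    amount: float           # order value


def sales_ratio_existing_to_new(orders):
    """Ratio of sales from existing customers to sales from new customers."""
    existing = sum(o.amount for o in orders if not o.customer_is_new)
    new = sum(o.amount for o in orders if o.customer_is_new)
    return existing / new if new else None  # None when there are no new-customer sales


def revenue_by_category(orders):
    """Total revenue per site category."""
    totals = defaultdict(float)
    for o in orders:
        totals[o.category] += o.amount
    return dict(totals)


def funnel_positions(current_stages):
    """Count how many visitors currently sit at each stage of the purchase cycle."""
    return Counter(current_stages)


if __name__ == "__main__":
    orders = [
        Order(False, "electrical", 300.0),
        Order(True, "kitchenware", 100.0),
        Order(False, "kitchenware", 100.0),
    ]
    print(sales_ratio_existing_to_new(orders))
    print(revenue_by_category(orders))
    print(funnel_positions(["cart", "checkout", "cart"]))
```

The point of the sketch is the shape of the output rather than the implementation: each function returns a number or a small table that a business stakeholder can read without knowing anything about the underlying infrastructure.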
Captured Essential Metrics in Preparation for 2017
When Cyber Monday ends, it can be easy to forget to analyze and store the major performance behaviors that occurred. Investment here will pay off in 12 months’ time, as it will help create a foundation for estimating expected site traffic.
Failed to Prepare, Prepared to Fail
Yes, it’s a well-worn phrase, but it’s especially apt when applied to Cyber Monday. DevOps teams who have done their homework will be attentive during this time, but they will also feel confident that despite application complexity, they know the online experience inside and out, where potential risks may occur, and have an agreed response should an issue arise.
If leading retailers got Cyber Monday right, they laid the foundation for an ongoing customer relationship based on the ability to deliver a consistent, quality experience. Failure to prepare could have a highly detrimental impact through customer attrition, lost revenue opportunities, damaged brand reputation, and social media naming and shaming.
Published at DZone with permission of Justin Vaughan-Brown . See the original article here.
Opinions expressed by DZone contributors are their own.