How Retailers Can Prevent Downtime More Effectively
The words "504 gateway time out" are the last four words any online retailer ever wants its shoppers to see. Read on for some tips on how to prevent such a disaster.
The words "504 gateway time out" are pretty much the last four words any online retailer would ever want its shoppers to see, let alone on the biggest cyber shopping weekend of the year.
With online sales reaching a record-breaking $3.34 billion on Black Friday 2016, according to Adobe Digital Insights, it would have been a real shame if your site went down.
Unfortunately, big retail companies like Macy’s, Inc., Victoria’s Secret, Express, Inc., and Pier 1 Imports failed to heed the lessons of Black Fridays past and experienced major shopping jams because their sites just couldn’t handle the heavy traffic. It has happened before at Target, Neiman Marcus, and Best Buy, costing them customer loyalty and a significant amount of revenue.
This year, Macy’s website suffered an extended Black Friday disruption, directing customers to a page citing “heavier traffic than normal” or “we’re getting a makeover.” Mmm, perhaps not quite the best timing for a makeover.
When your site’s down, your customers lose their confidence in you and things get personal. They get frustrated and angry, and they need to vent. Publicly. I mean, you did promise them a fabulous, hassle-free online shopping experience, but failed to deliver. They may eventually forgive you, but you’ll have to work hard to earn back their trust, probably putting in more work than if you had planned ahead to avoid the situation in the first place. You’ll probably also lose a few potential customers in the process, and who knows where the social media winds will blow, right?
Plus, your stock may drop. According to a Bloomberg report, the Macy’s site couldn’t keep up with the heavy traffic, leading to a reported 1.71% decline in its share price during active trading on Friday. Not only were potential customers disappointed, leaving the site or hitting the refresh button over and over, but the downtime actually hurt the $13.52 billion retailer’s stock. That’s a straightforward lose-lose situation right there.
What should you do to avoid this?
100% Uptime Is Not Impossible
In our internal analysis of the holiday season, we found that retailers using XAP achieved success in surpassing key metrics for online retailing app performance. Notable metrics from our 2016 internal analysis include the following:
A Story From the War Room
We sent our field engineer to work on-site with one of our top eCommerce customers. Here are some useful insights into how a preemptive support strategy and a short feedback loop work.
This retailer used XAP to provide access to its catalog, inventory, and tax data, and achieved a zero-downtime holiday season for the second year in a row. As a result, the company delivered a fantastic customer experience, with an average page load time of 3.6 seconds that rose to 3.88 seconds at the highest peak of holiday traffic, and generated a 100% increase in 2016 holiday sales.
Notably, this retailer did not experience any system performance issues.
Identifying and Mitigating Key Risks
Rather than reacting after a failure occurs, prevent failure in the first place.
Most failures are the result of misconfiguration or capacity planning guesswork.
Planning and proactive tuning with continuous system monitoring produce predictable improvements at scale, eliminating risks from incorrect provisioning.
eCommerce applications are complex and built from many subsystems. In many cases, an eCommerce organization does not have expert skills in each of the subsystems. Having an expert in the room helps to bridge this gap and builds the capabilities of business operations.
When product-related issues were identified, we were able to provide the fastest path to protect the business and address concerns in a timely fashion.
Using In-Memory Computing
You should use in-memory computing to buffer peak load access to shared data resources.
Typical eCommerce systems have shared data resources for managing inventory, orders, and catalog information. Putting the shared data resources in-memory provides faster and more efficient (parallel) access to this shared data.
Data is mirrored back into the database in batches. In this way, peak load transactions are buffered so that database traffic does not crash the database back-end.
The in-memory computing grid acts as the system of record. A failure in the underlying database can be absorbed without affecting online users while the database is restored to a working state.
Using a combination of in-memory and SSD allows very large in-memory data sets to be stored at a reasonable cost, while still ensuring fast recovery during failure.
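The buffering idea above can be illustrated with a minimal, single-threaded Python sketch. It is not XAP's API: the class name `WriteBehindStore`, the `backend_write` callable, and the `batch_size` parameter are all hypothetical names chosen for illustration. Reads and writes are served from memory, and pending changes are mirrored to the database in batches, so a database failure only delays the mirroring.

```python
class WriteBehindStore:
    """In-memory store that mirrors writes to a slower backend in batches.

    Hypothetical illustration only; not the API of any real data grid.
    """

    def __init__(self, backend_write, batch_size=100):
        self._data = {}    # in-memory copy that serves all reads and writes
        self._dirty = {}   # entries changed since the last successful flush
        self._backend_write = backend_write  # assumed callable taking a dict
        self._batch_size = batch_size

    def put(self, key, value):
        # Writes touch memory only, so peak traffic never hits the database.
        self._data[key] = value
        self._dirty[key] = value

    def get(self, key):
        return self._data.get(key)

    def pending(self):
        return len(self._dirty)

    def flush(self):
        """Mirror at most one batch to the backend; return how many were sent."""
        keys = list(self._dirty)[: self._batch_size]
        if not keys:
            return 0
        batch = {k: self._dirty[k] for k in keys}
        try:
            self._backend_write(batch)
        except Exception:
            # Backend is down: keep the batch pending and retry on a later flush.
            return 0
        for k in keys:
            del self._dirty[k]
        return len(keys)
```

In a real deployment `flush` would run on a timer or a background worker; here it is called explicitly to keep the sketch easy to follow.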
Remembering That Self-Healing Systems Recover From Failure in Real Time
Failures are inevitable. Keeping a backup copy in memory enables zero-downtime systems to service user traffic without interruption, even if something does go wrong.
Systems provisioned for failure handle failure by design.
Automatic fail-over and provisioning eliminate the need to overprovision (costly) resources in case of failure. Traditionally, it has been common for retailers to provision resources for the holiday season at five times the capacity of their non-holiday infrastructure.
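As a rough sketch of the backup-copy idea, the Python snippet below keeps a primary and a backup copy of the data in memory and promotes the backup when the primary is lost. The class `ReplicatedPartition` and its methods are hypothetical and are not taken from XAP or any other product; a real grid would place the two copies on separate machines and detect failure automatically.

```python
class ReplicatedPartition:
    """Primary in-memory copy with a synchronously updated backup.

    Hypothetical illustration only; both copies live in one process here.
    """

    def __init__(self):
        self.primary = {}
        self.backup = {}
        self.failovers = 0

    def put(self, key, value):
        # Update both copies before acknowledging the write.
        self.primary[key] = value
        self.backup[key] = value

    def get(self, key):
        return self.primary.get(key)

    def fail_primary(self):
        """Simulate losing the primary: promote the backup, then re-provision."""
        self.primary = self.backup
        # Start a fresh backup from the promoted copy (shallow copy for brevity).
        self.backup = dict(self.primary)
        self.failovers += 1
```

Because every write lands in both copies, a lookup made right after `fail_primary()` still returns the latest value, which is the behavior the zero-downtime claim relies on.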
Peak loads tend to stress a system in the areas where it is least expected, which makes the resulting problems hard to handle. Quite often, peak loads lead to unexpected downtime.
There are many cases in which this sort of peak load is known in advance, as is the case with Black Friday and Cyber Monday. Still, many eCommerce sites continue to experience downtime or slowness during such events, leading to a huge loss of revenue and reputation.
For the past few years, we (together with our customers) have taken a preemptive approach by putting an engineer on-site to work alongside the customer team during the event itself. This has been a huge success, leading to 100% uptime. Both sides learned from the experience: the customer learned how to operate our product better and what to look for to ensure that the system is running properly, and we learned a great deal about how the customer uses our product, which let us shorten the feedback loop between the customer and our product and engineering teams.
Let’s agree that next year, the last four words you want your customer to see are “Thank you for shopping!” That definitely has a better ring to it.
Published at DZone with permission of Dana Meschiany, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.