During Amazon's much-hyped Prime Day last month, shoppers experienced difficulty at the checkout. The issue appeared when shoppers attempted to add an item to their shopping cart, on both the mobile and desktop versions of the site. Reportedly, last year’s Prime Day was the second-largest sales day for Amazon in 2015, with total sales nearly 3.5 percent higher than Black Friday 2015. Not a good day for any type of malfunction on the website.
Some customers are reporting difficulty with checkout. We’re working to resolve this issue quickly.
— Amazon (@amazon) July 12, 2016
Even outside of big sales events and shopping seasons, e-commerce websites need to be aware of how crucial flawless functionality and site performance are for their business. A one-second page delay could potentially cost an average e-commerce site $2.5 million in lost sales every year. Today’s online shoppers simply don’t tolerate slow or dysfunctional web pages, and they leave such sites in droves. They take their money elsewhere and their frustration to social media. Not getting the item you decided to buy just doesn’t go well with the desire for instant gratification. Missing out on a good deal during a sales event easily turns customer disappointment into real anger:
— charles nestler (@charlesnestler) July 12, 2016
Outages or malfunctioning features might turn the biggest revenue-pumping sales event into the ugliest nightmare with lost business and damage to the brand’s reputation.
E-commerce businesses need to be prepared for this in two ways:
- They need to ensure their sites operate fast and smoothly under normal circumstances. This requires the ability to tune and optimize performance.
- They need to be well-prepared for abnormal circumstances when the site runs into trouble and features start malfunctioning.
In that case, the ability to quickly analyze the underlying technical problem and track down and eliminate the root cause is absolutely crucial. Log data is the most important source of information for that analysis. It allows you to tune and optimize your site, and to debug it when things go wrong.
All software keeps a log of what’s going on: events, transactions, errors, usage statistics, performance metrics, network traffic, security issues, and much more. Think of it like a ship’s logbook or the flight data recorder of a plane.
Each of the many software components that make up an e-commerce website writes its own logbook, which can be used for troubleshooting or for gathering business intelligence. When something breaks, this data is a pot of gold.
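To make the logbook idea concrete, here is a minimal sketch of a single component writing its own log, using Python's standard `logging` module. The `add_to_cart` function, the `build_logger` helper, the `item=` and `reason=` fields, and the stock check are all hypothetical and chosen only for illustration; a real store would log whatever its own code paths need.

```python
import logging


def build_logger(name, stream):
    """Create a logger that writes timestamped lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def add_to_cart(logger, cart, inventory, item_id):
    """Hypothetical cart operation that records both success and failure."""
    if inventory.get(item_id, 0) <= 0:
        # The failure is logged with enough context to debug it later.
        logger.error("add_to_cart failed item=%s reason=out_of_stock", item_id)
        return False
    inventory[item_id] -= 1
    cart[item_id] = cart.get(item_id, 0) + 1
    logger.info("add_to_cart ok item=%s quantity=%d", item_id, cart[item_id])
    return True
```

Every call leaves one line behind, so a failed cart operation can later be traced to a specific item and reason instead of surfacing only as a generic error page.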
At Loggly we regularly demo how you can debug a defect on an e-commerce website that results in almost exactly the same error that Amazon’s customers saw on Prime Day. Watch this video to see how adding an item to the cart of our demo store fails, causing the web server to report an “Internal Server Error”, and then see how you can use log data to debug the error.
Having that log data is one thing; efficiently managing it and being able to analyze it is another. This is where many companies fall short: smaller ones in particular, but large ones too. Log data from all the different components is typically distributed all over your systems and network, so accessing it and getting a cohesive view of all of it is a challenge. Even if you accomplish that, you will need sophisticated analytics in order to understand the data and draw meaningful conclusions. We are talking about machine data here, and about really large amounts of it: even a small e-commerce site typically generates many gigabytes of log data every single day. In an error situation, that volume can easily multiply.
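As a rough illustration of what a cohesive view means, the sketch below merges logs from several components into one chronological timeline. It assumes each line starts with an ISO-8601 timestamp followed by a space and the message, and that each component's log is already in time order. Both are assumptions made for this example; real logs vary widely in format, and the function names are hypothetical.

```python
import heapq
from datetime import datetime
from typing import Dict, Iterable, Iterator, NamedTuple


class LogEvent(NamedTuple):
    timestamp: datetime
    source: str
    message: str


def parse_lines(source: str, lines: Iterable[str]) -> Iterator[LogEvent]:
    """Parse lines of the assumed form '<ISO-8601 timestamp> <message>'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stamp, _, message = line.partition(" ")
        yield LogEvent(datetime.fromisoformat(stamp), source, message)


def merge_logs(streams: Dict[str, Iterable[str]]) -> Iterator[LogEvent]:
    """Merge per-component logs, each already time-ordered, into one timeline."""
    parsed = [parse_lines(name, lines) for name, lines in streams.items()]
    # heapq.merge reads the inputs lazily, so large logs are not loaded at once.
    return heapq.merge(*parsed, key=lambda event: event.timestamp)


if __name__ == "__main__":
    web = [
        "2016-07-12T18:03:01 POST /cart/add 500 Internal Server Error",
        "2016-07-12T18:03:05 GET /products 200 OK",
    ]
    app = ["2016-07-12T18:03:00 ERROR inventory lookup timed out"]
    for event in merge_logs({"web": web, "app": app}):
        print(event.timestamp.isoformat(), event.source, event.message)
```

Seen side by side, the application error just before the web server's 500 response becomes an obvious lead. This only works at toy scale, though; with gigabytes per day spread across many machines, the collection and search have to be handled by dedicated tooling.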
It is crucial that the operators of such sites always have the log data at hand, and that all experts on staff, including developers, can easily look at it. If your site is down, you’re losing revenue, and your company is facing a sh*tstorm on Twitter. This is a really bad time to start thinking about how to get to that one log file on Server XYZ, especially if the outage has caused Server XYZ to lose its network connection, leaving it unreachable in your greatest time of need.