It was great speaking with Manoj Chaudhary, CTO and V.P. of Engineering at Loggly, about the Amazon Web Services S3 outage on February 28 that crippled portions of the internet for up to five hours, affecting DZone, Expedia, Medium, Slack, and other major sites.
While Loggly is completely cloud-based, it runs on a hybrid cloud that combines its own data center with AWS. Loggly mines log data in real time and surfaces what matters, helping teams write better-quality code and deliver a great user experience (UX).
All Loggly logs remained available during the S3 outage. For some customers, log volume during the outage was three to four times the average daily volume, ranging from 10 Mbps to 40 to 50 Mbps.
This indicates the need for companies, regardless of their log provider, to ensure that they have enough compute and storage headroom to handle the increase in data during these spikes.
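To make the headroom point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the rates quoted above (10 Mbps baseline, 50 Mbps peak) are sustained for the full five hours, which is a simplification, and the function names are hypothetical helpers for illustration, not part of any Loggly tooling.

```python
def ingest_gb(rate_mbps: float, hours: float) -> float:
    """Decimal gigabytes ingested at a sustained rate given in megabits per second."""
    bytes_per_second = rate_mbps * 1_000_000 / 8  # megabits/s -> bytes/s
    return bytes_per_second * hours * 3600 / 1e9  # bytes over the window -> GB


def spike_multiplier(baseline_mbps: float, peak_mbps: float) -> float:
    """How many times larger the peak rate is than the baseline rate."""
    return peak_mbps / baseline_mbps


# Illustrative figures taken from the article; sustained rates are an assumption.
baseline_mbps, peak_mbps, outage_hours = 10, 50, 5

normal_gb = ingest_gb(baseline_mbps, outage_hours)
spike_gb = ingest_gb(peak_mbps, outage_hours)

print(f"Normal ingest over {outage_hours} h: {normal_gb:.1f} GB")
print(f"Spike ingest over {outage_hours} h:  {spike_gb:.1f} GB")
print(f"Extra storage needed:      {spike_gb - normal_gb:.1f} GB")
print(f"Peak-to-baseline ratio:    {spike_multiplier(baseline_mbps, peak_mbps):.1f}x")
```

Under these assumptions, a single 10 Mbps log stream that jumps to 50 Mbps for five hours needs roughly 90 GB of extra storage on top of its normal ingest, before accounting for replication or indexing overhead.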
After three hours, S3 began operating normally and the load gradually decreased as servers came back online. By the fifth hour, everything was back to normal.
Loggly earned a great deal of trust from its customers as a result: they saw that Loggly was not affected by the outage and was able to maintain a view into what was happening across the web.
While redundancy is expensive, it is part of Loggly's business model. Should you invest in redundancy or in providers that already have the redundancy to protect your data and reduce your downtime?