Recently the world went wild when Amazon Web Services suffered an extended service outage. I’m not going to make a song and dance about AWS’ woes – suffice it to say that every provider, Cloud or otherwise, has outages. I will say that with Cloud Computing, outages are more obvious than with traditional on-premises infrastructure. I will also say that, on a net basis, Cloud providers are likely to deliver better availability and uptime than traditional providers. Rather, I’d like to reflect on outages generally and see what we can learn from them.
Looking at the bigger issues, the outage reminded me of a roundtable I took part in just over a year ago, alongside a number of Cloud thought leaders, among them enStratus co-founder George Reese and Bechtel Cloud Architect Christian Reilly. Although the event we were discussing is now over a year old, the roundtable is well worth revisiting and listening to for a summary of the issues relating to outages, and for some best practices to avoid being dragged down in a post-outage flow-on – feel free to have a listen here.
When talking about outages generally, I’m reminded of a post I wrote after last year’s AWS outage, reflecting on the naysayers who use any outage to pronounce the end of Cloud – last year it was the turn of NetworkWorld, which claimed that the “Amazon outage set Cloud Computing back years”. As I said then:
Yes, the AWS event means people will think long and hard about their architecture. Yes, some enterprises that were toying with the idea of public cloud might pull back for a while. Yes, private cloud providers will use the event ad infinitum to justify private versus public. But let’s be a little realistic – it doesn’t spell the end of the cloud.
So let’s instead focus on the lessons from an outage. What components and solutions are needed to build a service that would ride out an outage like the one we saw recently? As I stated in my post from last year, smart organizations will learn from this and other outages and look to the following:
All Cloud vendors are quick to point out just how reliable their data centers are, with their redundant communication channels, power supply structures and the like. Any application running in the cloud needs to consider the same issues. It is unrealistic to rely completely on one single data center – a chain is only as strong as its weakest link, and by relying on one DC alone, the idea of multiple redundancies is rendered a fiction.
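To make the multi-data-center point concrete, here is a minimal sketch of client-side failover – prefer one DC, but fall back to another when it is unhealthy. The endpoint names and the health check are entirely hypothetical, and a real deployment would do this with DNS failover or a load balancer rather than in application code.

```python
def pick_endpoint(endpoints, is_healthy):
    """Return the first healthy endpoint, preserving preference order."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy endpoint available")

# Hypothetical example: the preferred DC is down, so traffic fails over.
endpoints = ["us-east.example.com", "eu-west.example.com"]
down = {"us-east.example.com"}
print(pick_endpoint(endpoints, lambda e: e not in down))  # eu-west.example.com
```

The point of the sketch is simply that failover has to be designed in up front – if the application only knows about one endpoint, no amount of redundancy inside that data center helps when the whole DC goes dark.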
This one is a little more contentious, and difficult to effect right now. But with the advent of more open standards, Cloud users have the ability to obtain service across multiple providers, and more and more third-party solutions are helping with this process.
The real opportunity here is for providers that offer infrastructure-vendor-agnostic orchestration and automation services. Case in point: Layer7, which was quick to publish a post explaining why its rules-based cloud broker product would have avoided downstream issues from the AWS event.
Outages happen – they’re not fun, but they’re often unavoidable. Smart organizations will think about ways to lessen the impact of any outage – simply running a mile from the cloud because AWS went down really misses the point.