Failing to Plan is Planning to Fail

by Ted Theodoropoulos · Apr. 26, 11 · Interview

There has been quite a bit of press coverage of the outage at Amazon this week. Much of it has focused on how the outage brought down popular web sites such as Reddit, Quora and Foursquare. The point being missed is that by failing to plan, these companies planned to fail. Technology professionals know that data centers fail all the time; such failures are a fact of life that must be planned for and dealt with. While it’s true that Amazon did not live up to expectations, it did not actually violate its service level agreement (SLA), as Gartner pointed out:

Amazon’s SLA for EC2 is 99.95% for multi-AZ deployments. That means that you should expect that you can have about 4.5 hours of total region downtime each year without Amazon violating their SLA. Note, by the way, that this outage does not actually violate their SLA. Their SLA defines unavailability as a lack of external connectivity to EC2 instances, coupled with the inability to provision working instances. In this case, EC2 was just fine by that definition. It was EBS and RDS which weren’t, and neither of those services have SLAs.
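The downtime figure quoted above can be checked with a little arithmetic. The snippet below is a minimal sketch, assuming the availability percentage is measured over a 365-day year; the function name `allowed_downtime_hours` is illustrative and not part of any Amazon tooling.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (an assumption)


def allowed_downtime_hours(sla_percent: float) -> float:
    """Hours per year a service may be unavailable without breaching the SLA."""
    return HOURS_PER_YEAR * (1 - sla_percent / 100)


if __name__ == "__main__":
    # A 99.95% commitment leaves 0.05% of the year as permitted downtime.
    print(f"{allowed_downtime_hours(99.95):.2f} hours per year")
```

Under these assumptions the result is 4.38 hours, in line with the "about 4.5 hours" cited above.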

Amazon is an infrastructure as a service (IaaS) provider, which means it provides the hardware and low-level software used to support Cloud-based applications. The beauty of the IaaS model is that you can design and build an application any way you see fit based on your individual requirements. If your application requires high availability and you choose not to address that requirement in your design, then you have introduced risk into your environment. We call this design shortcoming “technical debt.” I have blogged extensively on the subject, and those posts can be referenced if additional background is needed.

The principal amount of this technical debt is the cost of implementing the required redundancy. The interest is the cost of the additional risk associated with not having appropriate levels of redundancy. There are several ways to assign dollars to risk, but none of them is perfect. The most straightforward approach is to estimate the cost of a failure and then multiply it by the probability that the failure will occur. Let’s say Foursquare estimates that the cost of its website going down for 24 hours is one million dollars. With an optimal design and implementation of the application, the probability of such an outage is 0.5 percent a year. However, because Foursquare took some design shortcuts, the probability increased to 4 percent a year. The annual interest on this technical debt can be calculated as follows.

Incremental Risk: 4% - 0.5% = 3.5%
Cost of Failure: $1,000,000
Interest: 3.5% × $1,000,000 = $35,000 per year

Should Foursquare have implemented the redundancy needed to achieve the required uptime? The answer depends on the principal of the debt. If the cost of providing redundancy is $5,000, it would be a very easy decision. If you invest $5,000 and eliminate $35,000 of annual risk, you get back 700% of what you put in, or a net ROI of 600% once the investment itself is subtracted. It would be a different story if the redundancy cost $100,000. That would return only 35% of the investment (a net ROI of -65%), which might not be a good investment. Not many investors would sign up to get back 35 cents for every dollar they invest.
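The interest and return figures above can be reproduced in a few lines. This is a minimal sketch using the article's hypothetical Foursquare numbers; the function names are illustrative, and it assumes the redundancy cost is compared against a single year of eliminated risk. The 700% and 35% figures correspond to the return per dollar spent, while the net ROI subtracts the principal first.

```python
def annual_interest(cost_of_failure: float,
                    actual_probability: float,
                    optimal_probability: float) -> float:
    """Expected yearly loss added by design shortcuts (the 'interest')."""
    return cost_of_failure * (actual_probability - optimal_probability)


def return_per_dollar(risk_eliminated: float, principal: float) -> float:
    """Dollars of risk eliminated for each dollar spent on redundancy."""
    return risk_eliminated / principal


def net_roi(risk_eliminated: float, principal: float) -> float:
    """Gain net of the investment, as a fraction of the investment."""
    return (risk_eliminated - principal) / principal


if __name__ == "__main__":
    interest = annual_interest(1_000_000, 0.04, 0.005)
    print(f"Interest: ${interest:,.0f} per year")
    for principal in (5_000, 100_000):  # the two redundancy costs discussed
        print(f"Principal ${principal:,}: "
              f"{return_per_dollar(interest, principal):.0%} returned, "
              f"net ROI {net_roi(interest, principal):.0%}")
```

On these numbers the break-even principal equals the annual interest itself, $35,000: any redundancy cheaper than that pays for itself within the first year.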

The fact is that Cloud redundancy is cheap, and Foursquare would have achieved an astronomical ROI by implementing it. The culprit in this outage is not the Cloud. It is technical debt. Blaming the Cloud for these outages would be like blaming your hard drive manufacturer for lost data when the drive fails. Everyone knows hard drives fail and that you should always have a backup. If your hard drive happened to be backed up by one of the many tools that leverage Amazon’s S3 service, that’s not technical debt. That’s just bad luck!

From http://blog.acrowire.com/cloud-computing/failing-to-plan-is-planning-to-fail/
