Windows Azure Armageddon

By Roger Jennings · Feb. 25, 2013

At 12:54 PM on 2/22/2013, I received the following DOWN Alert message from Pingdom for my OakLeaf Systems Azure Table Services Sample Project demo from my Cloud Computing with the Windows Azure Platform book:

[Image: Pingdom DOWN alert]

When I attempted to run the sample Web Role project at http://oakleaf.cloudapp.net, I received an error message for an unhandled exception stating that an expired HTTPS certificate caused the problem. Here’s the Windows Azure team’s explanation:

[Image: the Windows Azure team's explanation]

Here’s Pingdom’s UP alert at 8:28 PM last night:

[Image: Pingdom UP alert]

Prior to this incident, the sample project had run in the South Central (San Antonio, TX) data center for nine consecutive months within the 99.9% availability SLA, as reported in my monthly Uptime Reports. Here’s the monthly uptime data since June 2011:

Month      Year  Uptime   Downtime  Outages  Response Time
                          (h:m:s)
January    2013  100.00%  00:00:00  0        628 ms
December   2012  100.00%  00:00:00  0        806 ms
November   2012  100.00%  00:00:00  0        745 ms
October    2012  100.00%  00:00:00  0        686 ms
September  2012  100.00%  00:00:00  0        748 ms
August     2012   99.92%  00:35:00  2        684 ms
July       2012  100.00%  00:00:00  0        706 ms
June       2012  100.00%  00:00:00  0        712 ms
May        2012  100.00%  00:00:00  0        775 ms
April      2012   99.28%  05:10:08  12       795 ms
March      2012   99.96%  00:20:00  1        767 ms
February   2012   99.92%  00:35:00  2        729 ms
January    2012  100.00%  00:00:00  0        773 ms
December   2011  100.00%  00:00:00  0        765 ms
November   2011   99.99%  00:05:00  1        708 ms
October    2011   99.99%  00:04:59  1        720 ms
September  2011   99.99%  00:05:00  1        743 ms
August     2011   99.98%  00:09:57  2        687 ms
July       2011  100.00%  00:00:00  0        643 ms
June       2011  100.00%  00:00:00  0        696 ms
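
The uptime percentages in the table above can be checked against the downtime column with a little arithmetic. The following Python sketch is illustrative only: it assumes uptime is measured over the full calendar month, which may differ from how Pingdom actually computes it, and the function names `uptime_percent` and `meets_sla` are made up for this example.

```python
import calendar
from datetime import timedelta

SLA_PERCENT = 99.9  # availability threshold cited in the article


def uptime_percent(year, month, downtime):
    """Percentage of the calendar month during which the service was up.

    Assumption: the measurement window is the whole calendar month.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    month_length = timedelta(days=days_in_month)
    # Dividing one timedelta by another yields a plain float ratio.
    return 100.0 * (1 - downtime / month_length)


def meets_sla(year, month, downtime, sla_percent=SLA_PERCENT):
    """True if the month's uptime is at or above the SLA threshold."""
    return uptime_percent(year, month, downtime) >= sla_percent


if __name__ == "__main__":
    # Downtime figures copied from the table above.
    samples = [
        (2012, 8, timedelta(minutes=35)),
        (2012, 4, timedelta(hours=5, minutes=10, seconds=8)),
    ]
    for year, month, downtime in samples:
        pct = uptime_percent(year, month, downtime)
        status = "within" if meets_sla(year, month, downtime) else "below"
        print(f"{calendar.month_name[month]} {year}: {pct:.2f}% ({status} SLA)")
```

Under that assumption, 35 minutes of downtime in August 2012 works out to 99.92% and 5:10:08 in April 2012 to 99.28%, matching the reported figures. A 99.9% threshold leaves roughly 43 minutes of downtime in a 30-day month, which is why April 2012 is the only month in the table that falls below it.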

Following is the historical report for those services affected by the expired certificate:

[Images: historical report for the affected services]

It’s obvious that some minor functionary in the Windows Azure bureaucracy missed an item on his or her to-do list yesterday. It’s equally obvious that this is a helluva way to run a cloud service (with apologies to Peter Arno and John Luther (Casey) Jones).

Adrian Cockcroft (@adrianco) noted that “Azure had a cert outage a year ago” in a 2/23/2013 Tweet:

[Image: Adrian Cockcroft’s tweet]

On 3/9/2012, Microsoft’s Bill Laing posted a Summary of Windows Azure Service Disruption on Feb 29th, 2012, which was caused by a failure to create a “transfer certificate” on leap day:

… So that the application secrets, like certificates, are always encrypted when transmitted over the physical or logical networks, the GA creates a “transfer certificate” when it initializes. The first step the GA takes during the setup of its connection with the HA is to pass the HA the public key version of the transfer certificate. The HA can then encrypt secrets and because only the GA has the private key, only the GA in the target VM can decrypt those secrets.

…

When the GA creates the transfer certificate, it gives it a one year validity range. It uses midnight UST of the current day as the valid-from date and one year from that date as the valid-to date. The leap day bug is that the GA calculated the valid-to date by simply taking the current date and adding one to its year. That meant that any GA that tried to create a transfer certificate on leap day set a valid-to date of February 29, 2013, an invalid date that caused the certificate creation to fail. …
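
The leap day bug described in the excerpt is easy to reproduce. The Python sketch below is not Microsoft's code (the excerpt does not show the GA's implementation); the function names are hypothetical, and the fallback to February 28 is just one reasonable choice (March 1 would be another).

```python
from datetime import date


def naive_valid_to(valid_from):
    """Add one to the year and keep month and day, as the excerpt describes.

    Raises ValueError when valid_from is February 29, because the
    following year has no such date.
    """
    return valid_from.replace(year=valid_from.year + 1)


def safe_valid_to(valid_from):
    """One-year validity that tolerates leap day.

    Assumption for this sketch: fall back to February 28 of the next
    year when February 29 does not exist there.
    """
    try:
        return valid_from.replace(year=valid_from.year + 1)
    except ValueError:
        return valid_from.replace(year=valid_from.year + 1, day=28)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    try:
        naive_valid_to(leap_day)
    except ValueError as exc:
        print(f"naive calculation failed: {exc}")
    print(f"safe calculation: {safe_valid_to(leap_day)}")
```

On any other day the two functions return the same result; they differ only on February 29, which is the one date the naive year-plus-one approach cannot handle.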

Published at DZone with permission of Roger Jennings. See the original article here.
