Mike Neil, of the Windows Azure blog, has published a detailed discussion of the recent Azure Storage disruption that affected 1.8% of Windows Azure storage accounts. Here are a few of the points he covers:
"First, within the single storage stamp affected, some storage nodes, brought back (over a period of time) into production after being out for repair, did not have node protection turned on . . . Second, our monitoring system for detecting configuration errors associated with bringing storage nodes back in from repair had a defect which resulted in failure of alarm and escalation . . . Finally, on December 28th at 7:09am PST, a transition to a new primary node was initiated for the Fabric Controller of this storage stamp . . ."
You can read the rest of the story by heading over to the original post on the Windows Azure blog.