As you may know, I recently moved my blog to Windows Azure, which came with its own hassles of converting my data and updating various internal references in my blog posts. This was actually the first step in test-driving Windows Azure before hosting a couple of critical services there in the coming months, so I kept playing around with Windows Azure and testing a few things on it. Yesterday evening, while I was very dizzy from some prescription medication, I created a test database, and when I went to drop it, I made a very bad mistake and dropped the whole database server hosting my blog’s database along with the other test databases.
I usually keep a backup of all my data and used to back up my blog database on a regular basis, but since I moved to Windows Azure there has been no built-in mechanism for this, and I have to use tools like Windows Azure Migration Wizard or SQL Azure Backup by Red Gate. Unfortunately, I was too busy in the past couple of weeks to go through this manual process and back up my data.
After losing my data, I opened a support ticket on the Microsoft Azure Support website, and within a few minutes a support guru (apparently from a company in India that Microsoft has outsourced its support to) called me to ask for more details, while telling me that there was a chance they couldn’t restore my data. After some email conversations, some questions answered, and another phone call, their shift changed and a new person took over my case. I answered some more questions via email and then didn’t receive any response until this morning, when I got another phone call telling me that they had started working on the issue. I didn’t hear anything new after that until a few minutes ago, when I was informed by phone and email that they are going to work on this issue but that it’s an unsupported scenario. The recovery process can be initiated in business hours only and takes 6-8 hours to complete. And the bottom line is that there is no guarantee that my data will be recovered in the end. It’s now over 24 hours since the data loss occurred and I opened the support ticket. While the initial response was quick, I haven’t seen much actual progress yet, and it seems that there is a lot of bureaucracy in the support process at Microsoft.
Putting everything else aside, I’ve opened a support ticket and the least I expect is a definite response in a reasonable time. Unfortunately, I haven’t received a clear answer from Microsoft support on whether I can hope to get my data back. This is a catastrophic failure for a cloud hosting solution. I’m lucky that I’m not hosting a business on Windows Azure; otherwise I could have lost a huge amount of money in the past 24 hours.
Luckily, as soon as the data loss occurred, I went to Google cache and retrieved the content of the blog posts that I had published in the past couple of weeks, added them to the local backup database that I had kept from before, and redeployed the site so that it was live and running again. In this process I lost some comments, trackbacks, and post view statistics, which are the least of my worries in the face of a bigger data loss. However, this was only possible for a blog like mine: I was lucky that I hadn’t published many posts in the past two weeks and that I was able to find a recent backup.
Although this whole problem isn’t really big (especially since I could recover my posts), this unfortunate event taught me some lessons. First, I should avoid working when I’m sick and just rest to get better. There is no point in working and doing something that makes things worse. Second, I should be extra careful about backups whether or not a good mechanism is provided, even though, as I write below, it’s reasonable to blame Microsoft for this to some extent. Third, it’s often not a very good idea to be a pioneer in using the newest technologies; not only this time but many times before, I’ve experienced the same problems and wasted a lot of time wrestling with an unstable technology. Sometimes I feel that I’m goofy enough that Alpha, Beta, and Release Candidate builds are specifically designed for me to waste my time and energy on!
Fourth, I should revisit my plans to host some services with critical requirements on Windows Azure. Putting this data loss aside, in the past couple of weeks I realized that Windows Azure is still not stable enough for a critical system. Human errors will always happen, but the least I expect from a big company like Microsoft is that it keeps a backup of its customers’ data after charging them for expensive cloud hosting solutions. Assume that I had a very big site running on Windows Azure with a huge database. Could I write software to download my data each and every minute? Obviously not, and it is clear that a backup/restore option is one of a customer’s basic expectations. I think that one of the key points of cloud hosting is to give regular users the benefits of an enterprise service without spending much money and effort on managing the hardware. If I need to run a third server and pay for software to back my data up on a regular basis, I haven’t gained much by going into the cloud.
All in all, I’ll be waiting for Microsoft to get back to me with a clear response on the status of my database server, and if they can recover it, I will synchronize the recovered data with the current database and redeploy it. I hope that this happens, because I had made some manual modifications to the data to resolve some issues, and I will need to redo them if the data is lost for good.