
Crisis Management Under the Microscope


The news this past week has brought endless images of devastation from across the New York metropolitan region.

More than once, in conversations about the recovery efforts, I’ve commented, “That’s similar to what I do.” Web operations is all about disaster recovery and crisis management in the datacenter. If you saw Con Edison crews down in the trenches, you might not know how the power gets to your building, or what all those pipes down there do, but you know when it’s out. You know when something is out of order.

That’s why datacenter operations can learn so much about crisis management from the handling of Hurricane Sandy.

1. Run Fire Drills

Nothing can substitute for real-world testing. Put your application through its paces: pull the plugs, pull the power. You need to know what’s going to go wrong before it happens. Put your application on life support and see how it copes. Fail over to backup servers, and restore the entire application stack and its components from backups.
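To make the idea of a fire drill concrete, here is a minimal sketch in Python, assuming an application that can reach its service through an ordered list of backends (a primary and one or more backups). The names `call_with_failover`, `ServiceUnavailable`, and `fire_drill` are hypothetical and stand in for whatever your real stack uses; the drill "pulls the plug" on the primary by making it fail, then checks that the backup takes over and that a total outage is reported rather than silently ignored.

```python
from typing import Callable, List, Sequence


class ServiceUnavailable(Exception):
    """Hypothetical error raised by a backend that cannot serve a request."""


def call_with_failover(backends: Sequence[Callable[[], str]]) -> str:
    """Try each backend in order and return the first successful result."""
    errors: List[ServiceUnavailable] = []
    for backend in backends:
        try:
            return backend()
        except ServiceUnavailable as exc:
            errors.append(exc)  # record the failure, then try the next backend
    raise ServiceUnavailable(f"all {len(backends)} backends failed: {errors}")


def fire_drill() -> None:
    """Simulate pulling the plug on the primary and verify the failover path."""

    def primary() -> str:
        raise ServiceUnavailable("primary: power pulled")  # simulated outage

    def backup() -> str:
        return "served by backup"

    # Drill 1: the primary is down, so the backup must answer.
    assert call_with_failover([primary, backup]) == "served by backup"

    def dead_backup() -> str:
        raise ServiceUnavailable("backup: also down")

    # Drill 2: everything is down, so the caller must see an explicit error.
    try:
        call_with_failover([primary, dead_backup])
    except ServiceUnavailable:
        pass
    else:
        raise AssertionError("drill expected a total outage to be reported")


if __name__ == "__main__":
    fire_drill()
    print("fire drill passed")
```

In a real drill the simulated functions would be replaced by actually stopping a server or cutting its network, but the shape is the same: break something on purpose, then assert that the system behaves the way you expect.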

2. Let the Pros Handle Cleanup

This week Fred Wilson blogged about a small data room his family managed for their personal photos, videos, music and so forth. He ruminated on what would have happened to that home datacenter had he still been living there when Sandy struck.

It’s a story many of us can relate to, and it points to the obvious advantages of moving to the cloud. Handing things over to the pros means basic best practices will be followed. EBS storage, for example, is redundant, so the failure of a single hard drive won’t take you out. What’s more, S3 keeps redundant copies of your data in multiple, geographically separated facilities.

Web operations teams do what Con Edison does, but for the interwebs. We drill down into the bowels of our digital city, find the wires that are crossed, and repair them. Crisis management rules the day. I admire how quickly Con Edison has brought NYC back up and running after the wrath of Sandy.

3. Have a Few Different Backup Plans

Watching New Yorkers find alternate means of transportation into the city has been nothing short of inspirational. Trains not running? Bus service takes their place. L trains not crossing the river? A huge stream of bikes takes to the Williamsburg Bridge to get workers where they need to go.

Deploying on Amazon can be a great cloud option, but consider using multiple cloud providers to give you even more redundancy. Don’t put all your eggs in one basket.

4. Keep Open Lines of Communication

While recovery continued apace, city dwellers below 34th Street looked to text messages and old-school radios for news and updates. When would power be restored? Does my building use gas or steam for heat? Why are certain streets coming back online while others remain dark?

During an emergency like this one, it becomes obvious how important lines of communication are. So too in datacenter crisis management: key people from business units, operations teams, and development must all coordinate. Orchestrating that is an art all by itself.


Published at DZone with permission of Sean Hull. See the original article here.

Opinions expressed by DZone contributors are their own.
