A while back, I wrote a post on this blog about the Tools of the new data centre. There, I argued that as we outsource more and more of our "data center", we're going to have less and less control over the state of our infrastructure. And so the tools that make assumptions about infrastructure state and move things from A to B are not going to cut it anymore.
Understandably, my post was greeted with disdain by Chef and Puppet masters as something written by an ignorant business manager, who's unlikely to know anything about being a developer, and doesn't know how things work in the "real" world.
Here's how we were bitten by this very same problem yesterday:
We use our own idempotent configuration management scripts (not Chef or Puppet, but close enough) to install Redis on servers for our customers. These scripts had been working for around 3 years on 8 cloud providers, but yesterday they stopped working on DigitalOcean, causing havoc with many of our deployments.
After further investigation, it turned out that DigitalOcean had updated their base images and the new images no longer included an /opt folder.
You might say: Aha! You should have done a mkdir -p /opt in your scripts. Or perhaps used your own verified base images instead of the latest ones from DigitalOcean... Yes, you would be right to say that, and we've changed what we had control over to make sure this doesn't happen again.
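For what it's worth, here is a minimal sketch of that kind of defensive check, written in Python rather than in our actual scripts. The function name ensure_directories and the demo path are made up for illustration. os.makedirs with exist_ok=True behaves much like mkdir -p.

```python
import os
import tempfile


def ensure_directories(paths):
    """Create every missing directory (like `mkdir -p`) and return the ones created.

    Hypothetical helper for illustration; not taken from our real scripts.
    """
    created = []
    for path in paths:
        if not os.path.isdir(path):
            # exist_ok=True keeps the call idempotent if the directory
            # appears between the check and the creation.
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created


if __name__ == "__main__":
    # Demo against a throwaway directory so it runs without root access.
    root = tempfile.mkdtemp()
    required = [os.path.join(root, "opt")]
    print(ensure_directories(required))  # first run: the directory is created
    print(ensure_directories(required))  # second run: nothing left to do
```

Returning the list of directories that had to be created is a deliberate choice here: it tells you when a base image has quietly changed underneath you, instead of silently papering over it.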
As much of a fan of defensive programming as I am, I have seen enough unknown, unforeseen and unpredictable conditions cause exactly these issues in the real world. In a business environment with deadlines, you always draw a line where you think it's reasonable to assume something about a third party. This could be a directory on a disk, or the availability of power and network connections. Either way, you're assuming state and working on that basis.
As the number of components and third parties involved grows, so does the number of assumptions you make and the probability that at least one of them is wrong. Any tool that's built to work on the basis of these assumptions is going to be less and less useful as you bring more and more external entities to the party.
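To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes every assumption holds independently with the same probability, which real infrastructure certainly doesn't guarantee, and the 99% figure is picked purely for illustration, not measured from anything.

```python
def chance_something_breaks(p_holds, n_assumptions):
    """Probability that at least one of n independent assumptions is wrong.

    Illustrative model only: assumes independence and an identical
    probability p_holds for every assumption.
    """
    if not 0.0 <= p_holds <= 1.0:
        raise ValueError("p_holds must be between 0 and 1")
    if n_assumptions < 0:
        raise ValueError("n_assumptions must not be negative")
    # All n assumptions hold with probability p_holds ** n, so at least
    # one fails with the complement of that.
    return 1 - p_holds ** n_assumptions


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, round(chance_something_breaks(0.99, n), 3))
```

Under those made-up numbers, ten assumptions that are each 99% safe already give you roughly a one in ten chance that something is off, and fifty of them push it to roughly four in ten.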
I just wish we could use Cloud 66 for everything we do! Unfortunately, bootstrapping is sometimes not possible when it defies the laws of physics!