My favorite quote is:
Ops’ job is NOT to keep the site stable and fast [but]
Ops’ job is to enable the business (this is the dev’s job too)
The business requires change
They go on to present the dilemma: either discourage change in the interest of stability, or allow change to happen as often as it needs to. This is where they introduce their tools and culture for lowering the risk of change.
In this post I want to share with you how we use some of the tools John and Paul mention.
- Automated infrastructure: John and Paul say: “If there is only one thing you do…” and I couldn’t agree more. Using tools like Puppet, Chef, or one of the many others they name in their presentation to automate the life cycle of your servers is a must. I used Capistrano as a basis and built my own tool, Carpet, on top of it. I can no longer imagine installing or configuring my servers by hand. Automation is so much more robust and involves so much less risk that I simply love it.
- Shared version control: We are using separate repositories for our code and our infrastructure recipes, but both are on the same GitHub account and all team members have full access to both. As we’re only three tech guys, every one of us does both dev and ops and knows both worlds.
- One-step build and deploy: Rake and Capistrano let us build and deploy our whole app with a single command. This is so much better than deploying with rsync or manually copying things around. It also enables us to do continuous integration with Hudson, which is very nice, too.
As we’re such a small team, we’re not using any IRC or IM robots. I’m planning to introduce feature flags so that we can fine-tune which features are available to whom. This would give us a little more flexibility in operating the platform. Shared metrics are already underway: we’re continuously enhancing our Nagios graphs by adding technical and business metrics to them. Again, this is something I would not want to work without anymore.
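To make the feature flag idea a bit more concrete, here is a minimal sketch in Python. It is not code from our platform: the `FeatureFlag` class, its fields, and the flag and user names are all made up for illustration. It only shows one possible way to decide "which features are available to whom", using an explicit user list, a group list, and an optional percentage rollout.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical flag describing who may see a feature."""

    name: str
    enabled_for_all: bool = False
    allowed_users: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)
    rollout_percent: int = 0  # 0 to 100, share of remaining users who get the feature

    def is_enabled(self, user_id, groups=()):
        """Return True if the feature should be shown to this user."""
        if self.enabled_for_all:
            return True
        if user_id in self.allowed_users:
            return True
        if self.allowed_groups.intersection(groups):
            return True
        # Everyone else falls into a stable bucket from 0 to 99.
        return self._bucket(user_id) < self.rollout_percent

    def _bucket(self, user_id):
        """Map a user to a bucket that stays the same across requests and restarts."""
        key = f"{self.name}:{user_id}".encode("utf-8")
        return int(hashlib.sha256(key).hexdigest(), 16) % 100


# Assumed example data: one named tester, one staff group, no general rollout yet.
new_checkout = FeatureFlag(
    name="new_checkout",
    allowed_users={"alice"},
    allowed_groups={"staff"},
)

print(new_checkout.is_enabled("alice"))                 # named tester
print(new_checkout.is_enabled("bob", groups=["staff"]))  # staff member
print(new_checkout.is_enabled("carol"))                 # regular user
```

Hashing the flag name together with the user ID keeps each user in the same bucket on every request, so raising `rollout_percent` step by step only ever adds users and never flips a feature off for someone who already had it.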
Enjoy the presentation…