In the epic battle between software developers and IT operations, most observers will favor the dev side of the house. As the poster children for modern software, developers get the glory … and most of the money and perks, too.
But at least one influential observer isn’t afraid to show his appreciation for the operations side of the equation. DevOps enthusiast and researcher Gene Kim spoke at a series of New Relic roadshows around the country in recent weeks and made his position clear: “Over the last sixteen years, having seen how some of the largest Internet companies work … I came to the conclusion that many in DevOps have come to: one of the biggest differentiators to great time to market is great IT operations.”
“I love ops,” Gene added. “Why? Because when something goes really wrong, like when the code totally blows up in production, the people who will be first on the scene and often the last to leave is ops. Memory leak? No problem, we’ll do hourly reboots until dev can figure out how to effect a fix.”
It happens all the time, said Gene, who is co-author of The Phoenix Project and founder of Tripwire. There’s a site outage, and the developers are adamant that they didn’t make the change. They’re saying, “It must be the security guys, they’re always causing outages.”
“Or, there are 50 systems behind the load balancer, and six systems are acting funny. What’s different, and who made them different? Or, every server is like a snowflake, each having its own personality.”
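Gene's "what's different?" question is the kind of thing a small script can answer. The following is a minimal sketch, not anything presented in the talk: it assumes each server's settings have already been collected into a dictionary, and the host names, keys, and version strings are all hypothetical. It treats the most common value for each setting as the expected one and reports the hosts that deviate.

```python
from collections import Counter


def find_drift(fleet):
    """Return {host: {key: (actual, expected)}} for hosts that deviate.

    For each setting, the most common value across the fleet is treated
    as the expected value (an assumption made for this sketch).
    """
    keys = set()
    for config in fleet.values():
        keys.update(config)

    drift = {}
    for key in sorted(keys):
        counts = Counter(config.get(key) for config in fleet.values())
        expected, _ = counts.most_common(1)[0]
        for host, config in fleet.items():
            actual = config.get(key)
            if actual != expected:
                drift.setdefault(host, {})[key] = (actual, expected)
    return drift


# Hypothetical fleet: 50 servers behind a load balancer, all alike...
fleet = {
    f"web{i:02d}": {"openssl": "1.0.1g", "jvm_heap": "2g"}
    for i in range(1, 51)
}
# ...except six that someone patched by hand.
for host in ["web03", "web11", "web19", "web27", "web35", "web42"]:
    fleet[host]["openssl"] = "1.0.1e"

drift = find_drift(fleet)
for host in sorted(drift):
    print(host, drift[host])
```

A report like this answers only the "what's different" half of the question. Finding out who made the change still depends on change records or version-controlled configuration.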
What makes a bad day for ops? They do six weeks of testing, but deployment still fails. Why? Because the QA environment doesn’t match production. Or there’s a failure in testing, and no one can agree whether it’s a code failure or an environment failure. Or changes are made in QA, but no one writes them down, so they never get replicated downstream in production.
Once again, it’s ops that ends up fixing things. “Who’s introducing the variance?” Gene asked. “Well, it’s often the developers. There’s an old joke: show me a developer who isn’t causing an outage, I’ll show you one who is on vacation. Funny, right? Because it matches so many of our own common experiences.”
Of course, it’s not actually the developers’ fault. All too often, they are measured primarily on how quickly they get features to market. But if you want to make things better for ops, and for the organization as a whole, Gene said, start by making sure environments are available when developers need them, and that they’re configured correctly the first time. And, of course, don’t forget to document all the changes so that they can be replicated downstream and upstream.
The real way to make peace between dev and ops? It’s simple, Gene claimed: “As John Allspaw and Paul Hammond said in their famous 2009 presentation, ‘We need ops who think like devs and we need devs who think like ops.’”