The DevOps movement only came to my attention in the final year of writing my book, Continuous Delivery. The book, based on experiences that are described by Chris Read in his earlier guest post in this series, describes principles and practices that are necessary for repeatable, reliable delivery of software – in particular, keeping software production-ready throughout its development and operational lifecycle.
Once I grokked DevOps I became very excited, because it gave a name to two themes that together form the central plank of the book. First of all, an essential prerequisite to efficient delivery is collaboration between everybody involved – particularly developers, testers, and operations – across the whole lifecycle of a project. Second, efficient delivery depends on automating every part of the delivery process – not just building and testing software, but also deployments, environment provisioning, infrastructure changes, and database migrations.
I quickly concluded that in any organization which had an operations department, implementing the culture and practices DevOps discusses is a necessary condition for achieving continuous delivery. However, in the kind of organizations I normally see, this is not an easy task. Even with the best will in the world from the people on the ground – which is not always to be found – there are profound organizational barriers to achieving collaboration between development and operations.
So in this article I want to take it for granted that we have some idea of what DevOps entails – the previous guest blogs provide an excellent overview – and talk about organizational barriers to implementing DevOps, and how I see things moving forwards in the future. I’m going to focus on medium and large organizations, because that’s where most of my experience, and that of my colleagues, comes from. I’m also going to ignore the technical aspects of DevOps and focus on organizational ones, because they are the most intractable in these organizations.
From Projects to Products
The kind of practices DevOps talks about have most visibly been successful in startups. Notable poster children include Flickr, Facebook, Netflix and Amazon – although of course Amazon is now enormous. Indeed, as far as I can lazily ascertain, all of the chapters in Web Operations that deal primarily with DevOps are written by practitioners working at startups at the time of writing.
In his chapter, “Dev and Ops Collaboration and Cooperation,” Paul Hammond starts out by discussing why it is that most organizations have a separation at all. He mentions the different goals development and operations have (putting out features versus stability), the fact that they often report up separate management chains, and their different working styles. In the first essay in the book, Theo Schlossnagle also mentions the different skill-sets and knowledge operations people must possess.
But as we all know, the dev-ops separation is actually a huge barrier when you consider the overall goal of efficiently delivering high-quality, stable services to your users. Probably the best organizational solution I have heard, which came to me from Evan Bottcher, is to move from delivering services using projects to delivering them as products.
Having product teams which last for the entire lifecycle of a service – from inception to retirement – removes the strong incentives that developers have to create unmaintainable systems, and that operations people have to resist new releases. Evan doesn’t discuss the composition of these teams, but one can imagine that they might be cross-functional, with representatives from development, testing, and operations working on them. Obviously the composition of these teams could change over the course of their life as necessary.
Why DevOps will Take Time to Cross the Chasm
However, there are significant organizational forces that prevent a change in this direction. These are the same forces that create the incentives for dev and ops to behave in a way that is rational within their own system, but pathological when it comes to the wider organization. These are accounting, governance, and politics (which I’ll set aside for now).
When working in larger organizations, accounting has a surprisingly powerful effect on even small-scale decisions taken on the ground. Aside from the large lead time procurement departments often impose on purchasing decisions, the division between capital expenditure (capex) and operational expenditure (opex) can institutionalize the division between development and operations.
In general, companies strive to put more into capex, not less. Since new projects tend to be funded from the capex budget, while operation of existing services is (obviously) funded from opex, operations often gets a raw deal, because it’s easier to get funding for some new, shiny project than it is to fund restructuring a legacy system.
This leads to a catalogue of sub-optimal behavior, such as the project I heard of where the project manager pushed the release date out as far as possible, because any changes to the system following release would have to come out of the opex budget.
The second major barrier to a more sensible way of working is governance. Frameworks such as CoBiT and ITIL are often adopted in the interests of good governance – essentially, to ensure organizations manage risk in a transparent way.
However, many implementations of these frameworks reinforce the barrier between development and operations on the spurious but easily argued grounds that part of the role of operations is to enforce controls that prevent developers from going rogue, either intentionally or unintentionally. The operations team must therefore not communicate with the development team – that might make them collaborators!
To be clear, I’m not advocating bypassing or subverting these standards and processes. Indeed, one of the things I’m currently working on is documenting agile good practices that are ITIL compatible. However, there is a much more powerful and effective way to manage risk than by military analogy, and it goes back to the twin threads that permeate the DevOps movement: technology and culture.
Effective configuration management – including automation of the build, deploy, test and release process, provisioning and maintenance of infrastructure, and data management – makes the whole delivery process completely transparent. As any good auditor will verify, there is no better documentation than a fully automated system that is responsible for making all changes to your systems, and a version control repository that contains everything required to take you from bare metal to a working system.
Continuous deployment is also great at reducing risk, by making batch sizes smaller. It’s much simpler to audit a few lines of changes than it is to audit six months of work – nobody on the ground who has had to do this is under any illusion that it is a good process for risk management.
Back to culture, there is no more effective form of peer-review than a cross-functional, co-located team. Having everybody working together, with testers, operations people and developers all pairing, is simply the most effective way to share knowledge and discover problems early on when they are cheap to fix, and thus to build quality in to your systems. One of the many interesting articles in Making Software, “A Communal Workshop or Doors That Close?” takes a scientific, evidence-based approach which demonstrates why a communal workshop model of software development is more efficient and productive.
Separating people from each other and requiring people to fill in huge change request forms might seem to make things more transparent, but practically speaking what happens is that people don’t have the information they need to make decisions. The change management process becomes slow and Byzantine, decisions get punted up the chain because nobody wants to take responsibility, and the people who end up making the decisions are the ones with the least information.
The truth is, not only is it more efficient to trust your teams to do the right thing: it’s a more effective approach to managing risk as well. Sunlight is the best disinfectant, and it’s achieved in two ways: through better working practices such as co-location and collaboration, and through automation.
Because of the false division between projects and operations, organizations actually end up obscuring the true value of services over their life-cycle. Running software services as products would allow their overall cost – and revenue – to be measured much more easily, which would allow businesses to make better decisions about where they should invest their resources.
When the expense of running a legacy service doesn’t appear on your budget, there’s little incentive to consider that retiring or replacing it – or investing in less complex infrastructure, for example – might actually be a better decision than investing in a new, high-risk service.
However, organizations won’t change their ways unless there is something really painful that forces them to. Fortunately for the people on the ground, there is.
Many enterprises currently spend 75%-85% of their budgets operating existing services. This will become increasingly unsustainable in the face of start-ups and other agile organizations moving fast to arbitrage new business opportunities. Indeed, I expect that over the next few years continuous delivery will move from being a competitive advantage to being a prerequisite for survival.
A Better Way
In the same way as the dev-ops division prevents organizations from seeing the true value of services over their lifecycle, it also prevents them from seeing the real failure rate. Many projects fail before they go live – which is often a good thing, if it’s because the team discovers they aren’t going to deliver the expected value.
However, many projects complete successfully, but then go on to become commercial failures. By the time they fail, though, the original team – and often the people who initially approved the project – have moved on to other things, and never get this feedback.
Thus it’s my contention that projects are often much riskier than organizations think they are, because people aren’t measuring risk over their entire life-cycle. I think that building on the idea of running services as products, organizations could treat them as mini-startups within an organization.
The Lean startup methodology, developed by Eric Ries, including customer development as described in Steve Blank’s The Four Steps to the Epiphany, seems like it should be a good model for developing services within an enterprise. One of the benefits of this model is that it emphasises the importance of developing a revenue stream from early on in the life-cycle of products, both in order to verify the hypothesis that what is being developed is valuable, and to enable organizations to pivot rapidly in response to real customer feedback.
In a recent interview, Donald Ferguson, now CTO of CA Technologies, noted that the biggest technology mistake he ever made was the creation of WebSphere. As he noted, “Because we were IBM, we survived it, but if we’d been a start-up, we’d have gone to the wall.” Had Donald’s team followed the lean startup methodology, the IT world would have been spared WebSphere – and that could only have been a good thing.
Note: Thanks to Jim Highsmith for some feedback on the draft of this post which hopefully removed some of my mistakes.