DevOps Disasters

DevOps can speed up your software development lifecycle, but make sure you're following best practices by looking over these gaffes.

DevOps is a culture that breaks down the silos between software developers, operations teams, and management while automating software testing and delivery. Everything about it sounds great. In fact, thanks to DevOps, developing, testing, and releasing software can be sped up and made more reliable, which is critical for businesses that are looking to survive in today’s ultra-competitive marketplace.

There is a long list of examples where DevOps works to perfection and delivers tangible results for businesses in various industries, but sometimes things break down. Here are some DevOps initiatives that went wrong on at least some level, and how the companies involved addressed the problems to prevent them from recurring.

Project Vision? What’s That?

In 2003, several years before the term DevOps was even coined, IBM dived into what we would now call DevOps by launching an agile software development initiative for one of its new products. Agile is a predecessor to DevOps that promotes fast and flexible response to change, and IBM invested in it because it wanted to hasten software delivery to its customers.

This endeavor was not successful. The development side was in fact operating at lightning speed, but operations became a major bottleneck, so customers ultimately did not receive the software any quicker. As part of a larger DevOps transformation, IBM decided to automate code deployment, but even this did not make the software delivery cycle fast enough. A value chain analysis then showed that the biggest obstacle was not the agile methodology or delivery, but the overall development and operations environments.

IBM’s DevOps disaster was ultimately caused by a lack of vision on the part of the very people who were putting these initiatives into place. There is no way to know which problems need to be solved if you don’t know how the work needs to be done. They were taking stabs in the dark at what the problems could be, guided mostly by vendor hype, instead of finding out what was actually slowing them down. As soon as management better understood the workflows and where the bottlenecks were occurring, they were able to make the necessary changes and get the most out of DevOps.
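The idea of locating the bottleneck before choosing a fix can be sketched in a few lines. This is only an illustration, not IBM's actual analysis: the stage names and hour figures below are made up, and `find_bottleneck` is a hypothetical helper.

```python
from typing import Dict, Tuple


def find_bottleneck(stage_hours: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest stage and its share of the total lead time."""
    if not stage_hours:
        raise ValueError("at least one stage is required")
    total = sum(stage_hours.values())
    if total <= 0:
        raise ValueError("total lead time must be positive")
    # The stage with the largest elapsed time dominates the delivery cycle.
    slowest = max(stage_hours, key=lambda name: stage_hours[name])
    return slowest, stage_hours[slowest] / total


# Illustrative, made-up figures (hours per stage); not data from the article.
example_stages = {
    "development": 16.0,
    "code review": 8.0,
    "environment provisioning": 72.0,
    "deployment": 4.0,
}

if __name__ == "__main__":
    stage, share = find_bottleneck(example_stages)
    print(f"Slowest stage: {stage} ({share:.0%} of lead time)")
```

With numbers like these, automating deployment alone would barely move the total, which mirrors the point of the story: measure the whole workflow first.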

A Lot of Access, Little Education

In 2006, when SlideShare was a little startup with fewer than 20 employees, it decided to stay ahead of the competition by adopting a DevOps model to speed up its processes. Not only was its infrastructure overly complicated, but half of the development team was in San Francisco and the other half in New Delhi. The goals of this DevOps model were to have the engineering team firing on all cylinders and to spread technical knowledge as widely as possible, so that if someone went on vacation or left the company, the impact would be minimal.

In a DevOps environment, each contributor works on and contributes to various parts of the product. In fact, one of the main principles of DevOps is ownership of and responsibility for work assignments, but to achieve this you must give developers access to parts of the infrastructure that they usually don't have access to. At SlideShare, the engineers had access to production databases and servers.

While working on a database-related project, a software engineer was trying out a tool for graphically exploring a MySQL database. He reordered the columns in that tool so that the data would make more sense to him. However, he didn’t realize that the tool was also altering the column order on the actual production database, locking it and taking down the entire SlideShare website. Since the engineer responsible didn’t know that the tool was the source of the problem, it took about 15 minutes of team effort to locate it.

There are two lessons that we can learn from this. First, even though DevOps does encourage everyone to have an impact on some step of the production cycle, it’s always a good idea to take a step back every time you give somebody access to something and make sure the access is really needed and will bring real value. In the example above, granting access to production data was unnecessary and, in fact, dangerous. The software developer could have gotten the same value by using a staging database.
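One way to picture the first lesson is a small guard that refuses schema-changing statements outside of non-production environments. This is a minimal sketch under stated assumptions, not what SlideShare did: the environment names, `is_schema_change`, and `check_statement` are all hypothetical, and in practice restricted database privileges would be the more robust control.

```python
import re

# Hypothetical environment names; not taken from the SlideShare account.
SAFE_ENVIRONMENTS = {"staging", "development"}

# Statements that start with one of these keywords change the schema.
_SCHEMA_CHANGE = re.compile(
    r"^\s*(ALTER|DROP|TRUNCATE|RENAME|CREATE)\b", re.IGNORECASE
)


def is_schema_change(sql: str) -> bool:
    """Return True if the statement begins with a schema-changing keyword."""
    return bool(_SCHEMA_CHANGE.match(sql))


def check_statement(sql: str, environment: str) -> None:
    """Raise PermissionError for schema changes outside safe environments."""
    if is_schema_change(sql) and environment not in SAFE_ENVIRONMENTS:
        raise PermissionError(
            f"schema changes are not allowed in '{environment}'"
        )


if __name__ == "__main__":
    check_statement("SELECT * FROM slides", "production")  # reads pass
    try:
        check_statement("ALTER TABLE slides MODIFY title TEXT", "production")
    except PermissionError as err:
        print(f"blocked: {err}")
```

The check is deliberately crude: it only looks at the leading keyword, so it illustrates the principle of least privilege without claiming to be a complete SQL filter.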

The second lesson is that developers need deeper knowledge of how the infrastructure works. Many developers have never been exposed to production infrastructure, which means that they cannot possibly know all of its subtleties and hidden rules.

Putting Tools Over People

A US government agency was launching a project to build a web application, and the first thing it did was establish the processes and tooling needed to cover planning, code and commit, build, and release, all in a collaborative, open-source-esque environment. While the configuration and deployment were successful, you cannot implement DevOps with tools alone. People, process, and culture are equally important.

Even though the agency managed to build a platform that had modern software configuration management and used Jenkins for continuous integration, with everything deployed on a so-so elastic, scalable platform, this DevOps platform effectively supported the same old legacy practices that were already in place. Developers were deferring commits and merges, QA was never fully implemented, teams were indifferent towards broken builds, and production loads were never tested in production-esque environments.

As soon as the agency released the web app, it experienced a disastrous and public failure, since the app had not been regularly tested in a production environment or by actual users. Once the problems became evident, it took several multi-week development cycles to fix all of the issues and get the site up and running.

There is no denying that DevOps offers a lot of promise to speed up the software delivery cycle, but, ultimately, it’s up to you and your team to make good on that promise by employing sound DevOps practices and having a cohesive DevOps culture.

Opinions expressed by DZone contributors are their own.
