Common DevOps Fails

DevOps teams fail through unwillingness to “fail fast and iterate,” lack of executive-level support, and lack of alignment between team members.

To understand the current and future state of DevOps, we spoke to 40 IT executives from 37 organizations. We asked them, "What are the most common DevOps fails?" Here’s what they said was typically missing:

Fail Fast Mentality

  • Successful people can talk about failure in the right way: talk about what you are learning from the data, and apply a method to identify the failure. One fail is not recognizing that part of DevOps is dividing into teams of eight to 12 members, with developers dedicated to one project. You need everyone on the same team – if you fail, you fail together. Start with boot camps for all the developers: short camps about how to manage people, measure success, and assign work. Another fail is focusing on tools and automation without thinking about what you are trying to accomplish and what value you are trying to bring, and then mapping KPIs to that value proposition.
  • There is a people side and a technical side. If the customer starts with “fail fast and iterate,” it’s pretty easy. You want honest, healthy retrospectives rather than a blame model; you need a blame-free environment to implement DevOps successfully. On the technical side, we’re seeing fewer failures, and the fails are typically functional. Environments drift, and you find that a quarter of incidents are the result of divergence between staging and production. Automate the creation of both staging and production.
  • In companies with a culture of learning from failure, you see people who are not afraid of failing. You see a production issue where some part of the end of the value stream was forgotten because someone didn’t document a firewall rule. It’s important to document everything. People are starting to understand the need to document the failure and talk about it. Being transparent builds trust.
  • A culture where slow means safe. You can go fast and be at least as safe. Misguided performers are conservative: they go slow so they won’t screw up, but when you have problems it then takes longer to recover. Learn from your mistakes. If it hurts, do it more often, in smaller batches, and learn. There are very few times when you need to go slow to prevent failure, such as a rocket launch or heart surgery, and even then you have to fail a thousand times to get the rocket launched or to be successful with heart surgery. Apprenticeship is a good thing. Recognize that you can go slow and sort of be safe, but when something comes up you have to act fast. We had a financial services company shut down for a weekend to address the Struts vulnerability. You can automate everything.
  • At Jenkins World we had people tweet their “DevOops” stories. They were very technical and didn’t talk about specific applications: “remove dash R” (rm -r) and everything is removed. They were still focused on practitioners making a specific mistake in their work. I would like to see the bigger picture: templates and patterns. Find people with the courage to tell you what didn’t work. The first wave will be process oriented. You want a process where you can create. With the Gene Kim unicorn, you only hear about the perfect company. See what happens in the real world. Talk about challenges.
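
One respondent above attributes roughly a quarter of incidents to drift between staging and production. A minimal sketch of a drift check follows, assuming each environment's settings can be exported as a flat dictionary. The function name `find_drift`, the `MISSING` sentinel, and the sample keys are hypothetical and only illustrate the idea; they are not taken from any tool mentioned by the respondents.

```python
MISSING = "<missing>"  # sentinel for a key that is absent from one environment


def find_drift(staging: dict, production: dict) -> dict:
    """Return {key: (staging_value, production_value)} for every key that differs."""
    drift = {}
    # Walk the union of keys so settings present in only one environment are reported too.
    for key in sorted(staging.keys() | production.keys()):
        s_val = staging.get(key, MISSING)
        p_val = production.get(key, MISSING)
        if s_val != p_val:
            drift[key] = (s_val, p_val)
    return drift


if __name__ == "__main__":
    # Hypothetical exported settings for two environments.
    staging = {"app_version": "2.4.1", "db_pool_size": 10, "tls": True}
    production = {"app_version": "2.4.0", "db_pool_size": 10, "firewall_rule_443": "allow"}
    for key, (s_val, p_val) in find_drift(staging, production).items():
        print(f"{key}: staging={s_val!r} production={p_val!r}")
```

A check like this could run on a schedule or in the pipeline so that divergence is reported before it causes an incident; building both environments from the same automated definition, as the respondent suggests, removes the need for most of it.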

Executive Support

  • 1) A lack of senior support. It pays to find someone in a leadership position who is excited by this. 2) Being afraid to fail. Embrace failure and learn.
  • 1) Regression to ITIL. As we have gone down the path of DevOps and adopted infrastructure as code, a mistake can cascade and be devastating to a large swath of infrastructure. The question is how to use tools with that much leverage safely. As we get into frontier problems, challenge organizations to think about problems differently. Take a policy-as-code approach: codify policy so you can automate enforcement. Get away from manual processes as a default. 2) Not having executive alignment. Change agents try to forge a path without executives being aware or supportive. So much of what you’re trying to do is organizational change. You need to force teams to play nicely.
  • The most common failure is not getting executive management on board early. Only the management team can align the resources within the organization to drive the level of transformation that is required. Management also has an end-to-end view of the value chain and is best positioned to bring all the resources of the organization to bear on a common plan. If management is not on board you either have to help them understand why they should be or ask yourself, “Am I working at the right company?”
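
The policy-as-code approach mentioned above ("codify policy so you can automate enforcement") can be sketched in a few lines. This is an illustration under stated assumptions, not any particular product's API: the `Policy` class, the two example rules, and the resource fields (`port`, `cidr`, `tags`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Policy:
    """A named rule; check() returns True when the resource complies."""
    name: str
    check: Callable[[dict], bool]


# Hypothetical rules: no SSH open to the whole internet, and every resource has an owner tag.
POLICIES = [
    Policy(
        "no-public-ssh",
        lambda r: not (r.get("port") == 22 and r.get("cidr") == "0.0.0.0/0"),
    ),
    Policy("has-owner-tag", lambda r: bool(r.get("tags", {}).get("owner"))),
]


def violations(resource: dict, policies: List[Policy] = POLICIES) -> List[str]:
    """Return the names of all policies the resource fails."""
    return [p.name for p in policies if not p.check(resource)]


if __name__ == "__main__":
    proposed = {"port": 22, "cidr": "0.0.0.0/0", "tags": {"owner": "platform-team"}}
    failed = violations(proposed)
    if failed:
        # In a pipeline, a non-empty list would block the change instead of a manual review.
        print("blocked by:", ", ".join(failed))
```

Because the rules live in code, they can be reviewed, versioned, and tested like any other change, which is the point of moving away from manual processes as the default.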

Alignment

  • This applies to any DevOps transformation. Many customers get excited and purchase a lot of product, but when it’s time to renew, people aren’t using it because they couldn’t get sufficiently organized to use it. These processes are like a recipe and cannot be driven by one team. This is a culture and process change. We have pushed more during customer onboarding for an executive sponsor who can resolve tension. Every representative of the lifecycle needs to be at the table, or one group will feel offended that they weren’t part of the process and end up being the bump in the road. Multiple constituents need to build consensus across groups and processes.
  • One of the biggest issues is when processes are not aligned across teams. DevOps is a great buzzword, but it will not work unless all teams are bought in on an agreed upon process. In many cases, executive level alignment is required to get everyone on the same page. Another common failure is thinking tools alone are the answer. Often, companies buy the hot new tool on the market and think it will solve their problems or magically get them to DevOps. Tools are only as powerful as how they are used and implemented. Companies must first start with defining the people and processes involved, as well as the desired end state goal. Once they do, and have organizational agreement, tools can then be selected and implemented to automate the DevOps process as designed. Start with the process, then pick the right tools.
  • Deploying the infrastructure without the culture. Developers and operations need to be on the same page. It’s more about the culture. When you wind up with a small team that runs DevOps, it becomes a big risk or bottleneck.
  • There are a lot of human challenges with too much isolation between the teams. You don’t know how what you’re doing affects someone else. You need known knowns. When teams interact it’s like a firefight. An interrupt-based approach is not good for either side – DevOps or data science. That interaction can be a source of conflict. It’s about making the data available to both. Build access to data in the pipeline into the company process. Understand there are things that multiple people need. As an executive, you need to manage technical debt. How do I identify my greatest technical debt? How can I understand its impact and how it works, in terms of processes and risk?
  • DevOps programs often fail because the company’s culture discourages collaboration. This is something we often see from our customers: legacy development methodologies are often rooted in isolationism. Development, operations, and security teams are cloistered and rarely know what the other groups are doing. The teams have completely different incentive structures and often have competing goals. Development teams need to hit goals, which causes them to write code first with little thought given to security. Security teams bring the process to a halt because they are worried about vulnerabilities that can be exploited by cybercriminals. Operations teams are resistant to change, making it difficult to introduce new technologies like Kubernetes and containers. Organizations can solve this problem by taking DevOps implementations slowly. You need to crawl before you walk and walk before you run. DevOps is a cultural transformation that can be met with resistance by people afraid of the unknown. Businesses should be patient with their staff and refine their DevOps program based on their own experience. DevOps is a long-term goal, so patience is required.
  • In many cases, DevOps teams still segregate the concerns of “Dev” and “Ops”, while the best teams build culture and process that integrates the infrastructure and app code management as part of the same system design. Not only does this shared responsibility drive developers to improve the reliability of their deployment systems (and app code!) it drives empathy between team members. If you or your teammate might be the person that will have to address an infrastructure issue late at night caused by an app code change, you’re more likely to write better tests, patch sub-par monitoring, or devise ways to scale in an automated fashion.
  • Not sharing the same goals and only focusing on process automation. The importance of DevOps starts with developing a common goal across functional units. For an organization to achieve full potential, it’s much better to look at the whole dev life cycle and find opportunities for encapsulation and abstraction. On the implementation level, the other most common fail we’ve seen is the inability to create automated tests that are representative of what issues may actually happen.
  • Some tech pros assume DevOps is simply merging development with operations and then agreeing on new agile processes to bind teams together. In reality, DevOps requires much more than merging two disparate teams—it requires collaboration before development and after production, to ensure teams find ways to use their tools in new ways that connect, rather than firewall them from one another. Successful DevOps teams align individual people and teams with their strengths but invest in technology that connects them. For example, operations teams are focused on monitoring dashboards throughout the application lifecycle, while dev teams help ensure metrics, logs, and tracing are baked into applications from the very beginning. By collaborating at every stage, both teams find ways to use their tools to help one another.

Others

  • We hear from people who have been doing DevOps for 18 months and have yet to release anything. Having done quite a bit of automation, our advice is to find a couple of areas of low-hanging fruit that take a lot of time or create waste, such as a manual build or test, and try to automate them. DevOps is about automation. Lack of automation is the number one cause of failure.
  • Unable to move massive amounts of data.
  • Cutting corners. Not going through all of the security checks, and deploying products at an admin or privileged level where that shouldn’t be needed. The app may accidentally end up requiring a privilege level to use. You don’t find out about security vulnerabilities until it’s too late.
  • Insecure apps, unless you are doing DevSecOps. Quality of coding and runtime integration will mean less if you are programming insecure applications, which lead to loss of trust and a decline on the stock exchange.
  • Customers are not tolerant of failure. Deploying to production and then pulling the release back is not acceptable in banking, because of compliance issues and the loss of customer trust. Banks don’t fail fast. They’re willing to embrace the pipeline and the velocity up to the point of deployment. When it comes to deployment, they want to test a couple of times and deploy every day versus a couple of times per year. They test more before deploying. Velocity in getting to the test environment is pretty good, but the code waits in test for more of a planned release.
  • Taking on too much. Everyone is trying to transform. People are used to the traditional way of planning. If you can adopt DevOps, you are now developing and deploying more rapidly, but other business units still doing annual planning need to adopt DevOps too. Apply the DevOps notion across the entire value chain – how to collect money, work with partners, manage budgets, pricing. Global companies need localization, so they need to look at it end-to-end. Where do I get started? Apply lean principles – remove anything that’s not adding value to the customer. Where is my biggest bottleneck? How long does it take to make and implement a change – testing, automation, decision making between two teams?
  • If you want a recipe for failure, give your developers too much power, and don’t give them any responsibility for what happens in operations. Developers will just keep throwing code over the wall, which will lead to a lot of frustration in ops and a widening of the productivity gap. If you want to succeed at DevOps, don’t push bad quality forward. Focus on “first time right,” instead of on fixing problems later. Also, pay attention to the whole release-to-production pipeline, not just continuous integration. If you don’t, you might gain some speed early on, but you’ll soon hit a wall. Finally, automate in ops. If you automate a lot in dev but not in ops, development will speed up dramatically, but ops will fall behind. Dev may be tripling their output, but ops will be stuck doing most things manually, and again, you’ll widen the productivity gap between the two.
  • Over-engineering. Teams fall in love with the process and forget to bring value. You need someone from the business in the loop. Once separate silos have been created, it is hard to create a joint team across the organization, and you fail to see the most value. Rectify this by involving people from the business. What organizational structure does the DevOps team report to? Each organization needs to determine what’s the best fit. Visit other companies like yours and see how they built their DevOps team. Failure is not getting the value you expected up front: going from monthly to weekly to daily to multiple releases per day.
  • People tend to choose a technology because it’s hip and cool and then try to solve the problem with it, versus thinking about the problem to solve and picking the best technology to solve it. Don’t blindly pick the technology that works for others and apply it to your own situation. Pick the tools and technology based on the problem you are trying to solve. Don’t force a technology on the problem. Another fail is not being clear about responsibilities. If responsibilities are not well-defined, then culture change doesn’t take effect and people don’t take ownership. Information must be shared and made transparent.
  • The two most common problems are misunderstanding the type of change that is required to begin a DevOps journey and not taking the time to create an environment where teams can invest the time necessary to make changes. For the first problem, DevOps does not simply mean buying a tool. It is much more about changing the way teams work with each other, how they view the value each other provides and how they measure and experience success together. Early in coaching engagements, we work with clients to ensure that stakeholders are aware of and understand the steps involved in a successful DevOps adoption. For the second problem, it’s important to note that change takes time and can be difficult. Because of this, it is critical that an organization’s leaders shield new teams from the traditional organizational pressures to deliver, which the company is moving away from. There is often a lack of organizational support in long-term adoption, which usually results in teams that don’t feel they have enough time to change the way they work or learn something new because deadlines are too tight.
  • Two major things come to mind. 1) The first is not having rigorous and automated testing for your infrastructure automation code. 2) The second is not ensuring the auditability of operations engineers’ actions. In many DevOps organizations, there is a culture that believes operators are meant to create automation that enforces quality and auditability on the work of other software development teams, but somehow these same standards do not apply to the operations team itself. Having automation built into your source control system can help to ensure that untested code is never allowed to be merged, and it forces engineers across the company to ensure tests are written. By the same token, authority over audit processes should be held outside your operations team, and the operations team should deploy auditing systems that include monitoring and tracking of their own actions as well. Auditing should be done by a security or compliance function, rather than the operations function, within your overall organization.
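
The last point, that operations engineers' own actions should be auditable by a function outside operations, can be illustrated with a small tamper-evident log. The respondent names no specific mechanism; the hash chain below is a technique substituted here purely for illustration, and the `AuditLog` class, its methods, and the sample actors are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash an entry body deterministically (keys sorted before serializing)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which each entry includes the hash of the previous one."""

    def __init__(self) -> None:
        self.entries = []

    def record(self, actor: str, action: str, details: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"actor": actor, "action": action, "details": details, "prev": prev}
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute every hash; editing or removing an earlier entry breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("actor", "action", "details", "prev")}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    log = AuditLog()
    log.record("alice", "restart-service", {"service": "api"})
    log.record("bob", "change-firewall-rule", {"port": 443})
    print("chain intact:", log.verify())
```

In practice the verification step would be run by the security or compliance function the respondent describes, so that the team whose actions are logged is not also the team that attests to the log.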


Here's who shared their insights with us:
