Agile DevOps: Rapid Ops Changes for Rapid Delivery
Dev and Ops have to work in harmony to deliver on Agile projects efficiently.
Agile software development intrigued IT operations like nothing else in recent times. Agility adopted by engineering teams demanded swift response from Operations to the changing needs of business stakeholders, to create product differentiation by delivering new features to the market in short cycles continuously. This necessitated major changes in the way operations or Ops released new versions of software and also the way engineering delivered products for deployment.
Conventionally, large enterprises had separate departments to develop new products, manage numerous applications and deploy part of the IT infrastructure and applications management. This grouping of people with similar skills was for better organization, utilization and management. It worked very well for traditional waterfall projects. But with the growing need for agility in business, it became essential to develop and deploy newer versions of software rapidly. This required teams that could work cohesively, shared the same goals and were not bound by their respective departments.
Development and operations are often perceived to be positioned on either side of a wall. The Ops team is assumed to be rewarded for keeping the applications stable with an uptime of 99.99%, whereas the Development team is perceived to be responsible for building and delivering new features quickly and thus bringing more revenue to business. Ops teams already had a busy roster: keeping the applications up at all times; upgrading and migrating; maintaining availability and stability of applications; and monitoring performance and loads on the system to ensure that it does not crash. Now they also have to fulfill demands of deploying releases every few weeks. On the other hand, the development team that was focused only on rapidly building software is now expected to also take care of concerns that may become roadblocks in deployment.
Contrasting goals and increased expectations from the teams can build the culture of resistance across these teams. Figure 1 illustrates the traditional setting.
Figure 1: The Current Perspective of Development and Operation Team
Source: Infosys Research
These functional silos and the conflicting motivations of two prime constituents of the organization gave rise to the concept of DevOps, a practice that did not originate from the Agile stable. The prime purpose of DevOps is the convergence of development and operations so that business can not only develop but also deliver new features frequently without disrupting existing features. Patrick Debois is credited with coining the term DevOps in 2009. The concept of DevOps is discussed in detail in this paper. This paper also includes a case study that implemented DevOps and captures the ways adopted to address a critical situation for a client.
Need for DevOps
DevOps has a fair bit of similarity with the concept, culture, movement and philosophy of Agile. Many practitioners consider DevOps an opportunity to train developers to adopt the Ops outlook and, conversely, to train Ops in development methods so that the teams work together to achieve stable deployments with fewer surprises. But DevOps is not limited to cross-training people. Even in the past, enterprise application development had IT and Ops teams work together to deliver what business wanted. However, the rise of Agile adoption led to a set of challenges not previously faced by the Ops team. Ops needed to manage existing applications/infrastructure and at the same time churn out faster releases to production continuously. The release calendar window of a few months to a year has begun to shrink rapidly with Scrum and XP becoming mainstream development methods, where the releasable product is made available for deployment every 2-4 weeks.
In addition, distributed Agile execution means that the Development team (distributed, off-shored or near-shored) and the Ops team can be sitting in different parts of the world. In such a situation even a simple event like communicating that a build is ready for deployment can be tedious and time consuming. Additionally, repeating the processes surrounding the deployment, such as notifications, change approvals, rollback, data management, infrastructure provisioning and the like, every few weeks proves to be a real problem. Thus, it often happens that the Development team adopts Agile whereas the Ops team continues to work in a phased manner where handovers from Dev to Ops need a long notice period.
In this context, DevOps comes across as an effective option to reduce the cycle time. It should be noted that DevOps is not a separate defined unit in Corp IT, production support or backend services. It is an attitude of trust and shared goals that introduces efficiency in the delivery lifecycle to achieve faster, stable and scalable production releases collectively with the help of tools and automation.
Modern enterprise applications are growing in complexity with the frequent release of new features. Also, maintaining a million lines of brittle legacy code requires the adoption of a new set of DevOps tools that can help in building an efficient Agile delivery pipeline. In addition to scalable configuration management solutions written using scripting languages, tools like Mercury BAC transaction for Performance and Application Availability Monitoring, Splunk for debugging production issues and Nagios for application health monitoring are becoming the norm to sustain frequent production releases.
DevOps is often considered a silver bullet as it addresses issues like the silos of Ops and Dev and the us-vs-them mindset behind exchanges such as "It works in the Dev environment, so why doesn’t it work on your production servers?"; "Why are the existing features changed with your new development?"; "The issue is because of corrupt production data." But DevOps is not a magic potion that can make persistent problems disappear into thin air. Making only development or only deployment Agile will not yield the desired result; the change should be in the complete workflow to effect a real turnaround. Many large organizations have achieved frequent deployments using strong DevOps roles, practices and tools. For instance, Flickr, which hosts billions of photos and serves 40,000 photos, 100,000 cache operations and 130,000 database queries in a single second, has realized 10+ deployments per day by institutionalizing robust DevOps practices on continuous integration, automation and continuous deployment.
Core Principles of DevOps
The basic premise of the DevOps model is to bring efficiency to the workflow of software development from inception to release. It is not enough to develop and test in order to make a feature releasable in short cycles. Development needs to understand what Ops cares about, for instance performance, scalability, impact on existing workflows and writing fail-safe code, to name a few.
Scrum, XP and Kanban, the most popular methods used to adopt Agile, were designed with an eye to delivering business-critical features faster and continuously. In Scrum and XP a potential release can happen every 2-4 weeks, whereas in Kanban it can happen at the end of any logical package of work items that brings value to a business. Hence, there is an absolute need for Ops teams to respond quickly and deploy the changes that business demands from production systems. The core principles of the DevOps movement and philosophy that cater to the needs of short iterative software development and deployment are given here:
- Better collaboration, communication and trust between Dev and Ops team
- Relentless automation at each stage (build, deployment, configuration changes, QA tests)
- Moving from nightly builds to frequent automated integrated builds and a continuous delivery pipeline
- Effective tools to automate the end-to-end workflow
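The principle of automating every stage can be illustrated with a minimal sketch of a delivery pipeline. The stage names and the `run_pipeline` function are hypothetical and chosen only for illustration; a real pipeline would call a build server and deployment tooling rather than the placeholder callables used here. The sketch assumes the common convention that a failed stage stops everything after it.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order; after the first failure, mark the rest as skipped."""
    results: List[Tuple[str, str]] = []
    failed = False
    for name, action in stages:
        if failed:
            # Nothing downstream of a failed stage is executed.
            results.append((name, "skipped"))
            continue
        ok = action()
        results.append((name, "passed" if ok else "failed"))
        failed = not ok
    return results


if __name__ == "__main__":
    # Placeholder stages; the lambdas stand in for real build and test steps.
    demo = [
        ("build", lambda: True),
        ("unit tests", lambda: True),
        ("deploy to QA", lambda: False),  # simulated failure
        ("QA tests", lambda: True),
    ]
    for name, status in run_pipeline(demo):
        print(f"{name}: {status}")
```

Stopping at the first failure keeps a broken build from reaching later environments, which is the behavior the continuous delivery principle relies on.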
The next section will elaborate on a couple of DevOps models.
The Integrated Mindshare Model
In order to build a culture of trust, transparency and shared goals, it is important for Dev and Ops to have common milestones in this journey. It can be of immense value to involve the Ops team members in Dev ceremonies.
Table 1 shows a sample of how Ops can be included in development activities. Here Scrum is chosen as an example as it is the most popular method in Agile. Development happens in short iterations of 2-4 weeks when a Scrum framework is followed. At the end of each iteration (or Sprint), the scrum team delivers a potentially releasable product increment. The Ops team is expected to deploy this increment regularly as per the business request. In such a situation, combining or overlapping ceremonies brings in different expertise, and gathering points of view from both teams helps to deliver results that meet the needs of the Ops team along with those of business.
Table 1: Operations as part of Mainstream Agile Activities
Source: Infosys Research
By getting a mindshare early in the development, the Ops team gets to know the timeline for the development of a new feature in advance. The Ops team also gets to plan for the production release and the required IT provisioning in parallel to development rather than opting for the sequential mode. It also reduces the need for extensive knowledge transfer sessions from Dev to Ops team which otherwise becomes necessary at the end of a sprint.
Though combining ceremonies or involving Ops in development Scrum ceremonies sounds like a good idea, in practice there are many challenges that one needs to deal with. In the Scrum framework the work to be delivered is decided and ordered according to its value and ROI. Work items for technical debt, like refactoring, modularity and scalable design, almost always take a back seat because their value is not readily perceived, even though they make the product stable and maintainable in the long run. To address these challenges, business should be flexible enough to accept new approaches, like focusing on the needs of the Ops team, which include measures for performance, better exception logging/handling, etc., that do not have a direct impact on the business value.
With the mindshare model, Ops team begins to operate in the Agile mode and both the teams follow a collaborative work culture to deliver the true value to business collectively. The Ops team is also more aware of the upcoming releases and is better prepared. Concurrently, the Dev team will have a better realization of challenges faced by Ops team in arranging infrastructure, handling multiple production releases in a short span of time, issues related to performance and load, etc.
Embedded Team Model
The Ops team often helps business to keep the existing offering stable and still address the regular requests of customers. Apart from these, the Ops team performs a myriad of other tasks like test support, security, backups, updating support docs, planning the release window, informing stakeholders, seeking approvals for change sets and so on.
In an ideal situation, it would be good to merge the Ops and Dev teams so that a single team works on the end-to-end lifecycle of the software. Embedding people from the Ops team in Development not only breaks the wall of resistance and creates a collaborative Agile team, but also helps the Ops team to learn the development codebase, understand the perspective of feature changes and the way value is delivered to business. On the other hand, the development team learns about life beyond what they churn out in a sprint and what it takes for their work to see the light of day. This would make them more appreciative of the Ops view of the product and of ways to make the process better by following some simple rules. Such a team with merged roles can be created by hiring people with blended skills or by cross-training Dev and Ops roles in each other’s skills. Figure 2 depicts an integrated team model.
Figure 2: An Integrated DevOps team Model
Source: Infosys Research
It is important to note that it is not easy to merge roles in an organization that has had separate roles, reporting hierarchies, SLAs and objectives for years. The organization can take a middle path of cross-training and rotating the roles of developers and Ops representatives to increase the level of awareness while still using their expertise separately. In this scenario, Ops and development teams that operate separately but appreciate each other’s work will have touch points in terms of ceremonies and artifacts to ensure that end-to-end agility is achieved.
In the embedded team model, a horizontal PS&M team can form the backbone of multiple scrum teams. DevOps can be successfully implemented through different ceremonies, common product backlog, etc., even if the actual roles and teams are not merged.
Effective use of Artifacts
While we discussed how ceremonies can be attended by both teams, with roles merged or kept separate, an important aspect of Scrum, namely artifacts, also needs to be carefully modified to suit DevOps.
For example, in an ideal scenario when roles are merged, the team may have a single product backlog for both Dev and Ops work items. When roles are not merged, the Ops team should also maintain a product backlog for its prioritized work items. The Dev and Ops product backlogs should be inspected regularly to see that all prioritized work items of each team get due attention in the other team’s product backlog (placeholders may also be created).
This will ensure that all important items required for deployment of a particular release to production are taken care of by both teams.
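The regular inspection of two separate backlogs can be sketched as a simple cross-check. This is an illustrative sketch only: the `BacklogItem` structure, the `linked_ids` field used to represent placeholders, and the item IDs are all assumptions, not something prescribed by Scrum or by any backlog tool.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BacklogItem:
    item_id: str
    title: str
    prioritized: bool = False
    # Hypothetical placeholder links: IDs of items in the OTHER team's
    # backlog that this item acknowledges or depends on.
    linked_ids: Set[str] = field(default_factory=set)


def missing_counterparts(source: List[BacklogItem],
                         other: List[BacklogItem]) -> List[str]:
    """Return IDs of prioritized items in `source` that nothing in `other` links to."""
    referenced: Set[str] = set()
    for item in other:
        referenced |= item.linked_ids
    return [i.item_id for i in source
            if i.prioritized and i.item_id not in referenced]


if __name__ == "__main__":
    dev = [
        BacklogItem("DEV-1", "New tariff screen", prioritized=True),
        BacklogItem("DEV-2", "Billing export change", prioritized=True),
    ]
    ops = [
        BacklogItem("OPS-1", "Provision UAT environment", prioritized=True,
                    linked_ids={"DEV-1"}),
    ]
    print("Dev items without an Ops placeholder:", missing_counterparts(dev, ops))
    print("Ops items without a Dev placeholder:", missing_counterparts(ops, dev))
```

Running the check in both directions surfaces prioritized work on either side that the other team has not yet planned for.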
DevOps to NoOps and DevOpsSec
The term NoOps has recently been used by a few teams to discuss whether automating each step, such as unit testing (UT), build creation, deployment, QA, system testing, system validation testing and system integration testing (ST/SVT/SIT), and user acceptance testing (UAT) with ATDD or BDD, will mean that there is no need for a separate operations function. NoOps also includes the concept of monitoring to help in reducing any outage. Along with this, the rising popularity of cloud computing to provision infrastructure and platforms on demand (IaaS, PaaS) is being positioned as a game changer for Ops.
This phenomenon is definitely where all organizations are heading, but the fear of it leading to NoOps seems ill-founded. New tools may require Ops teams to reorient themselves and learn a few new tricks to stay afloat, such as writing infrastructure as code, getting familiar with single-click builds and push-button deployment concepts, and automating deployment configuration on production. A release calendar window once defined is not sacrosanct anymore. The Ops team needs to be ready to shrink the release cycle by automating the runbook as much as possible and introducing efficiency in the entire release/post-release workflow. Periodic production deployment will be a non-event only if there are continuous deployments while development is on. The security aspect is often ignored till the last moment; it needs to be taken care of during the development phase of a product.
Modern enterprise applications face many security threats and could be vulnerable to SQL injection, code injection, cross-site scripting and DoS attacks, to name a few. Ignoring the concerns of the information security team in a thriving e-commerce economy while adopting an incomplete DevOps model can leave organizations with threatening loopholes that they cannot afford.
This is why the concept of DevOpsSec, which applies DevOps principles to application security to create an Agile triangle, is gaining momentum. It is believed that leaving the security silo out of the DevOps vision can be really detrimental to business, as the majority of successful attacks occur against previously known vulnerabilities for which a patch or secure configuration standard was already available.
More and more companies are developing tools and products to make the best use of technological advancements. Microsoft, providing virtual machines as part of its cloud platform Azure, could challenge the traditional role of the Infra team that used to provision hardware and configure software as per the needs of the development team. While tools like Puppet and CFEngine help in automating the provisioning of IT infrastructure, Chef is a good tool to automate a system integration framework over cloud. The on-demand provisioning of infrastructure, which is still at a nascent stage, could soon be used by the masses and would need Ops teams to acquire matching skills to code the dynamic infrastructure.
A large telecommunication organization wanted urgent and critical changes to its billing and revenue assurance platform. The changes had to be implemented within a stringent timeline since they were driven by industry regulations and non-compliance would lead to huge penalties.
While the same platform was earlier maintained using the waterfall methodology, the aggressive time-to-market and the need to deliver the highest priority requirements upfront made Agile the only way to go.
Need for DevOps Adoption
Development in two-week sprint cycles, each resulting in shippable product delivery, started putting additional pressure on and creating bottlenecks in the delivery, deployment and post-deployment processes that involved the operations team and agents (BPO). The operations team was apprehensive about deploying the changes to go live faster. Agents were also not comfortable with frequent changes to the applications and workflow, and were concerned about supporting the existing customer base properly.
The lack of collaboration and the mismatch in expectations led to conflicts between the stakeholders in IT and development. The developers focused on implementation and did not consider the requirements from the perspective of deployment. On the other hand, the operations team was more focused on reliable, scalable, high performance and high quality systems. Because their performance was measured by the stability of the systems, they resisted deploying changes to production. The agents, focused on usability, performance and availability, were used to extensive UATs, comprehensive user manuals and training before each release was rolled out, all of which were now absent.
Goal and Approach of the Solution
A common goal was developed for stakeholders, developers, the operations team and agents: developing well-tested and timely programs, making frequent and reliable deployments, and making transitions easier for agents. The approach to achieve this included a persistent focus on major tracks like collaboration, automation and continuous delivery, knowledge sharing, and continuous improvement.
In Agile software development, collaborating teams can detect issues early and plan for risks ahead of time. In this case, collaboration was encouraged not only between the Dev and Ops teams, but also with the BPO team. Some highlights of collaboration in the project are listed here:
- Operations participated actively (10-20% bandwidth) during the design and development phases. They attended the daily stand-ups, retrospectives, planning meetings and showcases of the project teams.
- Developers got acquainted with operative procedures. Regular conversations with operations ensured that their opinions and concerns were heard, and that they were informed about the requirements and changes coming up for deployment and could take proactive measures. For an incident occurring in production, a developer was available on call to assist, which helped to discover the root cause and resolve it faster.
- Agents were able to voice their concerns in a timely manner with the development and operations team, as they knew possible changes to their systems and workflow. This continuous feedback loop proved to be essential in improving the quality of service.
- Co-location of teams and daily common meet/calls over the initial phase helped to overcome apprehensions about the model and friction between teams.
Automation and Continuous Integration
With Agile development and sprints of two weeks, the quality of delivery can become a major issue if not managed from the very beginning. Automation and CI are the key aspects that ensure the team is always ready to deliver the feature. The objective was to ensure that the systems are production ready and deployable throughout their lifecycle. This helped to perform deployments more frequently to build/test/UAT environments on demand. Developers got rapid feedback on the production readiness of the systems from the operations team and agents. The team faced the following challenges:
- Team members working on various branches may not regularly integrate into mainstream.
- Repeating the entire test cycle manually after each integration will become time consuming and tedious.
- Applications had multiple interfaces, making it difficult to test the software without an integrated environment.
The team came up with CI and increased automation with build and deployment processes as the top solutions to deal with these challenges.
Continuous Integration: Developers logically split up their work into smaller steps so that they could integrate at least once a day. The team’s first priority was to keep the build green rather than integrate work with errors. Fixing a failing build was treated as the highest ranked work every day.
Test Automation: Along with basic manual testing, automated tests were developed to detect problems immediately at each integration. Automated test coverage was increased incrementally. It reached around 40%, which was agreed to be enough for a smoke test and to avoid full execution of regression tests for each scenario.
Deployment Automation: Automated deployment scripts were developed to build and test UAT environments. The aim was to automate the end-to-end workflow and not just a part of it. Strong monitoring and logging mechanisms on production ensured live issues were caught faster and handled proactively.
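How smoke tests and deployment automation fit together can be sketched as a gate in front of the deployment step. This is a sketch under stated assumptions, not the project's actual scripts: the function name `deploy_with_gate`, the callables passed to it, and the automatic rollback on a failed health check are all hypothetical.

```python
from typing import Callable, List


def deploy_with_gate(
    smoke_tests: List[Callable[[], bool]],
    deploy: Callable[[], None],
    health_check: Callable[[], bool],
    rollback: Callable[[], None],
) -> str:
    """Deploy only if every smoke test passes; roll back if the health check fails.

    Note: an empty smoke_tests list counts as passing, so callers should
    supply at least one test.
    """
    # all() stops at the first failing test, so later tests are not run.
    if not all(test() for test in smoke_tests):
        return "blocked"
    deploy()
    if not health_check():
        # Assumption: the previous release can be restored automatically.
        rollback()
        return "rolled back"
    return "deployed"


if __name__ == "__main__":
    outcome = deploy_with_gate(
        smoke_tests=[lambda: True, lambda: True],
        deploy=lambda: print("deploying build to UAT"),
        health_check=lambda: True,
        rollback=lambda: print("restoring previous release"),
    )
    print("outcome:", outcome)
```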
Knowledge sharing between the teams supported these practices through the following mechanisms:
- The development team shared various tools like order tracking, billing intercepts along with macros, scripts with Ops team. This helped operations to tackle any issues post deployment and improved their efficiency to respond or debug failures.
- The operations team shared tools and techniques to manage environments and infrastructure with Dev team.
- A knowledge management tool was developed to help agents cope with frequent changes to systems. This tool ensured quicker online access to accurate and timely information about various changes. The tool also provided a demo of new functionality, unified search, a quiz section and an online query section. This resulted in less dependence on comprehensive training and user manuals.
Kaizen - Continuous Improvement
To sustain the continuous integration and deployment, the team also required continuous introspection to find opportunities to make the execution better. Key initial objectives were achieved with collaboration, centralized automated build infrastructure, automated test frameworks and continuous integration and delivery. We kept learning from our mistakes and continuously improved with each sprint to imbibe the culture of Kaizen.
- After early success, the operations teams were also converted to Sprint teams. The operations team also helped with low priority items from product backlog during the lean period. Whatever they delivered during each sprint was a bonus.
- All high priority defects were fixed immediately. Non-priority defects found in production were added to the product backlog. The development team handled these by either re-prioritization (depending on the type of issue in operations the team determined the priority against those stories in play) or by reducing the overall bandwidth for in-flight development to support operational issues by 20%.
This helped in improving collaboration further and also helped in optimal workload distribution across development and operations team.
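The 20% bandwidth reduction for operational issues amounts to a simple capacity split. The sketch below assumes capacity is counted in whole story points and that the operations share is rounded down; both are illustrative assumptions, as is the function name `split_capacity`.

```python
from typing import Tuple


def split_capacity(total_points: int, ops_percent: int = 20) -> Tuple[int, int]:
    """Split sprint capacity into (development points, operations points).

    The operations share is rounded down to whole points; the remainder
    stays with in-flight development.
    """
    if total_points < 0:
        raise ValueError("total_points must not be negative")
    if not 0 <= ops_percent <= 100:
        raise ValueError("ops_percent must be between 0 and 100")
    ops_points = (total_points * ops_percent) // 100
    return total_points - ops_points, ops_points


if __name__ == "__main__":
    dev_points, ops_points = split_capacity(40)
    print(f"development: {dev_points}, operational issues: {ops_points}")
```

For a sprint capacity of 40 points, this reserves 8 points for operational issues and leaves 32 for planned development.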
Within a few months of adopting DevOps, the client was able to derive benefits. Deployable code was always ready in short durations that sounded impossible to achieve earlier. Finally, the release cycle was improved by 500%. Production rollbacks became a rare exception. Some of the conclusions drawn were:
Giving operational responsibilities to developers enhanced the quality of service immensely, both from the perspectives of customer and technology. In the classical model, development and operations are completely separated and work in their own silos. The Amazon model of "you build it, you run it" helps developers to observe the day-to-day operation of the software they have developed and also to understand practical problems. This also brings them in contact with the end users and helps them to understand their expectations better. On the other hand, Ops teams could deliver better because they gained confidence from observing actual development and from the support of the development team.
In today’s ever changing, highly demanding and volatile market, DevOps has become a reality that can't be ignored. More and more enterprise customers are embracing Agile software development in one way or another to reach end users with more features and differentiators, and are gaining the first mover advantage. Agile development requires swift collaboration between the development and operations communities so that they can cater to business needs and curtail the longer production release cycle. The differences between Dev and Ops can cost business dearly, as it will not get early, continuous and stable delivery.
Many large organizations have responded to this need of the day by changing their work environments. For example, web giants like Amazon and Flickr have already shown the power of combining the workforce to roll out new features at an amazingly fast rate and capture market imagination [1, 2]. The introduction of new tools and large scale cloud computing has added enormous possibilities and dimensions for adopting DevOps culture more confidently. Lastly, DevOps is not only about technology advancements, CI or continuous delivery; it is also about people, collaboration and breaking silos.
DevOpsDays, Ghent 2009. Available at http://www.devopsdays.org/events/2009-ghent/
Flickr Arch. Available at http://krisjordan.com/2008/09/16/cal-henderson-scalable-web-architectures-common-patterns-and-approaches
Amazon model. Available at http://queue.acm.org/detail.cfm?id=1142065
Neil MacDonald, DevOps Needs to Become DevOpsSec, Gartner, January 2012. Available at http://blogs.gartner.com/neil_macdonald/2012/01/17/devops-needs-to-become-devopssec/
DevOps Trends. Available at http://www.ittoday.info/Articles/DevOps-2011-TrendsOverview.pdf
Chef & Puppet. Available at http://www.wired.com/wiredenterprise/2011/10/chef_and_puppet/
NoOps@NetFlix. Available at http://perfcap.blogspot.in/2012/03/ops-devops-and-noops-at-netflix.html
10+ Deploys Per Day Presentation.
RedHat Enterprise PaaS. Available at http://archive.globalservicesmedia.com/News/Home/Red-Hat-Unveils-Enterprise-PaaS-Strategy/21/27/0/GS1205178710793.
Sameer Kulkarni is a Delivery Manager at the ECS unit of Infosys. He has around 17 years of experience in IT and has been involved in critical programs for leading telcos across the globe. He also heads ECS Agile CoE that drives Agile adoption across this vertical and also helps customers to implement Agile. He is a CSM and has worked as a coach for large and complex engagements in the past. He can be reached at Sameer_Kulkarni@infosys.com
Ketan Shah is Principal Consultant with ECS-ADM unit of Infosys. He has over 13 years of experience in IT including 6 years of experience in E2E Agile (Scrum, XP, Kanban) execution as Scrum Master and Coach for large enterprise engagements. He is also part of ECS Agile CoE that drives Agile adoption across this vertical and also helps customers to implement Agile. He has industry certifications like CSM, Six Sigma Green Belt. He can be reached at Ketan_Shah01@infosys.com