Let's continue our four-part series about the tasks that DevOps teams are typically responsible for. You can check out Part I here.
Tools for Production Engineering
My objective here is only to list the kinds of tools a production engineering team needs to be effective. In each category, you can easily find multiple competing products. If I mention a product by name, that only reflects my familiarity with it. What matters is having some tools, including home-grown ones, available when the need arises. It is also important to avoid using multiple tools in the same category unless there is a compelling reason to do so.
Documentation Platform
A documentation platform that both development and operations teams can use is an essential component for collaboration. Wiki-based solutions like MediaWiki are the most popular, but even Google Docs will do the job.
Any documentation solution can quickly degenerate into a dumping ground for assorted documents. To avoid a free-for-all, it is important to set a structure for organizing documents right from the beginning. One effective method is to organize documents around applications and to use templates for creating standard documents.
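As a minimal sketch of the template idea, assuming a MediaWiki-style wiki, the Python snippet below renders a standard runbook page for an application. The section names, the `<App>/Runbook` title convention, and the `runbook_page` helper are hypothetical illustrations, not a prescribed layout; the point is that every application gets the same skeleton under a predictable title.

```python
from string import Template

# Hypothetical standard runbook template in MediaWiki heading syntax;
# the section names are illustrative, not a prescribed format.
RUNBOOK_TEMPLATE = Template(
    "= $app Runbook =\n"
    "Owner team: $owner\n"
    "\n"
    "== Architecture ==\n"
    "\n"
    "== Deployment ==\n"
    "\n"
    "== Monitoring and Alerts ==\n"
    "\n"
    "== Troubleshooting ==\n"
)


def runbook_page(app: str, owner: str) -> tuple[str, str]:
    """Return (page title, page body) for an application's runbook.

    Titles follow a hypothetical "<App>/Runbook" convention so that each
    application's documents live under one predictable path.
    """
    title = f"{app}/Runbook"
    body = RUNBOOK_TEMPLATE.substitute(app=app, owner=owner)
    return title, body


title, body = runbook_page("Billing", "payments-ops")
print(title)
print(body)
```

Publishing the generated page to the wiki is intentionally left out; the value is in every team starting from the same structure.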
Configuration Management
Configuration management is a very generic term. In DevOps circles, it usually refers to a Puppet- or Chef-like tool that manages system-level configuration and baseline software installation on a computing node. There are at least three distinct configuration management needs, and if we include the configuration requirements for automated deployments, the list grows to four. However, deployment automation is better discussed in the larger context of Continuous Integration (CI).
Automation System for Access Control and Baseline Software Installation
When a user joins or leaves the company, the system- and application-level access defined for that user's role should be propagated to all relevant systems automatically. Tools like Puppet and Chef may be the most popular for this, but there are plenty of alternatives available.
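The following Python sketch illustrates the desired-state idea behind this kind of automation: derive who should have access to which system from role definitions, then diff that against what is currently provisioned. The role names, system names, and the `ROLE_ACCESS` mapping are hypothetical; in practice, the user list would come from a directory service and the changes would be applied by the configuration management tool.

```python
# Hypothetical role definitions: the systems each role is granted access to.
ROLE_ACCESS: dict[str, set[str]] = {
    "dba": {"db-hosts", "monitoring"},
    "sre": {"app-hosts", "db-hosts", "monitoring", "ci"},
    "developer": {"ci", "monitoring"},
}


def desired_access(users: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each system to the set of users who should have access to it."""
    access: dict[str, set[str]] = {}
    for user, roles in users.items():
        for role in roles:
            # An unknown role grants nothing rather than failing the whole run.
            for system in ROLE_ACCESS.get(role, set()):
                access.setdefault(system, set()).add(user)
    return access


def access_changes(
    current: dict[str, set[str]], desired: dict[str, set[str]]
) -> dict[str, tuple[set[str], set[str]]]:
    """Return {system: (users_to_add, users_to_remove)} for every system that drifted."""
    changes = {}
    for system in current.keys() | desired.keys():
        have = current.get(system, set())
        want = desired.get(system, set())
        if have != want:
            changes[system] = (want - have, have - want)
    return changes


# Bob has left (he is no longer in the user list) and Alice joins as an SRE.
users = {"alice": ["sre"], "carol": ["developer"]}
current = {"ci": {"bob", "carol"}, "monitoring": {"bob", "carol"}}
for system, (add, remove) in sorted(access_changes(current, desired_access(users)).items()):
    print(system, "add:", sorted(add), "remove:", sorted(remove))
```

In this example, Alice is added to the four systems her role grants, and Bob is removed from the two he still had access to.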
Just as users are provisioned and deprovisioned, when a new computing node is provisioned, these tools can install and configure the baseline software bits that node needs. The system-level configuration and software deployed on a computing node are determined by its "role" in the larger software system.
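Here is a minimal sketch of role-based baselines, assuming a hypothetical catalog in which every node gets a `base` role plus its own roles. The package names, file paths, and contents are placeholders. A tool like Puppet or Chef would express the same mapping in its own language and actually apply it to the node; this only computes the merged plan.

```python
from dataclasses import dataclass, field


@dataclass
class RoleSpec:
    packages: list[str] = field(default_factory=list)
    files: dict[str, str] = field(default_factory=dict)  # path -> file content


# Hypothetical catalog: every node gets "base" plus its own role(s).
CATALOG: dict[str, RoleSpec] = {
    "base": RoleSpec(
        packages=["chrony", "openssh-server"],
        files={"/etc/motd": "Managed by configuration management\n"},
    ),
    "web": RoleSpec(
        packages=["nginx"],
        files={"/etc/nginx/conf.d/app.conf": "server { listen 80; }\n"},
    ),
    "db": RoleSpec(packages=["postgresql"]),
}


def node_baseline(roles: list[str]) -> RoleSpec:
    """Merge the base role with a node's roles into one install/configure plan.

    Packages are de-duplicated in order; if two roles manage the same file,
    the role listed later wins.
    """
    merged = RoleSpec()
    for name in ["base", *roles]:
        spec = CATALOG[name]  # an unknown role raises KeyError on purpose
        for pkg in spec.packages:
            if pkg not in merged.packages:
                merged.packages.append(pkg)
        merged.files.update(spec.files)
    return merged


plan = node_baseline(["web", "db"])
print(plan.packages)
print(sorted(plan.files))
```

Failing loudly on an unknown role is a deliberate choice: a typo in a node's role should stop provisioning rather than silently produce a half-configured node.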
Configuration Management Database (CMDB)
Managing the configurations of application stacks and environments is the next requirement. A full-fledged CMDB may not be warranted, but some custom solution will eventually be needed, because without a single source of truth for these configurations, rolling out serious automation projects is hard.
For example, keeping track of which system settings and software bits go into each software role is immensely useful if we want to stand up application stacks in a fully automated fashion. I still remember the joy of building large object storage farms from a few command lines; it used to be an excruciating, week-long effort of cutting and pasting scores of commands and running manual steps.
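To make the single-source-of-truth idea concrete, here is a Python sketch of a tiny CMDB-like record that describes a stack as a node count per role per environment, and expands it into the concrete list of nodes to build. The `STACKS` data, the hostname scheme, and `expand_stack` are hypothetical. Each resulting node's role would then drive its system-level configuration as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    hostname: str
    role: str
    environment: str


# Hypothetical CMDB records: node count per role for each (stack, environment).
STACKS: dict[tuple[str, str], dict[str, int]] = {
    ("object-storage", "prod"): {"proxy": 2, "storage": 4},
    ("object-storage", "staging"): {"proxy": 1, "storage": 2},
}


def expand_stack(stack: str, env: str) -> list[Node]:
    """Expand a stack definition into the concrete nodes to build, in a stable order."""
    nodes = []
    for role, count in sorted(STACKS[(stack, env)].items()):
        for i in range(1, count + 1):
            nodes.append(Node(f"{stack}-{role}-{env}-{i:02d}", role, env))
    return nodes


for node in expand_stack("object-storage", "staging"):
    print(node.hostname, node.role)
```

With the stack described as data, growing a farm becomes a one-line change to its record rather than a new round of manual steps.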
Software Configuration Management (SCM)
Traditionally, production engineering teams have made limited use of SCM tools like Subversion or Git. Scripts written for ad hoc automation end up in somebody’s home directory, and when Brian leaves the company, all hell breaks loose in the application area he had been supporting smoothly until then.
An SCM system is not only for application code. Any piece of code or configuration data needed to replicate the application environment has to be managed in the SCM system. Code does not only define product features; in a highly automated environment, code is also what maintains the product.
Make sure that members of the production engineering team are skilled in using the company’s SCM system. If multiple SCM tools are in use, take the lead in standardizing on one. Multiple tools are a clear sign that product development teams work in silos, and that usually creates nightmarish scenarios for the production engineering team: when issues happen, you will come across development teams that are more inclined to cover their bases than to resolve the issues.
Ops code should also go through peer review and be part of the release process, so that there is visibility into what is deployed in production. Including ops code in the release process has become more important lately, now that infrastructure and platforms can be implemented as code, which is not very different from writing code that implements product features.
Continuous Integration (CI)
CI automates the code deployment process, starting from the moment developers check code into the SCM system. On the CI platform, the code changes are built, packaged for deployment, and deployed to a staging environment where they are tested.
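The flow above can be sketched as a fail-fast pipeline in Python. This is not how Jenkins or any other CI server is configured; it only illustrates the stage ordering and the immediate feedback. The stage names follow the description above, and the commands are placeholders that run trivial Python snippets so the sketch is self-contained.

```python
import subprocess
import sys
from typing import NamedTuple


class StageResult(NamedTuple):
    stage: str
    ok: bool
    output: str


def run_pipeline(stages: list[tuple[str, list[str]]]) -> list[StageResult]:
    """Run stages in order and stop at the first failure (fail fast)."""
    results = []
    for name, cmd in stages:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append(StageResult(name, proc.returncode == 0, proc.stdout + proc.stderr))
        if proc.returncode != 0:
            break  # later stages (e.g. deploy) must not run on a broken build
    return results


# Placeholder commands standing in for real build/package/deploy/test steps.
py = sys.executable
STAGES = [
    ("build", [py, "-c", "print('compiled')"]),
    ("package", [py, "-c", "print('packaged app-1.0.tar.gz')"]),
    ("deploy-staging", [py, "-c", "print('deployed to staging')"]),
    ("test-staging", [py, "-c", "import sys; sys.exit(1)"]),
]

for result in run_pipeline(STAGES):
    print(f"{result.stage}: {'OK' if result.ok else 'FAILED'}")
```

The last placeholder stage exits with a non-zero status, so the run reports `test-staging` as failed; in a real setup, that result is the feedback sent back to the developer who checked in the change.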
Developers get immediate feedback on the quality of their code, so bugs can be fixed right away. Integration is incremental and continuous, and incompatibilities are ironed out early on, so there is no need for a separate, late integration phase.
Jenkins is a popular platform for rolling out a CI process. The CI process is integrated with the CM, CMDB, and SCM systems.
Bug Tracking
Like an SCM system, a bug tracking system is primarily rolled out for development organizations. It is important that the production engineering team gets visibility into the projects and issues tracked in that system. The team should also have privileges to create its own projects and queues to manage the code for the DevOps areas discussed earlier.
Bug tracking applications are typically part of more generic ticketing systems. There is no dearth of open-source and licensed software in this area; Bugzilla and Jira are two well-known products that I have used.
Monitoring
A mature monitoring infrastructure has checks implemented at different levels: infrastructure, network, system, and application monitoring, using industry-standard tools such as Nagios and Zenoss.
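Nagios runs check plugins that report their result through an exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and a one-line status message, optionally followed by performance data after a `|`. As a hedged example of a system-level check, here is a disk usage check in Python; the thresholds and the `check_disk` and `main` names are my own choices, not part of any standard plugin.

```python
import shutil

# Status codes defined by the Nagios plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_disk(path: str, warn_pct: float = 80.0, crit_pct: float = 90.0) -> tuple[int, str]:
    """Return (status code, status line) for usage of the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    used_pct = 100.0 * usage.used / usage.total
    if used_pct >= crit_pct:
        code = CRITICAL
    elif used_pct >= warn_pct:
        code = WARNING
    else:
        code = OK
    # Text before "|" is for humans; the part after it is performance data.
    return code, (
        f"DISK {LABELS[code]} - {path} {used_pct:.1f}% used"
        f" | used_pct={used_pct:.1f}%;{warn_pct};{crit_pct}"
    )


def main(argv: list[str]) -> int:
    code, message = check_disk(argv[1] if len(argv) > 1 else "/")
    print(message)
    return code


# When installed as a plugin script, finish with: sys.exit(main(sys.argv))
```

Returning UNKNOWN when the path cannot be read keeps a broken check distinguishable from a genuinely full disk.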
If you have to monitor a consumer web app or a SaaS application, the tests have to run from the Internet, outside your corporate network. There are many service providers in that space, such as Apica and Catchpoint.
Log Aggregation
At the most basic level, log aggregation tools gather system and application logs in one place and index them for search. Looking through the logs for error patterns and setting up alerts on their occurrence can help catch issues that dedicated monitoring might miss. There are both open-source and licensed products in the market; Loggly, Logstash, and Splunk are some of the popular ones.
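Products like Splunk or Logstash do this at scale, but the underlying idea can be shown in a few lines of Python: count matches of known error patterns in a batch of log lines and flag the ones that reach a threshold. The patterns, pattern names, and thresholds are hypothetical examples to be tuned per application.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical error patterns worth alerting on; tune them per application.
PATTERNS = {
    "oom": re.compile(r"OutOfMemoryError|Out of memory"),
    "http_5xx": re.compile(r'" 5\d\d '),  # status field of a common-format access log
    "db_timeout": re.compile(r"(?i)connection timed out"),
}


def scan(lines: Iterable[str], thresholds: dict[str, int]) -> dict[str, int]:
    """Count pattern matches and return those at or above their threshold.

    Patterns without an explicit threshold alert on a single match.
    """
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return {name: n for name, n in counts.items() if n >= thresholds.get(name, 1)}


sample = [
    '10.0.0.1 - - [01/Jan/2024:00:00:00 +0000] "GET / HTTP/1.1" 503 12',
    '10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /api HTTP/1.1" 500 0',
    "java.lang.OutOfMemoryError: Java heap space",
]
print(scan(sample, {"http_5xx": 2}))
```

A real alert would also need a time window (for example, matches in the last five minutes); here the batch of lines plays that role.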
Third-Party Tool Dashboards and APIs
Many third-party tools used to build applications come with their own admin tools, which often include some monitoring features. While their dashboards can be used out of the box, their monitoring-related APIs, which report the status of the underlying components, can be used to build checks on the main monitoring platform.
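As an illustration, here is a Python sketch that turns a third-party status document into a Nagios-style check result. The JSON shape (a `components` list whose entries have `up`, `degraded`, or `down` states) is an assumption made for this example; each vendor's API has its own format, and fetching the document over HTTP is left out to keep the sketch self-contained.

```python
import json

# Nagios-style status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_from_status(payload: str) -> tuple[int, str]:
    """Map a third-party status document to a (status code, message) check result.

    Assumed (hypothetical) shape:
    {"components": [{"name": "queue", "state": "up" | "degraded" | "down"}, ...]}
    """
    try:
        components = json.loads(payload)["components"]
    except (ValueError, KeyError, TypeError) as exc:
        return UNKNOWN, f"UNKNOWN - cannot parse status: {exc!r}"
    if not isinstance(components, list) or not all(isinstance(c, dict) for c in components):
        return UNKNOWN, "UNKNOWN - unexpected status format"
    down = [str(c.get("name", "?")) for c in components if c.get("state") == "down"]
    degraded = [str(c.get("name", "?")) for c in components if c.get("state") == "degraded"]
    if down:
        return CRITICAL, "CRITICAL - down: " + ", ".join(down)
    if degraded:
        return WARNING, "WARNING - degraded: " + ", ".join(degraded)
    return OK, f"OK - no component down or degraded ({len(components)} checked)"


print(check_from_status('{"components": [{"name": "queue", "state": "degraded"}]}'))
```

Treating an unparseable or unexpected response as UNKNOWN, rather than OK, means a vendor API change shows up on the monitoring platform instead of silently hiding outages.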
This is the second of a four-part series on building a DevOps organization. Stay tuned for more!