Though it may not be hard to find laborious attempts to define this new stream of work in the IT industry, how it is understood on the floor depends on whom you ask. More than the accuracy of its definition, I am interested in exploring the reasons for its emergence and discussing the relevance of its role in delivering high-quality service to clients and end-users.
During the era of on-premises software, the responsibility of the product development group usually ended when the build and release team began preparing the software for cutting CDs and for clients to download. Deploying the software and applying patches were done by the client's IT department, with some or no help from the professional services group. Even though the product support group took the heat from clients whenever nasty bugs were discovered in the product, the onus of handling production issues was left with the client's IT department. Of course, the IT departments also had to deal with their own internal users.
The advent of web consumer applications and SaaS products changed all that. With multi-tenant deployments of software applications in the cloud that are expected to be available 24/7, any undiscovered defect in the product or issue with the availability of the applications might be escalated to top management by clients at any time. Though it might be impossible to release a bug-free product and ensure 100% availability, taking preventive measures to maintain the system and responding quickly to incidents reported in production have become very important. An on-premises software installation typically has thousands of users, but a consumer web app or SaaS app can have millions, which requires scaling up production operations.
It became obvious that delivering a high-quality software solution to clients requires the support of a high-quality operations team. Their skills must be much broader than the traditional skill sets of system and database admins and application support engineers. Scaling up operations needed serious automation in areas ranging from platform and product deployment to monitoring, and that required scripting and coding skills. The typical DevOps roles turned out to be the following.
Automated Build and Release
Though automated nightly builds and smoke tests were common even in classic software development environments, such efforts were custom scripted. The availability of automation platforms such as Hudson and Jenkins standardized these processes, and the lead time to deploy changes in production has shrunk as a result.
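To make the idea of a custom-scripted nightly build concrete, here is a minimal sketch of the kind of runner teams used to write by hand. The stage names and the `run_pipeline` function are hypothetical, and the stages are stand-ins that return a fixed result; a real script would shell out to the compiler and the test runner.

```python
from typing import Callable


def run_pipeline(stages: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run stages in order and stop at the first failure; return a report."""
    report = []
    for name, stage in stages:
        ok = stage()
        report.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # later stages are pointless once an earlier one has failed
    return report


# Stand-in stages (assumptions for illustration only).
stages = [
    ("checkout", lambda: True),
    ("compile", lambda: True),
    ("smoke-test", lambda: False),
    ("publish", lambda: True),
]
print(run_pipeline(stages))
```

Platforms such as Jenkins provide this sequencing, plus scheduling, history, and notifications, so that each team no longer has to maintain its own version.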
Configuration management is a very generic term that previously referred to source code control systems (SCM), but in the DevOps context, it refers to automation that defines and creates system components or roles. Though system components are largely software, in virtualized environments the provisioning of hardware components such as virtual machines and storage volumes is very much in scope.
A typical provisioning of a compute node for a system component or role can start with creating a virtual machine, setting up user accounts and access privileges, and installing baseline software bits specific to that role.
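The provisioning sequence described above can be sketched as a role definition that expands into an ordered list of steps. This is only an illustration: `RoleSpec`, `provisioning_plan`, and the step strings are hypothetical names, not the format of any real configuration management tool.

```python
from dataclasses import dataclass, field


@dataclass
class RoleSpec:
    """Hypothetical description of a system role (not from any real tool)."""
    name: str
    vm_size: str
    users: list[str] = field(default_factory=list)
    packages: list[str] = field(default_factory=list)


def provisioning_plan(role: RoleSpec, hostname: str) -> list[str]:
    """Return the ordered steps needed to provision one node for a role."""
    # 1. Create the virtual machine.
    steps = [f"create-vm {hostname} size={role.vm_size}"]
    # 2. Set up user accounts and access privileges.
    steps += [f"create-user {user} on {hostname}" for user in role.users]
    # 3. Install the baseline software specific to the role.
    steps += [f"install {pkg} on {hostname}" for pkg in role.packages]
    return steps


web = RoleSpec(name="web", vm_size="small", users=["deploy"], packages=["nginx", "python3"])
for step in provisioning_plan(web, "web-01"):
    print(step)
```

Keeping the role as data and the plan as a pure function means the same definition can be reused for every node of that role.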
Most of the time, deployment automation is tied to configuration management or the build and release infrastructure. While configuration management takes care of the baseline setup required for a system component, deployment automation addresses the automated processes that get application software releases and patches installed on various types of compute nodes regularly. For example, Jenkins integrated with a CMDB system can be used to provision baseline compute nodes for a system component, and the same infrastructure can be used to push code incrementally to the same compute nodes as part of the Continuous Integration process.
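As a rough sketch of the deployment side, the snippet below picks out the nodes that are not yet on the target release and groups them for rollout. The inventory dictionary stands in for what a CMDB might report, and the host names, version strings, and both function names are made up for illustration. Deploying in batches is an assumption added here, not something the text above prescribes.

```python
def nodes_needing_release(inventory: dict[str, str], target: str) -> list[str]:
    """Return hostnames whose installed release differs from the target."""
    return sorted(host for host, installed in inventory.items() if installed != target)


def rollout_batches(hosts: list[str], batch_size: int) -> list[list[str]]:
    """Split hosts into batches so a bad release never hits every node at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]


# Hypothetical inventory: hostname -> currently installed release.
inventory = {"web-01": "2.3.0", "web-02": "2.4.0", "web-03": "2.3.0", "web-04": "2.2.1"}
pending = nodes_needing_release(inventory, target="2.4.0")
for batch in rollout_batches(pending, batch_size=2):
    print("deploy 2.4.0 to", batch)
```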
The last thing that the providers of SaaS and web consumer apps want is to be notified of production issues by client users. SaaS providers lose credibility and web portals lose ad revenue if features don't work as intended. Even though highly reliable monitoring systems are available, it is impossible to catch all of the issues using the out-of-the-box features of such products. Extending those features based on domain knowledge of the application is the key to getting notified of potential issues before customers find them, and that requires a broad set of skills (mainly scripting and knowing how to consume a wide range of native and REST APIs provided by the third-party tools used and by the application being monitored).
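An application-specific check of the kind described above might look like the following sketch. Everything about the endpoint is an assumption: the URL, the `pending_orders` and `last_processed_age_s` fields, and the thresholds are invented to show how domain knowledge turns into an alert rule. Only the Python standard library is used, and the fetch function is injectable so the check can be exercised without a network.

```python
import json
from typing import Callable
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Fetch a JSON document over HTTP using only the standard library."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def check_order_backlog(url: str, max_pending: int,
                        fetch: Callable[[str], dict] = fetch_json) -> list[str]:
    """Return alert messages for a hypothetical order-processing stats endpoint."""
    try:
        stats = fetch(url)
    except (OSError, ValueError) as exc:  # network failure or invalid JSON
        return [f"CRITICAL: cannot read {url}: {exc}"]
    alerts = []
    pending = stats.get("pending_orders", 0)
    if pending > max_pending:
        alerts.append(f"WARNING: {pending} pending orders exceeds limit {max_pending}")
    if stats.get("last_processed_age_s", 0) > 300:
        alerts.append("WARNING: no order processed in the last 5 minutes")
    return alerts


# Exercise the check with a stubbed response instead of a live endpoint.
def fake_fetch(url: str) -> dict:
    return {"pending_orders": 1200, "last_processed_age_s": 30}


print(check_order_backlog("http://app.example/internal/stats", 1000, fetch=fake_fetch))
```

A generic monitoring product can tell that the server is up; a rule like this one encodes what "working as intended" means for this particular application.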
A well-instrumented software system can leave tons of information about health and performance, and aggregating such info for troubleshooting and reporting can be a daunting task. Besides gathering info for management reporting, insights into the working of the system can also help improve performance and fine-tune operability, which ultimately contributes to the increased availability of systems.
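As a small illustration of that aggregation step, the sketch below rolls raw log lines up into per-component request counts, error counts, and mean latency. The log line format, the component names, and the sample lines are all assumptions made up for this example; real systems will need their own parsing rules.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed line format: "<timestamp> <component> <LEVEL> duration_ms=<int> <message>"
LINE = re.compile(r"^\S+ (?P<component>\S+) (?P<level>[A-Z]+) duration_ms=(?P<ms>\d+)")


def summarize(lines):
    """Aggregate request count, error count, and mean latency per component."""
    durations = defaultdict(list)
    errors = defaultdict(int)
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # skip lines that do not follow the assumed format
        component = match["component"]
        durations[component].append(int(match["ms"]))
        if match["level"] == "ERROR":
            errors[component] += 1
    return {
        component: {
            "requests": len(values),
            "errors": errors[component],
            "mean_ms": mean(values),
        }
        for component, values in durations.items()
    }


# Made-up sample lines, including one that does not match the format.
sample = [
    "2024-01-01T10:00:00Z checkout INFO duration_ms=120 order placed",
    "2024-01-01T10:00:01Z checkout ERROR duration_ms=480 payment timeout",
    "2024-01-01T10:00:02Z search INFO duration_ms=35 query ok",
    "malformed line",
]
print(summarize(sample))
```

The same summary can feed both a management report and a troubleshooting session, which is why the aggregation is worth automating once.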
I don't attempt to define DevOps in a very long sentence here, as such attempts are awkward and always leave out a few things, since the field is still evolving. Generally, a DevOps engineer will work in one or more of the areas discussed above. The keyword is "automation," which helps scale up production operations for delivering high-quality SaaS offerings, consumer web apps, and the backends of mobile apps to a large number of users.