Mining the Ground Truth of Enterprise Toolchains
See what we can learn about enterprise agile and DevOps environments from research like the State of DevOps reports.
To learn more about what works and what doesn't in large-scale DevOps and agile deployments, we need data. The problem is that this data is notoriously difficult to obtain because much of it lies hidden across numerous private repositories.
Efforts such as the State of DevOps reports have helped us gain some understanding by using survey data to answer questions about practices such as the frequency of deployments in a team or organization. However, survey data has its limitations, as Nicole Forsgren and I described in "DevOps Metrics," which we wrote to clarify the trade-offs of system and survey data collection. [1] Today, our understanding of DevOps practices is based largely on this survey data and on anecdotal evidence. Is there a way to expand our view of DevOps to include studies of system data of DevOps at work?
One approach is to examine publicly available repositories, such as those hosted by GitHub or the Eclipse and Apache foundations. However, the conclusions from this research are limited to how open source projects work. Large-scale and enterprise software delivery differs considerably from open source delivery in terms of the scale, scope, and type of work.
In my PhD research, I initially studied open source developers. [2] Gail Murphy, my supervisor, pushed me to expand my study to professional developers in enterprise settings. Having spent most of my career doing open source development, I was shocked at how different the work was. The most interesting thing I learned was the additional complexity with which the professional developers worked on a daily basis. The number of applications, systems, processes, and requirements dwarfed anything I encountered in the much more elegant world of open source.
In my article "The End of the Manufacturing-Line Analogy," I discussed how advanced car manufacturing relates to software production. [3] One of the amazing things about car manufacturing is that the "ground truth" of production is visible on the factory floor. Walking the assembly line provides an instant view of the workflow. Where can we find the ground truth of enterprise software delivery? How might that ground truth change our understanding of what works and what fails in software delivery at scale?
Exploiting the Cambrian Explosion of Tools
In my last post, I summarized how a "Cambrian explosion" has led to the proliferation of hundreds of DevOps tools. [4] One key reason for this explosion is how specialized the tools have become for various stakeholder needs. For example, a large enterprise might have a dozen different specialists involved in software delivery, such as Java experts, AWS (Amazon Web Services) experts, design experts, or support staff. There's now a specialized tool for each role.
That presents an interesting opportunity. The more that these tools are used, the more those particular practitioners' work is captured within them. If only we could get at that data, we would have a unique chance to better understand how DevOps, and software delivery in general, works in practice.
The challenge is that such end-to-end system data is inaccessible. It's hidden behind organizations' firewalls or locked in private repositories. Occasionally, a vendor will have a slice accessible; for example, a software-as-a-service support desk tool vendor might have cross-company information on support tickets. However, that's only one slice of the value stream; it misses all the development and other upstream data and does not provide an end-to-end view.
In my study of open source and professional developers, the trick was to use the developers' access to tool repositories as a proxy for what was happening in those repositories. But, again, that was just a slice of the value stream. However, through that experiment, I realized I had access to exactly the people who had visibility into the end-to-end set of repositories: the enterprise IT tool administrators.
Value Stream Integration Diagrams
My company Tasktop works closely with many enterprise IT tool administrators responsible for the agile and DevOps toolchain. Each engagement that our solutions architects undertake results in the creation of a value stream integration diagram. The first time I looked at these diagrams in aggregate, I realized I had a dataset that was as interesting as the one Gail and I collected during my PhD studies. These diagrams depict each of the tool repositories in the value stream, each artifact type stored in those repositories, and, most important, how the artifact types are related. These diagrams were collected not through an academic study but through a data collection process put in place for working with the enterprise IT tool administrators and their tools. The data is biased toward Tasktop's customers and prospects, who tend to be Fortune 500 enterprise IT organizations seeking integration across one or more tools.
Tasktop has collected 308 of these diagrams. Figure 1 shows some of them. They're a fascinating window into the ground truth of enterprise toolchains. As such, they might inform future efforts in the collection of software delivery data in interesting ways. Here, I provide a very high-level overview of what we learned from them. A more detailed analysis will appear in my upcoming book Project to Product.
The diagrams provide a moment-in-time summary of each tool in the value stream and information on what the key artifacts captured in each tool are, as well as how they are or should be connected. The diagrams do not exhaustively list all the tool repositories in an organization or all the artifact types. Nor do they provide information about the data in those tools (for example, the number and types of defects). But they do provide the ground truth about the composition of these organizations' enterprise IT toolchains.
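As a rough illustration, the information such a diagram carries (tool repositories, the artifact types stored in each, and the connections between artifact types) could be modeled as below. This is a minimal sketch and not Tasktop's actual diagram format; the class names, tool names, and artifact names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass(frozen=True)
class ArtifactType:
    """An artifact type as stored in one tool repository (hypothetical model)."""
    tool: str  # tool repository that holds the artifact
    name: str  # kind of artifact, e.g. "defect"


@dataclass
class ValueStreamDiagram:
    """Moment-in-time view of one organization's connected toolchain."""
    organization: str
    tools: Set[str] = field(default_factory=set)
    artifact_types: Set[ArtifactType] = field(default_factory=set)
    links: Set[Tuple[ArtifactType, ArtifactType]] = field(default_factory=set)

    def add_artifact(self, tool: str, name: str) -> ArtifactType:
        """Register a tool (if new) and an artifact type stored in it."""
        self.tools.add(tool)
        artifact = ArtifactType(tool, name)
        self.artifact_types.add(artifact)
        return artifact

    def connect(self, a: ArtifactType, b: ArtifactType) -> None:
        """Record that two artifact types are, or should be, connected."""
        if a not in self.artifact_types or b not in self.artifact_types:
            raise ValueError("both artifact types must be registered first")
        self.links.add((a, b))

    def connected_tools(self) -> Set[str]:
        """Tools that take part in at least one connection."""
        return {artifact.tool for link in self.links for artifact in link}


# Illustrative data only; these are not tools or artifacts from the study.
diagram = ValueStreamDiagram("example-org")
story = diagram.add_artifact("agile_planning_tool", "user story")
defect = diagram.add_artifact("alm_tool", "defect")
incident = diagram.add_artifact("itsm_tool", "incident")
diagram.connect(incident, defect)
connected = diagram.connected_tools()
```

Note that, like the diagrams themselves, this model records only which artifact types exist and how they relate, not the contents or counts of the artifacts.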
What the Data Revealed
There could be relevant tools outside this set. For example, these organizations have only recently been reporting vulnerability tracking tools as part of their DevOps toolchains. A tool's absence from the results doesn't mean that it wasn't present, just that it wasn't considered for inclusion in the organization's view of the connected value stream at that time.
The diagrams were sourced from Tasktop customers and prospects defining what tools and artifacts they wanted to connect. The majority of the diagrams came from enterprise IT organizations in the Fortune 1000. Table 1 shows the industry breakdown.
Table 2 lists the types of tools used. As expected, agile-planning and application lifecycle management (ALM) tools dominated, but IT service management, project portfolio management, and requirements management also formed a key part of the toolchains. Requirements management tools continued to see significant use, even in the age of agile and DevOps. In contrast, initiatives to connect customer relationship management (CRM) and security tools were still rare. Altogether, the dataset included the use of 55 tools.
Even more interesting is what information was tracked in the tools. Table 3 provides insight into the artifacts created and thus the types of work. At a high level, imagine these artifacts corresponding to the widgets that flow through the various tools that perform software delivery. In an upcoming article, I'll discuss the relevance of these various types of artifacts.
Combining the data from Tables 2 and 3, we observed that the artifacts spanned multiple tools. For example, features were tracked across agile, ALM, requirements management, and sometimes IT service management tools. We interpreted this as another indication that the number of tools and their specialization in large-scale agile and DevOps environments are growing. However, the number of artifact types stored in those tools (see Table 3) is considerably smaller, and the artifacts tend to span multiple tools. For example, a single defect can span agile, ALM, requirements management, and IT service management tools.
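The kind of cross-tabulation described here can be sketched as an inversion from (tool category, artifact type) pairs to the set of categories each artifact type spans. The records below are invented for illustration and are not the study's data; the function name is likewise hypothetical.

```python
from collections import defaultdict


def tool_categories_by_artifact(records):
    """Map each artifact type to the set of tool categories it appears in.

    `records` is an iterable of (tool_category, artifact_type) pairs.
    """
    spans = defaultdict(set)
    for category, artifact in records:
        spans[artifact].add(category)
    return dict(spans)


# Hypothetical records, chosen only to mirror the pattern described above.
records = [
    ("agile", "defect"),
    ("ALM", "defect"),
    ("requirements management", "defect"),
    ("IT service management", "defect"),
    ("agile", "feature"),
    ("ALM", "feature"),
]
spans = tool_categories_by_artifact(records)
# Artifact types found in more than one tool category.
multi_tool = sorted(a for a, cats in spans.items() if len(cats) > 1)
```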
Some of the most interesting findings are in Table 4. We see that only 1.3 percent of organizations used a single tool. More interestingly, 69.3 percent of the organizations were connecting artifacts across three or more tools. The more surprising finding was that more than 42 percent of the organizations needed to integrate four or more tools, indicating the complexity involved in developing large-scale enterprise software. It also supports the notion that specialization of roles in software development is common.
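Figures of this kind are cumulative shares: the percentage of organizations whose diagram contains at least a given number of tools. A minimal sketch of that calculation follows; the tool counts are made up for illustration and do not reproduce Table 4.

```python
def share_with_at_least(tool_counts, minimum):
    """Percentage of organizations using at least `minimum` tools."""
    if not tool_counts:
        raise ValueError("tool_counts must not be empty")
    matching = sum(1 for count in tool_counts if count >= minimum)
    return 100.0 * matching / len(tool_counts)


# Hypothetical number of connected tools per organization (not study data).
tool_counts = [1, 2, 3, 3, 4, 5, 2, 4]
three_or_more = share_with_at_least(tool_counts, 3)  # 5 of 8 organizations
four_or_more = share_with_at_least(tool_counts, 4)   # 3 of 8 organizations
```

Because the shares are cumulative, the "four or more" group is always a subset of the "three or more" group, which is why the two percentages in the text should not be added together.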
This is the fifth blog in a series promoting the genesis of my book Project To Product. To ensure you don't miss any further content, you can receive future articles and other insights delivered directly to your inbox by signing up to the Project To Product newsletter.
A version of this article was originally published in the May/June 2018 issue of IEEE Software: M. Kersten, "Mining the Ground Truth of Enterprise Toolchains," IEEE Software, vol. 35, no. 3, pp. 12-17, ©2018 IEEE, doi: 10.1109/MS.2018.2141029.
1. N. Forsgren and M. Kersten, "DevOps Metrics," ACM Queue, vol. 15, no. 6, 2018; queue.acm.org/detail.cfm?id=3182626.
2. M. Kersten and G.C. Murphy, "Using Task Context to Improve Programmer Productivity," Proc. 14th ACM SIGSOFT Int'l Symp. Foundations of Software Eng. (SIGSOFT/FSE 06), 2006, pp. 1-11; dl.acm.org/citation.cfm?id=1181777.
3. M. Kersten, "The End of the Manufacturing-Line Analogy," IEEE Software, vol. 34, no. 6, 2017, pp. 89-93.
4. M. Kersten, "A Cambrian Explosion of DevOps Tools," IEEE Software, vol. 35, no. 2, 2018, pp. 14-17.
Published at DZone with permission of Mik Kersten, DZone MVB.