Virtually all software you develop today is part of a software supply chain, whether you realize it or not. Each time you use a library as a developer, you enter a software supply chain. Software supply chains may be highly contractual: a manufacturer, say of an automobile, contracts software from a number of different suppliers, with a contract specifying precisely what the contracted software must do. We refer to these software supply chains as “tight.”
At the opposite end of the spectrum are “loose” software supply chains, which arise when open source software uses other open source software. Both tight and loose supply chains require information to flow back and forth across the supply chain. Together with Dr. Marc Palyart, I researched communication patterns in loose software supply chains found on GitHub to learn more about the requirements of information exchange in software supply chains.
To study the communication patterns, we selected projects on the GitHub hosting site. These repositories had a number of shared characteristics:
The figure below shows how applying these characteristics narrowed the more than 10 million projects available on GitHub down to 1,227 for study.
We used these repositories to investigate a number of questions about social interactions between developers of a user repository and developers of a library repository.
For instance, we asked: “How often does the use of a library lead to social interactions?” Analyzing pairs of repositories formed from one GitHub repository that uses a library provided by another GitHub repository, we found that:
This is good news for supply chains: for almost half of the libraries, setting up a loose software supply chain did not require the user to report a bug, ask questions, or request features from the supplier (that is, the developers of the library repository).
Unexpected Social Interactions
We expected that in the vast majority of the cases, developers of the user repository would start using a library and then might engage socially with developers of the library repository, perhaps by reporting a bug. We were surprised to find that in 39% of the pairs we studied, the opposite happened: Developers of the user repository engaged with the developers of the library repository before trying to use the library. At least in some cases, developers of the user repository were asking about potential new features for the library before entering into the supply chain.
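To make the timing comparison concrete, here is a minimal sketch of how one might compute the share of repository pairs in which social interaction came before library use. It is an illustration only, not the procedure used in the study. The `PairTimeline` type, its field names, and the choice of all pairs as the denominator are assumptions made for this example.

```python
from datetime import date
from typing import NamedTuple, Optional


class PairTimeline(NamedTuple):
    """Hypothetical record for one (user repository, library repository) pair."""
    first_use: date                    # assumed: day the user repository began depending on the library
    first_interaction: Optional[date]  # assumed: day of the earliest social contribution, if any


def share_interacting_before_use(pairs):
    """Return the fraction of pairs whose first interaction strictly precedes first use.

    Assumption: the denominator is every pair passed in, including pairs
    that never interacted at all.
    """
    if not pairs:
        return 0.0
    early = sum(
        1
        for p in pairs
        if p.first_interaction is not None and p.first_interaction < p.first_use
    )
    return early / len(pairs)
```

A pair with no recorded interaction counts toward the total but never toward the "interacted first" group, so changing the denominator to interacting pairs only would raise the reported share.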
We also considered whether the social interactions were forward or backward. Because we expected developers of the user repository to be more likely to engage with developers of a library repository than vice versa, we classify a social contribution in that direction as forward. A backward social contribution consists of a developer of the library repository making a comment, adding an issue, or making a pull request on the user repository. We were very surprised to find that 30% of the repository pairs studied had both forward and backward social contributions. For example, in one pair, developers of the user repository made social (forward) contributions to the library repository, as expected. Later, and more surprisingly, a developer of the library repository who had no prior interaction with the user developers made a pull request on the user repository to update the library to a newer version.
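The forward and backward distinction described above can be sketched in a few lines of code. This is a simplified illustration, not the tooling used in the study. The `Contribution` type, the repository names, and the idea that each repository has a known set of developer logins are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    FORWARD = "forward"    # user-repository developer acts on the library repository
    BACKWARD = "backward"  # library-repository developer acts on the user repository


@dataclass(frozen=True)
class Contribution:
    """Hypothetical record of one comment, issue, or pull request."""
    author: str       # login of the developer who made the contribution
    target_repo: str  # repository that received the contribution


def classify(contribution, user_repo, library_repo, user_devs, library_devs):
    """Return the Direction of a contribution, or None if it does not cross the pair."""
    if contribution.target_repo == library_repo and contribution.author in user_devs:
        return Direction.FORWARD
    if contribution.target_repo == user_repo and contribution.author in library_devs:
        return Direction.BACKWARD
    return None


def is_bidirectional(contributions, user_repo, library_repo, user_devs, library_devs):
    """True when a pair shows at least one forward and one backward contribution."""
    directions = {
        classify(c, user_repo, library_repo, user_devs, library_devs)
        for c in contributions
    }
    return Direction.FORWARD in directions and Direction.BACKWARD in directions
```

The sketch ignores developers who belong to both repositories; a real analysis would need an explicit rule for that case.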
We’re fortunate that open source hosting systems such as GitHub make such research possible through built-in issue tracking systems. This particular study of social interactions between GitHub repositories demonstrates that information exchange in a loose software supply chain, to our surprise, is bi-directional. Users (manufacturers) make requests of libraries (suppliers), but, much more often than was expected, suppliers also need to provide information back to users. As such, software development organizations should consider how and when they must engage in two-way supply chain communication, and should ensure those communications are efficient.