Measuring Metrics in Open-Source Projects
As open-source projects grow in use and popularity, many of their maintainers struggle to understand how people use their tools, and how many people do.
For SaaS and proprietary software projects and products, tracking what users do with your software, and how they use it, is a standard process. The level of tracking, its anonymity, and how it functions are topics for another article, but it helps product owners directly relate development team activity to business value.
Open source projects have become an increasingly normal mode of operation for many companies, small and large. Other software (open and proprietary alike) uses open source components and, to varying degrees, relies on the continued life of specific open source projects. However, for the maintainers and contributors to these open source projects, it's often hard to get the same level of feedback about their project usage, get a sense of the project's health, and prioritize future features and bug fixes.
Some of these challenges are technical. If a project is open-source, then someone else using it can fork it, change it, and implement it in a variety of ways that mean any inbuilt telemetry becomes unreliable.
There's also the issue of the dependency graph that plagues many open source projects. Your project may have a small user base of its own, but if a much larger project uses it as a dependency, do you track that project's usage metrics as well?
Telemetry and statistical collection are also contentious issues in the open-source community. Contributors to open source projects often object to tracking and statistics gathering on ethical grounds, and intentionally leave them out of their projects. Depending on the project implementation, there's also a likelihood that an end-user blocks any form of "phoning home" anyway.
In summary, open-source projects may be popular, sometimes more popular than their commercial, closed source counterparts. Still, they often have a web of issues that makes it difficult for them to know how many users they have, what users are doing with their project, and what future needs they might have.
I spoke with a handful of thought leaders in the open-source metrics space, from people responsible for tracking and analyzing metrics to maintainers of tools focussed on helping open source projects.
These interviewees were:
- Kathy Reid, formerly director of developer relations at the Mycroft project, and president of Linux Australia
- Georg J.P. Link, director of sales at Bitergia, and co-founder of the CHAOSS project
- Daniel Izquierdo Cortázar, Co-Founder at Bitergia, and director of the InnerSource Commons Foundation
In this article, I summarize their advice, and the advice of others, to help open source maintainers focus on what is important to their community.
As Kathy put it:
As W. Edwards Deming is famous for claiming, "If you can't measure it, you can't manage it. Without data, you're just a person with an opinion." If you want to influence the trajectory that an open-source project takes, then you need to understand where it's headed, and how quickly.
Metrics help you evaluate the impact of any initiatives you're implementing, and are a key input into strategic investment. If you're a small project with a limited budget, do you invest in your CI/CD pipelines, documentation, or in your community management? A good metrics framework helps you answer those questions.
It's impossible to know what success is until you define what your project priorities are, why they are priorities, and how you plan to measure them.
Beware of easy-to-measure "vanity metrics" that typically represent large numbers and make your project sound and look good, but are fundamentally meaningless to your project and its future sustainability. You also don't need to track every statistic you can get your hands on. Favor those with a high signal-to-noise ratio over meaningless statistics that tell you nothing. Choose metrics that help you understand the story behind them and the journey that a user or a project is following. It's important to reach these decisions as a team or core community, and that everyone (as much as possible) agrees that you should have these goals, and why. As with any form of group dynamic, if people are not on board, it will be a challenge to make it happen.
Try to set some concrete aims for your project, and separate the goals of the people using the project from the goals of those maintaining it, as the two potentially differ.
Maybe your goals are:
- A certain number of community-run meetups and webinars around the world focused on your project
- How many other projects use your project as a dependency
- Help people replace an expensive and legally binding tool with an open-source option
- Increase internal developer productivity by creating a tool whose workings your developers understand and can change as needed without external help
- Your organization's reputation in open source communities
- To encourage contributions from new contributors (to open source and your project)
- Diversity of contributions and community
- To attract and keep a consistent body of reliable project contributors
  - Perhaps with an eye on hiring them if you're a company stewarding a project
- Justifying open-source contributions from your developer team to senior management
- Setting how responsive you are to issues and contributions
Find Reliable Statistics Sources
With goals in place, you are in a better position to find data sources that match them. For example, many people take a high download/clone count as a good sign, but it could mean that everyone downloads the project, runs it once, and gets nowhere. Find ways to get statistics on the goals that are important to you, and decide how often you collect and review them. There are some obvious and less obvious sources of numbers, including:
- GitHub/GitLab provided statistics
- PR/MR open time to close/merge time
- Issue open time to close time (and whether the outcome was satisfactory or unsatisfactory)
- Code quality metrics, such as code coverage and test suites. These aren't statistics, but they can help your community contribute and signal that you take your project seriously
- How many other projects or commercial offerings rely on your project
- Mentions in the media, Hacker News, Reddit, social media, etc., and the positivity of the coverage
- Community events organized about your projects
- External tutorials and courses. These could be a sign your project is popular, or that it's hard to understand
- Mailing list subscribers, forum posts, user groups, support channels, surveys, or "are you using us" responses
- Project website visits
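As a rough illustration of the "open time to close time" items above, here is a minimal Python sketch that computes the median time to close from a list of issue or PR records. The `created_at` and `closed_at` field names are modeled on what hosting platforms commonly return, and the sample data is invented for the example, so treat both as assumptions rather than a prescribed format.

```python
from datetime import datetime
from statistics import median


def hours_to_close(records):
    """Return the hours between opening and closing for every closed record."""
    durations = []
    for record in records:
        if record.get("closed_at") is None:
            continue  # still open, so there is no close time to measure
        opened = datetime.fromisoformat(record["created_at"])
        closed = datetime.fromisoformat(record["closed_at"])
        durations.append((closed - opened).total_seconds() / 3600)
    return durations


# Hypothetical sample data, not taken from any real project.
sample_issues = [
    {"created_at": "2020-03-01T09:00:00", "closed_at": "2020-03-01T15:00:00"},
    {"created_at": "2020-03-02T10:00:00", "closed_at": "2020-03-04T10:00:00"},
    {"created_at": "2020-03-05T08:00:00", "closed_at": None},
    {"created_at": "2020-03-06T12:00:00", "closed_at": "2020-03-07T12:00:00"},
]

durations = hours_to_close(sample_issues)
median_hours = median(durations)
open_count = len(sample_issues) - len(durations)
print(f"median hours to close: {median_hours}, still open: {open_count}")
```

The median is used here instead of the mean so that one long-forgotten issue does not dominate the figure. The count of still-open items is reported alongside it, since a fast median means little if most issues never close.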
Frame These Statistics
Kathy posed two ways to look at and interpret the statistics you now have. One is the People, Product, Process, Partners perspective.
- People: Looks at the numbers relating most directly to people, such as community volume and productivity, and which social channels have the most impact
- Product: Looks at the velocity and maturity of your project, using statistics such as PRs, issues, forks, and feature requests closed
- Process: Looks at the maturity and the fitness for purpose of your processes. What's the experience for new contributors, what's your review process, and how long do issues sit before someone responds?
- Partners: Looks at an ecosystem view of your product, such as statistics on your dependencies and on the projects that depend on you
Another is a lifecycle perspective.
Open source projects are different from proprietary projects because of the community and volunteer efforts that surround them, and both of these have lifecycles. Contributors typically follow a sequence of steps from finding a project, to using it, to raising issues, to contributing to it. If you think of these steps as a contributor pipeline, you can use metrics to identify where you lose contributors, and where your pipeline needs improvement.
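To make the contributor pipeline idea concrete, here is a small Python sketch that treats the steps as a funnel and reports the conversion rate between each adjacent pair of stages. The stage names follow the steps described above, but the counts are hypothetical, and how you actually count people at each stage (website visits, downloads, issue authors, merged PR authors) is an assumption you would need to settle for your own project.

```python
def stage_conversion(funnel):
    """Return (from_stage, to_stage, conversion_rate) for each adjacent pair."""
    steps = []
    for (prev_name, prev_count), (next_name, next_count) in zip(funnel, funnel[1:]):
        # Guard against an empty stage to avoid dividing by zero.
        rate = next_count / prev_count if prev_count else 0.0
        steps.append((prev_name, next_name, rate))
    return steps


# Hypothetical counts for one review period, invented for illustration.
funnel = [
    ("found the project", 4000),
    ("used it", 1000),
    ("raised an issue", 200),
    ("contributed", 20),
]

steps = stage_conversion(funnel)
# The step with the lowest conversion rate is where the pipeline loses the most people.
weakest = min(steps, key=lambda step: step[2])

for from_stage, to_stage, rate in steps:
    print(f"{from_stage} -> {to_stage}: {rate:.0%}")
print(f"weakest step: {weakest[0]} -> {weakest[1]}")
```

With these made-up numbers, the weakest step is the move from raising an issue to contributing, which would point towards improving the experience for first-time contributors rather than, say, project discoverability.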
I don't want to focus too much on tooling, as I'm sure many of you reading this know how to collect data via APIs and feeds and display it in dashboards (or know someone who does). But there are also a handful of tools optimized for open source communities, such as:
- The GitLab and GitHub project dashboards
- Package manager statistics
And for examples of projects handling metrics well, have a look at: