How To Build GitHub Activity Dashboard With Open-Source
In this article, we will be leveraging an open-source data integration platform and an open-source way to learn from data, to build a GitHub activity dashboard.
Join the DZone community and get the full member experience.Join For Free
In this article, we will be leveraging Airbyte - an open-source data integration platform and Metabase - an open-source way for everyone in your company to ask questions and learn from data - to build the GitHub activity dashboard above.
Airbyte provides us with a rich set of source connectors, and one of those is the GitHub connector which allows us to get data off a GitHub repo. We are going to use this connector to get the data of the Airbyte repo and copy them into a Postgres database destination. We will then connect this database to Metabase in order to create the activity dashboard. In order to do so, we will need:
- Docker Compose
- Postgres Database
- GitHub access token
Step 1: Replicating Data from GitHub to Postgres with Airbyte
Setting up Airbyte
You can skip this step if you already have Airbyte on your machine.
To set up Airbyte on your machine, make sure you have Docker and Docker compose set-up, as well as git. Then, open a terminal, and go to a location you want to download Airbyte in and run:
git clone https://github.com/airbytehq/airbyte.git
You will need to go into the cloned Airbyte repo by running cd Airbyte and then you run:
Or if you are on the newest version of the Docker CLI you can run:
docker compose up
The above command will create and start the Airbyte containers. After it's done, you can access Airbyte at http://localhost:8000/ (you can go ahead and set up your preference, then leave the Airbyte web app open as we will come back to it shortly)
Setting Up Postgres Database
This database will be the destination for the data coming in from GitHub. To set this up, we will be running a Postgres container via docker like so:
docker run --rm --name github-destination -e POSTGRES_PASSWORD=password -p 3003:5432 -d postgres
If you are running the above command for the first time, it will download the Postgres image from the Docker Hub registry and then run it as a container with the name of github-destination. We are also setting the database password by passing an environment variable of POSTGRES_PASSWORD to a value of the password.
Furthermore, we are also binding exposing the container’s 5432 port to our host machine on port 3003. Finally, we are running the container in the background with the -d flag. Now that we have set up the destination database, let's head over to the Airbyte web app and create a connection from the Airbyte GitHub source to our Postgres database.
Creating Airbyte Connection
Back in the Airbyte web app in your browser, click on the new source button in the top right corner of the app to go to the page to add a new Airbyte source. Enter the name github-source as the source name and click the drop-down and select Github connector as the source type. After selecting the GitHub source type, you will be presented with two text boxes.
The first is to enter a repository you want. In this box, type in airbytehq/airbyte, and then, in the second box, you will need to provide a GitHub access token which you can obtain from here. Make sure you grant the token the repo and write:discussion permissions. After you've filled all fields, hit the setup source button.
If the setup was successful, you will be taken to the destination screen where you will add a new destination.
Click the add destination button, and, in the drop-down that follows, click add a new destination. Then, you will see a page to add the destination name. Type in the name we gave the Postgres container we created earlier (github-destination), and then choose Postgres as the destination type. After, you will be presented with some text boxes to enter the database connection details. Enter the values for the Postgres container we created earlier:
- Host - localhost
- Post - 3003
- Schema - public (leave default)
- Database - Postgres
- Password - password
- Username - Postgres
Then click on the basic normalization toggle button to check it on as we want Airbyte to normalize the data coming in from GitHub. Overall the UI should look like this:
Then click on the Setup destination button. If your credentials are all good for the database, the Postgres destination would have been set, and now you will need to make the connection from the source (GitHub) to the destination (Postgres).
You should check the boxes that are checked in the screenshot below, and then choose how often Airbyte will attempt to replicate data to be every hour in the Sync frequency dropdown. Then, click on the Setup connection button. You will be taken to the source page, click on the source and you will see the status of your sync which takes a while to complete.
When done, you will see the status change from running to Succeeded. Also, the byte count will be referring to the byte of data Airbyte has pulled from GitHub into your Postgres database.
That wasn’t a lot of work, was it? You can pat yourself in the back as you just synced data from GitHub to a Postgres database. Let's move on to connecting that database to Metabase, so we can start creating our dashboard.
Step 2: Connecting the PostgreSQL Database to Metabase
Metabase is an open-source analytic tool that you can get started working within a couple of ways; using the .jar file, installing it as a Mac app, or using a Docker container.
For this tutorial, we will be using the Mac app(you can alternatively look up the installation that suits you here).
If you are on a Mac, you can download the Metabase app from here. After download, launch the app and complete registration for an account and you will see the Metabase dashboard:
Setting Up a Postgres Database in Metabase
In order to set up our database, we will click on the cog(settings) icon on the menu bar of the Metabase app and choose the admin option to be taken to the admin view where we can add a database:
Click on the add a database button and you will be presented with a form where you should select PostgreSQL as your database of choice and then fill out the connection parameters which will be the one for the PostgreSQL database we created to hold the data from GitHub
Fill out the details to match the credentials of the PostgreSQL database we created earlier.
Afterward, hit save when you are done entering the database credentials and your database would have been fully loaded onto Metabase ready for us to start creating our dashboard.
Step 3: Creating Dashboards in Metabase
Metabase is based on asking questions on Data, so to build our GitHub Dashboard we will be asking the following questions:
- New stargazers per day
- New issues contributed
- Number of PR contributors per day
- New comments from non-team members per day
- New issues from non-team members per day
New Stargazers per Day
Note: We mostly need to work with datetime type in order to find the evolution metrics for our dashboard, so if you noticed that a field you need is not in the appropriate type like starred_at being a VARCHAR type instead of data time, you can go to the Admin view in Metabase and click on the Data Model option and tell Metabase to cast that field
So the general workflow we will be using with Metabase is to click on the Ask Question button on top of the menu bar. Then pick the type of question you want to ask. We will start off with a simple question:
Then you choose the data you want to work with. If everything is configured right, a data source called Airbyte_GitHub should show up. Then let's select the table we want to work with, so since for this first question, we want the number of stargazers we select the Stargazers table.
You will then see the data in the table chosen. For this first one, we will see the data in the Stargazers table.
To compute the answer to our first question, hit the Summarize button and group by Starred At and summarize by Count and then you’d have your very first question answered: the number of new stargazers in a day.
Hit the blue Save button in the top right to save the question and make sure you are visualizing in the line mode. When saved, Metabase will prompt you to add it to a Dashboard, which is exactly what we want!
Click on create a new dashboard, Name your Dashboard, and hit create. Then you can size your dashboard card to your suiting, hit save and we have our first question answered!
For the rest of the questions we need to ask, the steps are pretty similar so we will just mention the table we are using and show the visualization as well as the settings we used in arriving there.
Evolution of the number of PR contributors
Table: Pull Requests
Evolution of the number of issue contributors
Evolution of the number of new comments from non-team members
We will be filtering with the Author Association field to filter out any comment from collaborators.
Evolution of the number of new issues from non-team members
In this article, we have worked through getting data from a GitHub repo using the GitHub Airbyte connector and storing that data in a PostgreSQL database. We then set up Metabase and asked questions to visualize the data.
Here is the finished GitHub dashboard with our visualizations on Metabase:
Here are other questions you can get answers to using the same process:
- evolution of the # new comments from non-team members
- evolution of the # new issues from non-team members
- evolution of the # new PRs from non-team members
- time to 1st comment for non-team members interactions (50th, 90th percentile)
- time until 1st review for a PR (50th, 90th percentile)
Published at DZone with permission of John Lafleur. See the original article here.
Opinions expressed by DZone contributors are their own.