DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • How to Build a Pokedex React App with a Slash GraphQL Backend
  • How to Build a Full-Stack App With Next.js, Prisma, Postgres, and Fastify
  • C# Applications Vulnerability Cheatsheet
  • Building REST API Backend Easily With Ballerina Language

Trending

  • Developers Beware: Slopsquatting and Vibe Coding Can Increase Risk of AI-Powered Attacks
  • How Can Developers Drive Innovation by Combining IoT and AI?
  • How to Build Real-Time BI Systems: Architecture, Code, and Best Practices
  • Customer 360: Fraud Detection in Fintech With PySpark and ML
  1. DZone
  2. Data Engineering
  3. Databases
  4. Querying Pull Request Data from GitHub

Querying Pull Request Data from GitHub

In this article we will use MergeStat and Arctype to monitor contributions to our projects.

By 
Bonnie ! user avatar
Bonnie !
·
Mar. 09, 22 · Tutorial
Likes (3)
Comment
Save
Tweet
Share
3.1K Views

Join the DZone community and get the full member experience.

Join For Free

To make sure that bugs do not reach the end-user of your product, you need to do code reviews. In code review, metrics of pull requests matter a lot because they provide data on how well you are shipping. Software developers can use pull requests metrics to understand the team dynamics and act appropriately to correct behaviors before things get out of hand.

What Is Code Review?

Code review is when a peer or a senior developer examines a developer's code or a pull request. Code reviews help developers discover common bugs faster and reduce the work required to optimize code later.

Some of the benefits of code review include:

  1. Consistency in design and implementation.
  2. Optimizing code for better performance.
  3. Saving time and money.
  4. Monitoring project quality and requirements.
  5. Streamlining the development process.
  6. Collaboration and sharing of new techniques.

This article will help you learn how to use MergeStat to get pull request data from a GitHub repo.

We will start by fetching data from a GitHub repo using MergeStat. Then we will connect to SQLite using Arctype. Finally, we will explore the pull requests data by running some SQL queries on Arctype.

Arctype is a database management and visualization tool. Using Arctype, you will learn how to execute or run SQL queries to come up with metrics such as:

  • How many unanswered issues does your project have?
  • How many pull requests were merged in the last month?
  • How many unapproved merged pull requests have we had?

Before querying GitHub repo data in Arctype, we will get the repo data using the MergeStat web app by querying GitHub API. Let's get started.

Getting GitHub Repo Data by Querying the GitHub API Using MergeStat

In this section, you will learn how to use the MergeStat web app to query a GitHub API and get all the pull requests in a GitHub repository. To query the GitHub API, you must provide an authentication token. You can create a personal access token by following these instructions.

Before we proceed, there is something you need to know. All the API requests to a GitHub API are rate limited. The SQL query we will be running uses the GitHub GraphQL API (v4), which rate limits based on the "complexity" of GraphQL queries. The more fields/relations in your GraphQL, the higher the "cost" of a single API request, and the faster you may reach the rate limit.

Unfortunately, it is hard to know ahead of time what an excellent client-side rate limit is. By default, each SQL query will fetch 100 items per page and permit 2 API requests per second.

With that in mind, navigate to the MergeStat web app.

MergeStat web app

To feel the magic of the MergeStat web app, click the blue "Execute" button at the bottom on the right side of the screenshot above. After the execution finishes running, scroll down a bit, and you should see the results below.

MergeStat results

Let's now run a SQL query that returns all the React Codebase repository pull requests on GitHub. On the MergeStat web app, click the 'Query Settings' button at the bottom on the left side of this screenshot.

MergeStat query settings

After clicking the 'Query Settings', make sure you are logged in to the MergeStat web app to add the GitHub repo you want to query. Follow these instructions to get GitHub Access Token and paste it in its field as shown below. Enter a "GitHub Per Page" limit of 10 to avoid running to errors when the query is executing. You can leave the GitHub Rate Limit field blank. Don't forget to save your inputs.

Mergestat query setting configurations

After saving your inputs, execute the following SQL query on the MergeStat web app.

SQL
 
SELECT * FROM GitHub_repo_prs('facebook/react');


You should get results once the execution is done, as shown below.

MergeState query

Let us now repeat what we did above, but in this case, after installing the MergeStat app on our computers. We will then use the installed MergeStat app to query a GitHub API and get all the pull request data in React Codebase by Facebook. After that, we will use Arctype to visualize the pull request data using charts.

How to Install the MergeStat Desktop App

One of the easiest ways to install MergeStat on your system is by using Homebrew. First, make sure Homebrew is installed in your system, and then run the following commands on your command line.

Shell
 
brew tap mergestat/mergestat
brew install mergestat


After MergeStat is installed, you must add your GitHub token and GitHub per page limit as environment variables. If you are on Linux, you can follow these instructions on adding environment variables.

The environment variables should look as shown below.

Properties files
 
GitHub_TOKEN=’your_GitHub_token’
GitHub_PER_PAGE=10 


MergeStat will look for a GitHub_TOKEN environment variable when executing and use it for authentication. Additionally, GitHub_PER_PAGE expects an integer between 1 and 100 to set how many items are fetched per page in API calls that paginate results.

Let us now query the GitHub API using the MergeStat app to get the pull request data of the React codebase by Facebook.

On your command line, run the command shown below.

Shell
 
mergestat export pr-file.db -e GitHub_repo_prs \
    -e "SELECT * FROM GitHub_repo_prs('facebook/react')"


The command above will create a file called pr-file.db in your current directory.

Exporting the React Codebase Pull Requests Data From GitHub

To query the React codebase pull requests data, we will use Arctype. You can install the Arctype app on your system here.

Once you have Arctype installed, you will be redirected to a page to add your database credentials. In this case, select SQLite as your connection.

Arctype database connector

Once you have selected the SQLite database as your connection, you will be redirected to a page to name your SQLite connection and add an SQLite file to Arctype.

Arctype adding database credentials

The SQLite file that you need to add is the pr-file.db file, which was downloaded to your directory when you ran the command below on your command line.

Shell
 
mergestat export pr-file.db -e GitHub_repo_prs \
    -e "SELECT * FROM GitHub_repo_prs('facebook/react')"


Click the "Choose SQLite File" button and navigate to the folder that has the pr-file.db file. In my case, the file is in a folder called mysql-server-main, as shown below.

Arctype SQLite file choice

Once you have selected the file, click "Open" to add the file to Arctype. Then test the connection, and don't forget to click the save button. If everything is okay, you will be redirected to your workspace. On your workspace, click the table to explore it.

Arctype running a query

Once you click on the table name, you should see a table containing all pull requests data from the React codebase repo on GitHub.

Table pull requests

Let us now run some queries in Arctype to check out some different metrics from the data.

Identifying Unanswered Issues and Pull Requests

Unanswered issues and pull requests on GitHub repos are no fun if you are the one asking for help. However, identifying unanswered issues can also be valuable to maintainers to ensure that they are addressing user concerns.

In this step, we will run a query on Arctype to identify stale issues or pull requests in React Codebase repository on GitHub. In Arctype, run the following SQL query that will return the oldest 25 issues created more than 30 days ago, with no comments, and remain open.

SQL
 
SELECT
   title, author_login, comment_count, created_at, url
FROM GitHub_repo_prs
WHERE
   created_at < date('now', '-30 days')
   AND (merged = 0 OR closed = 0)
   AND comment_count = 0
ORDER BY created_at ASC
LIMIT 25


Once the SQL query is done running, you should see a table like the one shown below, showing the oldest unanswered pull requests. "Add explanation if rendered a nonvalid component" was created in 2013 and never received any comments. It can probably be removed!

Arctype pull requests

Finding Unapproved Merged Pull Requests

Allowing repository administrators to merge pull requests without a review can be a helpful shortcut for "emergency" hotfixes that avoid a potentially time-consuming code review cycle. However, too many unapproved merges are probably a sign of something wrong.

In this step, we will run a SQL query on Arctype that will list all unapproved pull requests merged into the main branch of React Codebase repository on GitHub.

SQL
 
SELECT
   number,
   title,
   date(created_at) AS created_at,
   date(merged_at) AS merged_at,
   author_login,
   review_decision,
   merged
FROM GitHub_repo_prs -- set to your own repo
WHERE
   merged = true
   AND review_decision <> 'APPROVED'
   AND base_ref_name = 'main' -- set to your own branch
ORDER BY created_at DESC


Once the SQL query is done running, you should have a table as shown below. These issues both seem like behind-the-scenes internal stuff. To be sure, we can pop the full text of the title open with cell inspector.

Arctype cell inspector

Pull Requests Merged in the Past 30 Days

When a project has more than a handful of regular contributors, it can be challenging to see all the changes.

In this step, we will run a query on Arctype that will show all pull requests that have been merged in the past 30 days on React codebase repository on GitHub.

SQL
 
SELECT
  base_repository_name,
  title,
  number,
  url,
  author_login,
  created_at,
  merged_at
FROM GitHub_repo_prs
WHERE
   merged_at > date('now', '-30 days') -- replace with the time period you care about
   AND merged = true


Once the SQL query is done running, you should have a table as shown below. Notice the 62 rows count at the very bottom. This is a very active repo, averaging 2 merges a day!

Repo merges

Conclusion

Every time someone writes code and wants to merge it into the main branch, they should request that another developer take a look and offer feedback. A code review helps developers to maintain consistency between design and implementation styles across man team members.

However, as a code reviewer, you might miss some pull requests. Using tools like Mergestat and Arctype can be a great way to review pull requests in depth and take action on your findings.

Requests GitHub pull request Database Data (computing) sql app code style API

Published at DZone with permission of Bonnie !. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • How to Build a Pokedex React App with a Slash GraphQL Backend
  • How to Build a Full-Stack App With Next.js, Prisma, Postgres, and Fastify
  • C# Applications Vulnerability Cheatsheet
  • Building REST API Backend Easily With Ballerina Language

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!