DZone
Integration Zone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
  • Refcardz
  • Trend Reports
  • Webinars
  • Zones
  • |
    • Agile
    • AI
    • Big Data
    • Cloud
    • Database
    • DevOps
    • Integration
    • IoT
    • Java
    • Microservices
    • Open Source
    • Performance
    • Security
    • Web Dev
DZone > Integration Zone > Explore Deep in 4.6 Billion GitHub Events

Explore Deep in 4.6 Billion GitHub Events

Understand any GitHub project or quickly compare any two projects by digging deep into 4.6 billion GitHub events in real-time. Here are some ways you can play with it.

Max Liu user avatar by
Max Liu
·
Jul. 18, 22 · Integration Zone · Analysis
Like (2)
Save
Tweet
2.29K Views

Join the DZone community and get the full member experience.

Join For Free

4.6 billion is literally an astronomical figure. The richest star map of our galaxy, brought by Gaia space observatory, includes just under 2 billion stars. What does a view of 4.6 billion GitHub events really look like? What secrets and values can be discovered in such an enormous amount of data?

Here you go: OSSInsight.io can help you find the answer. It’s a useful insight tool that can give you the most updated open source intelligence, and help you deeply understand any single GitHub project or quickly compare any two projects by digging deep into 4.6 billion GitHub events in real-time. Here are some ways you can play with it.

Compare Any Two GitHub Projects

Do you wonder how different projects have performed and developed over time? Which project is worthy of more attention? OSSInsight.io can answer your questions via the Compare Projects page.

Let’s take the Kubernetes repository (K8s) and Docker’s Moby repository as examples and compare them in terms of popularity and coding vitality.

Popularity

To compare the popularity of two repositories, we use multiple metrics including the number of stars, the growth trend of stars over time, and stargazers’ geographic and employment distribution.

Number of Stars

The line chart below shows the accumulated number of stars of K8s and Moby each year. According to the chart, Moby was ahead of K8s until late 2019. The star growth of Moby slowed after 2017 while K8s has kept a steady growth pace.

The star history of K8s and Moby

Geographical Distribution of Stargazers

The map below shows the stargazers’ geographical distribution of Moby and K8s. As you can see, their stargazers are scattered around the world with the majority coming from the US, Europe, and China.

The geographical distribution of K8s and Moby stargazers

Employment Distribution of Stargazers

The chart below shows the stargazers’ employment of K8s (red) and Moby (dark blue). Both of their stargazers work in a wide range of industries, and most come from leading dot-com companies such as Google, Tencent, and Microsoft. The difference is that the top two companies of K8s’ stargazers are Google and Microsoft from the US, while Moby’s top two followers are Tencent and Alibaba from China.

The employment distribution of K8s and Moby stargazers

Coding Vitality

To compare the coding vitality of two GitHub projects, we use many metrics including the growing trend of pull requests (PRs), the monthly number of PRs, commits, and pushes, and the heat map of developers’ contribution time.

Number of Commits and Pushes

The bar chart below shows the number of commits and pushes submitted to K8s (top) and Moby (bottom) each month after their inception. Generally speaking, K8s has more pushes and commits than Moby, and their number grew stably until 2020 followed by a slowdown afterward. Moby’s monthly pushes and commits had a minor growth between 2015 and 2017, and then barely increased after 2018.

The monthly pushes and commits of K8s (top) and Moby (bottom)

Number of PRs

The charts below show the monthly and accumulated number of PRs of the two repositories. As you can see, K8s has received stable and consistent PR contributions ever since its inception and its accumulated number of PRs has also grown steadily. Moby had vibrant PR submissions before late 2017 but started to drop afterward. Its accumulated number of PRs reached a plateau in 2017, which has remained the case ever since.

The monthly and accumulated PR number of K8s (top) and Moby (bottom)

Developers’ Contribution Time

The following heat map shows developers’ contribution time for K8s (left) and Moby (right). Each square represents one hour in a day. The darker the color, the more contributions occur during that time. K8s have many more dark parts than Moby, and K8s’ contributions occur almost 24 hours a day, 7 days a week. K8s definitely has more dynamic coding activities than Moby.

Heat map of developers’ contribution time of K8s (left) and Moby (right)

Taken together, these metrics show that while both K8s and Moby are popular across industries worldwide, K8s have more vibrant coding activities than Moby. K8s is continuously gaining popularity and coding vitality while Moby is falling in both over time.

Popularity and coding vitality are just two dimensions to compare repositories. If you want to discover more insights or compare other projects you are interested in, feel free to visit the Compare page and explore it for yourself.

Of course, you can use this same page to deeply explore any single GitHub project and gain the most up-to-date insights about them. The key metrics and the corresponding changes are presented in a panoramic view. More in-depth analytics such as code changes by PR size groups and PR lines are also available. Explore it for yourself and you’d be surprised. Have fun.

Panoramic view of key GitHub metrics (K8s as an example)

Total PR number each month/PR groups (K8s as an example)

The number of lines of code change each month (K8s as an example)

Key Open Source Insights

OSSInsight.io does more than explore or compare repositories. It gives you historical, real-time, and custom open source insights. In this section, we’ll share some key insights into open source databases and programming languages. If you want to gain insights into other areas, you can explore the Insights page for yourself.

Note: If you want to get those analytical results by yourself, you can execute the SQL commands above each chart on TiDB Cloud with ease following this 10-minute tutorial.

Rust: the Most Active Programming Language

Rust was first released in 2012 and has been among the leading programming languages for 10 years. It has the most active repository with a total of 103,047 PRs at the time of writing.

Here's the SQL commands
SQL
 
SELECT
    /*+ read_from_storage(tiflash[github_events]), MAX_EXECUTION_TIME(120000) */
    programming_language_repos.name AS repo_name,
    COUNT(*)      AS num
FROM github_events
         JOIN programming_language_repos ON programming_language_repos.id = github_events.repo_id
WHERE type = 'PullRequestEvent'
  AND action = 'opened'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10


PR numbers of the leading programming languages

PR numbers of the leading programming languages

Go: the New Favorite and the Fastest Growing Programming Language

According to OSSInsight.io, 10 programming languages dominate the open source community. Go is the most popular with 108,317 stars, followed by Node and TypeScript. Go is also the fastest-growing language in popularity.

Here's the SQL commands
SQL
 
WITH repo_stars AS (
    SELECT
        /*+ read_from_storage(tiflash[github_events]) */
        repo_id,
        ANY_VALUE(repos.name) AS repo_name,
        COUNT(distinct actor_login) AS stars
    FROM github_events
         JOIN programming_language_repos repos ON repos.id = github_events.repo_id
    WHERE type = 'WatchEvent'
    GROUP BY 1
), top_10_repos AS (
    SELECT
        repo_id, repo_name, stars
    FROM repo_stars rs
    ORDER BY stars DESC
    LIMIT 10
), tmp AS (
    SELECT
        /*+ read_from_storage(tiflash[github_events]) */
        event_year,
        tr.repo_name AS repo_name,
        COUNT(*) AS year_stars
    FROM github_events
         JOIN top_10_repos tr ON tr.repo_id = github_events.repo_id
    WHERE type = 'WatchEvent' AND event_year <= 2021
    GROUP BY 2, 1
    ORDER BY 1 ASC, 2
), tmp1 AS (
    SELECT
        event_year,
        repo_name,
        SUM(year_stars) OVER(partition by repo_name order by event_year ASC) as stars
    FROM tmp
    ORDER BY event_year ASC, repo_name
)
SELECT event_year, repo_name, stars FROM tmp1WITH repo_stars AS (
    SELECT
        /*+ read_from_storage(tiflash[github_events]) */
        repo_id,
        ANY_VALUE(repos.name) AS repo_name,
        COUNT(distinct actor_login) AS stars
    FROM github_events
         JOIN programming_language_repos repos ON repos.id = github_events.repo_id
    WHERE type = 'WatchEvent'
    GROUP BY 1
), top_10_repos AS (
    SELECT
        repo_id, repo_name, stars
    FROM repo_stars rs
    ORDER BY stars DESC
    LIMIT 10
), tmp AS (
    SELECT
        /*+ read_from_storage(tiflash[github_events]) */
        event_year,
        tr.repo_name AS repo_name,
        COUNT(*) AS year_stars
    FROM github_events
         JOIN top_10_repos tr ON tr.repo_id = github_events.repo_id
    WHERE type = 'WatchEvent' AND event_year <= 2021
    GROUP BY 2, 1
    ORDER BY 1 ASC, 2
), tmp1 AS (
    SELECT
        event_year,
        repo_name,
        SUM(year_stars) OVER(partition by repo_name order by event_year ASC) as stars
    FROM tmp
    ORDER BY event_year ASC, repo_name
)
SELECT event_year, repo_name, stars FROM tmp1


The star growth trends of leading programming languages

The star growth trends of leading programming languages

Microsoft and Google: the Top Two Programing Languages Contributors

As world-renowned high-tech companies, Microsoft and Google take the lead in open source language contributions with a total of 1,443 and 947 contributors respectively at the time of writing.

Here's the SQL commands
SQL
 
SELECT
    /*+ read_from_storage(tiflash[github_events]), MAX_EXECUTION_TIME(120000) */
    TRIM(LOWER(REPLACE(u.company, '@', ''))) AS company,
    COUNT(DISTINCT actor_id)                 AS num
FROM
    github_events github_events
    JOIN programming_language_repos db ON db.id = github_events.repo_id
    JOIN users u ON u.login = github_events.actor_login
WHERE
    github_events.type IN (
        'IssuesEvent', 'PullRequestEvent','IssueCommentEvent',
        'PullRequestReviewCommentEvent', 'CommitCommentEvent',
        'PullRequestReviewEvent'
    )
    AND u.company IS NOT NULL
    AND u.company != ''
    AND u.company != 'none'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 20;SELECT
    /*+ read_from_storage(tiflash[github_events]), MAX_EXECUTION_TIME(120000) */
    TRIM(LOWER(REPLACE(u.company, '@', ''))) AS company,
    COUNT(DISTINCT actor_id)                 AS num
FROM
    github_events github_events
    JOIN programming_language_repos db ON db.id = github_events.repo_id
    JOIN users u ON u.login = github_events.actor_login
WHERE
    github_events.type IN (
        'IssuesEvent', 'PullRequestEvent','IssueCommentEvent',
        'PullRequestReviewCommentEvent', 'CommitCommentEvent',
        'PullRequestReviewEvent'
    )
    AND u.company IS NOT NULL
    AND u.company != ''
    AND u.company != 'none'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 20;


Companies who contribute the most to programing languages

Companies who contribute the most to programming languages

Elasticsearch Draws the Most Attention

Elasticsearch was one of the first open-source databases. It is the most liked database with 64,554 stars, followed by Redis and Prometheus. From 2011 to 2016, Elasticseasrch and Redis shared the top spot until Elasticsearch broke away in 2017.

Here's the SQL commands
SQL
 
WITH repo_stars AS (
    SELECT
        /*+ read_from_storage(tiflash[github_events]) */
        repo_id,
        ANY_VALUE(repos.name) AS repo_name,
        COUNT(distinct actor_login) AS stars
    FROM github_events
         JOIN db_repos repos ON repos.id = github_events.repo_id
    WHERE type = 'WatchEvent'
    GROUP BY 1
), top_10_repos AS (
    SELECT
        repo_id, repo_name, stars
    FROM repo_stars rs
    ORDER BY stars DESC
    LIMIT 10
), tmp AS (
    SELECT
        /*+ read_from_storage(tiflash[github_events]) */
        event_year,
        tr.repo_name AS repo_name,
        COUNT(*) AS year_stars
    FROM github_events
         JOIN top_10_repos tr ON tr.repo_id = github_events.repo_id
    WHERE type = 'WatchEvent' AND event_year <= 2021
    GROUP BY 2, 1
    ORDER BY 1 ASC, 2
), tmp1 AS (
    SELECT
        event_year,
        repo_name,
        SUM(year_stars) OVER(partition by repo_name order by event_year ASC) as stars
    FROM tmp
    ORDER BY event_year ASC, repo_name
)
SELECT event_year, repo_name, stars FROM tmp1WITH repo_stars AS (
    SELECT
        /*+ read_from_storage(tiflash[github_events]) */
        repo_id,
        ANY_VALUE(repos.name) AS repo_name,
        COUNT(distinct actor_login) AS stars
    FROM github_events
         JOIN db_repos repos ON repos.id = github_events.repo_id
    WHERE type = 'WatchEvent'
    GROUP BY 1
), top_10_repos AS (
    SELECT
        repo_id, repo_name, stars
    FROM repo_stars rs
    ORDER BY stars DESC
    LIMIT 10
), tmp AS (
    SELECT
        /*+ read_from_storage(tiflash[github_events]) */
        event_year,
        tr.repo_name AS repo_name,
        COUNT(*) AS year_stars
    FROM github_events
         JOIN top_10_repos tr ON tr.repo_id = github_events.repo_id
    WHERE type = 'WatchEvent' AND event_year <= 2021
    GROUP BY 2, 1
    ORDER BY 1 ASC, 2
), tmp1 AS (
    SELECT
        event_year,
        repo_name,
        SUM(year_stars) OVER(partition by repo_name order by event_year ASC) as stars
    FROM tmp
    ORDER BY event_year ASC, repo_name
)
SELECT event_year, repo_name, stars FROM tmp1


The star growth trend of leading databases

The star growth trend of leading databases

China: the Number One Fan of Open Source Databases

China has the most open source database followers with 11,171 stargazers of database repositories, followed by the US and Europe.

Here are the SQL commands
SQL
 
select upper(u.country_code) as country_or_area, count(*) as count, count(*) / max(s.total) as percentage
from github_events
use index(index_github_events_on_repo_id)
left join users u ON github_events.actor_login = u.login
join (
    -- Get the number of people has the country code.
    select count(*) as total
    from github_events
    use index(index_github_events_on_repo_id)
    left join users u ON github_events.actor_login = u.login
    where repo_id in (507775, 60246359, 17165658, 41986369, 16563587, 6838921, 108110, 166515022, 48833910, 156018, 50229487, 20089857, 5349565, 6934395, 6358188, 11008207, 19961085, 206444, 30753733, 105944401, 31006158, 99919302, 50874442, 84240850, 28738447, 44781140, 372536760, 13124802, 146459443, 28449431, 23418517, 206417, 9342529, 19257422, 196353673, 172104891, 402945349, 11225014, 2649214, 41349039, 114187903, 20587599, 19816070, 69400326, 927442, 24494032) and github_events.type = 'WatchEvent' and u.country_code is not null
) s
where repo_id in (507775, 60246359, 17165658, 41986369, 16563587, 6838921, 108110, 166515022, 48833910, 156018, 50229487, 20089857, 5349565, 6934395, 6358188, 11008207, 19961085, 206444, 30753733, 105944401, 31006158, 99919302, 50874442, 84240850, 28738447, 44781140, 372536760, 13124802, 146459443, 28449431, 23418517, 206417, 9342529, 19257422, 196353673, 172104891, 402945349, 11225014, 2649214, 41349039, 114187903, 20587599, 19816070, 69400326, 927442, 24494032) and github_events.type = 'WatchEvent' and u.country_code is not null
group by 1
order by 2 desc;


The geographical distribution of open source database stargazers

The geographical distribution of open source database stargazers

OSSInsight.io also allows you to create your own custom insights into any GitHub repository created after 2011. 

Database GitHub Open source Event Insight (email client) k8s trends

Published at DZone with permission of Max Liu. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Debugging a Wordle Bug
  • Software Methodologies — Waterfall vs Agile vs DevOps
  • Top 5 Datadog Integrations To Improve the Efficiency of Tech Teams
  • Update on Closures Coming to Java 7

Comments

Integration Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • MVB Program
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends:

DZone.com is powered by 

AnswerHub logo