[DZone Research] Python and R in Big Data and Data Science

In this post, we look at DZone research data on the popularity of the Python and R languages in the field of big data.

This article is part of the Key Research Findings from the 2018 DZone Guide to Big Data: Stream Processing, Statistics, and Scalability.


For the 2018 DZone Guide to Big Data, we surveyed 540 software and data professionals to get their thoughts on various topics surrounding the field of big data and the practice of data science. In this post, we focus on the extreme popularity of the Python language in the field. 


Python has been moving slowly toward the title of “most popular language for data science” for years, and the language to beat has been R. As an open source implementation of S, a language designed specifically for statistical analysis, R has long been extraordinarily popular for data-heavy programming. And while R remains popular (in the TIOBE index, it moved from 16th place in January 2017 to 8th place in 2018), Python’s use in data science and data mining projects has been steadily increasing. Last year, respondents to DZone’s Big Data survey indicated that Python had overtaken R as the predominant language for data science, though its lead over R was not statistically significant and therefore didn’t quite earn it “champion status.” Last year’s research findings noted that this was consistent with other available research on the use of Python and R in data science: R is still popular for data and statistical analysis, but Python has been catching up.

This year, DZone’s Big Data survey showed a significant difference between the use of R and Python for data science projects: over the last year, R usage among survey respondents fell by 10 percentage points, from 60% to 50%, while Python usage rose by 6 percentage points, from 64% to 70%. This means the share of respondents who use Python for data science is now 20 percentage points higher than the share who use R. While Python was not created specifically for data analysis, its dynamic typing, easy-to-learn syntax, and ever-growing base of libraries have made it an ideal language for developers who want to delve into data science and analysis more comfortably than they could in the past.
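The survey figures above involve two different kinds of change: percentage points (the absolute difference between two shares) and relative percent change (the difference divided by the starting share). The small Python sketch below, which uses only the shares reported in this article, makes the distinction concrete; the helper names `point_change` and `relative_change` are illustrative and not part of any survey tooling.

```python
def point_change(old_pct: float, new_pct: float) -> float:
    """Absolute change between two shares, in percentage points."""
    return new_pct - old_pct


def relative_change(old_pct: float, new_pct: float) -> float:
    """Relative change, expressed as a percentage of the starting share."""
    return (new_pct - old_pct) / old_pct * 100


# Shares of survey respondents reported in this article (percent).
r_last_year, r_this_year = 60.0, 50.0
py_last_year, py_this_year = 64.0, 70.0

if __name__ == "__main__":
    print(f"R:      {point_change(r_last_year, r_this_year):+.0f} points, "
          f"{relative_change(r_last_year, r_this_year):+.1f}% relative")
    print(f"Python: {point_change(py_last_year, py_this_year):+.0f} points, "
          f"{relative_change(py_last_year, py_this_year):+.1f}% relative")
    # Compare the two languages' shares this year.
    print(f"Gap:    {point_change(r_this_year, py_this_year):+.0f} points; "
          f"Python's share is {relative_change(r_this_year, py_this_year):.0f}% larger than R's")
```

Run as written, it shows that R's 10-point drop is roughly a 16.7% relative decline, Python's 6-point gain is about a 9.4% relative increase, and Python's 20-point lead means its share of respondents is 40% larger than R's.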


How do these findings hold up against the wider development community? If we consult the 'Most Loved, Dreaded, and Wanted Languages' results from Stack Overflow's 2018 Developer Survey, these trends do, in fact, seem to hold. Python placed third on the 'most loved' list at 68% and first on the 'most wanted' list at 25%. Conversely, R scored only 49% on the 'most loved' list and 50% on the 'most dreaded' list. Thus, in both the big data community and the wider developer community, Python appears to be on the rise while R is stagnant, if not falling, in popularity.

What are your thoughts on these two powerful and popular languages? 
