
[DZone Research] Python and R in Big Data and Data Science


In this post, we take a look at some DZone research data around the popularity of the Python and R languages in the field of big data.


This article is part of the Key Research Findings from the 2018 DZone Guide to Big Data: Stream Processing, Statistics, and Scalability.

Introduction

For the 2018 DZone Guide to Big Data, we surveyed 540 software and data professionals to get their thoughts on various topics surrounding the field of big data and the practice of data science. In this post, we focus on the popularity of the Python language in the field and how it compares with R.

Python

Python has been moving slowly towards the title of “most popular language for data science” for years now, and the language to beat has been R. As an open source implementation of S, a language specifically designed for statistical analysis, R has long been extraordinarily popular for data-heavy programming. And while R remains popular (in the TIOBE index, it moved from 16th place in January 2017 to 8th place in 2018), Python’s use in data science and data mining projects has been steadily increasing. Last year, respondents to DZone’s Big Data survey revealed that Python had overtaken R as the predominant language used for data science, though its lead over R was not statistically significant, and it therefore didn’t quite reach “champion status.” Last year’s research findings noted that this was consistent with trends in other available research on the use of Python and R in data science: R is still popular for data and statistical analysis, but Python has been catching up.

This year, DZone’s Big Data survey showed a significant difference between the use of R and Python for data science projects: among survey respondents, R usage fell 10 percentage points in the last year, from 60% to 50%, while Python usage rose 6 percentage points, from 64% to 70%. That puts Python 20 percentage points ahead of R among this year's respondents. While Python was not created specifically for data analysis, its dynamic typing, easy-to-learn syntax, and ever-increasing base of libraries have made it an ideal candidate for developers who want to start delving into data science and analysis more comfortably than they may have been able to in the past.
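The survey figures above are shares of respondents, so the year-over-year changes are differences in percentage points, which are not the same as relative changes. As a rough illustration, the short Python sketch below computes both from the four shares quoted in this post. The function and variable names are hypothetical, chosen only for this example, and nothing here comes from the survey data beyond those four numbers.

```python
# Shares of survey respondents using each language for data science,
# as quoted in this post (percent of respondents).
usage = {
    "R": {"last_year": 60, "this_year": 50},
    "Python": {"last_year": 64, "this_year": 70},
}


def point_change(before, after):
    """Absolute change between two shares, in percentage points."""
    return after - before


def relative_change(before, after):
    """Change relative to the starting share, in percent."""
    return (after - before) / before * 100


for language, share in usage.items():
    points = point_change(share["last_year"], share["this_year"])
    relative = relative_change(share["last_year"], share["this_year"])
    print(f"{language}: {points:+d} points, {relative:+.1f}% relative")

# Gap between the two languages in this year's survey.
gap_points = point_change(usage["R"]["this_year"], usage["Python"]["this_year"])
gap_relative = relative_change(usage["R"]["this_year"], usage["Python"]["this_year"])
print(f"Python vs. R this year: {gap_points} points, {gap_relative:.0f}% relative")
```

Read this way, R's drop from 60% to 50% is 10 percentage points but roughly a 17% relative decline, and Python's 20-point lead means about 40% more respondents reported using Python than R.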

Conclusion

How do these findings hold up against the wider development community? If we consult Stack Overflow's list of 'Most Loved, Dreaded, and Wanted Languages' for 2018, these trends do, in fact, seem to hold. In that report, 68% of respondents (the third largest percentage) rated Python as a 'most loved' language, and 25% (the largest percentage) rated it as a 'most wanted' language. Conversely, only 49% rated R as a 'most loved' language, and 50% rated it as a 'most dreaded' language. Thus, it seems that in both the big data and the larger developer communities, Python is on the rise and R is stagnant, if not falling, in popularity.

What are your thoughts on these two powerful and popular languages? 


Topics:
big data, dzone research, python, r, data science

