The Best of the Week (Nov. 22): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Nov. 22 to Nov. 28). Here they are, in order of popularity:

1. How Python Became the Language of Choice for Data Science

Nowadays, Python is probably the programming language of choice (besides R) among data scientists for prototyping, visualization, and running data analyses on small- and medium-sized data sets. And rightly so, I think, given the large number of available tools. However, it wasn't always like this.

2. Integrating R with Cloudera Impala for Real-Time Queries on Hadoop

Impala uses Hadoop as its storage engine but moves away from MapReduce jobs toward distributed SQL queries. R can also be integrated with Impala to run fast, interactive queries on top of Hadoop data sets, and the results can then be further processed or visualized within R.

3. An Introduction to Machine Learning With R

This set of slides presents an introduction to machine learning with R. It covers R's strengths as a language and the basic concepts and uses of machine learning, and it walks through each topic with code samples in R and images of the visualized data.

4. Data News: What Every Programmer Should Know About Memory, and More

This installment of Arthur Charpentier's regular collection of data science-related links includes a free e-book on "Applied Epidemiology Using R," an argument that statistics are the least important part of data science, and what every programmer should know about memory.

5. Hadoop, MapReduce and Hive: How to Use Non-Java Languages, Such as R

This recent tutorial demonstrates how to use non-Java languages, R in particular, to work with Hadoop data through MapReduce and Hive. Although it focuses on R, it is also meant to open doors for users working in other languages, such as Python and Ruby, or with Linux commands and shell scripts.
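Non-Java MapReduce jobs usually run through Hadoop Streaming, whose contract is language-agnostic: a mapper reads raw lines and writes tab-separated key/value records, Hadoop sorts those records by key, and a reducer consumes the sorted stream. As a minimal sketch, assuming a word-count job (not taken from the tutorial, which uses R), the two stages might look like this in Python. The names `mapper`, `reducer`, and `simulate` are hypothetical, and `simulate` only imitates the cluster's sort step locally.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per word (hypothetical word-count mapper)."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(records: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each key.

    Assumes records arrive sorted by key, which Hadoop's shuffle/sort
    phase guarantees between the map and reduce stages.
    """
    parsed = (r.rstrip("\n").split("\t", 1) for r in records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def simulate(lines: Iterable[str]) -> List[str]:
    """Local stand-in for a streaming job: map, sort (the shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))


print(simulate(["Big data big", "Data zone"]))
```

On a cluster, each function would instead be wrapped in a small script that reads `sys.stdin` and prints its output, with the two scripts passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options.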

Opinions expressed by DZone contributors are their own.