
The Best of the Week (Jan. 31): Big Data Zone



Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Jan. 31 to Feb. 6). Here they are, in order of popularity:

1. How to Write the Perfect Stack Overflow Question (After Analyzing 10,000)

This article is the result of researching over 10,000 Stack Overflow questions. It summarizes how you need to phrase and write your question in order to get better and faster answers.

2. Introduction To Hive's Partitioning

In Hive’s implementation of partitioning, data within a table is split across multiple partitions. Each partition corresponds to a particular value (or combination of values) of the partition column(s) and is stored as a sub-directory within the table’s directory on HDFS. This article shows how to load a CSV file into a partitioned table.
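The directory layout described above can be illustrated with a small Python sketch. It does not use Hive or HDFS; the function name `partition_rows` and the sample rows are hypothetical, chosen only to show how rows map to `column=value` sub-directories and how the partition column is carried by the path rather than by the data files.

```python
from collections import defaultdict


def partition_rows(rows, partition_cols):
    """Group dict rows by partition-column values, Hive-style.

    Returns {relative_subdirectory: [row_without_partition_columns, ...]}.
    This is an in-memory illustration only; nothing is written to disk.
    """
    partitions = defaultdict(list)
    for row in rows:
        # Each partition directory is named "column=value", nested in order.
        subdir = "/".join(f"{col}={row[col]}" for col in partition_cols)
        # Partition columns are encoded in the path, not stored in the data.
        data = {k: v for k, v in row.items() if k not in partition_cols}
        partitions[subdir].append(data)
    return dict(partitions)


# Hypothetical sample rows, as they might come from csv.DictReader.
rows = [
    {"id": "1", "name": "alice", "country": "US"},
    {"id": "2", "name": "bob", "country": "IN"},
    {"id": "3", "name": "carol", "country": "US"},
]
layout = partition_rows(rows, ["country"])
for subdir in sorted(layout):
    print(subdir, layout[subdir])
```

Passing more than one column, for example `["country", "year"]`, would produce nested paths such as `country=US/year=2015`, assuming each row contains both keys.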

3. Big Data Search, Part 6: Sorting Randomness

As it turns out, doing work on big data sets is quite hard. To start with, you need to get the data, and it is… well, big. So that takes a while. Instead, the author decided to test a theory on the following scenario: given 4 GB of random numbers, find how many times the number 1 appears.
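To make the scenario concrete, here is a minimal Python sketch of the simplest baseline: a single streaming pass that counts matches without holding the data in memory. It is not the approach taken in the linked article, which is about sorting; the names `random_numbers` and `count_occurrences`, the value range, and the sample size are all assumptions chosen so the example runs quickly.

```python
import random


def random_numbers(n, low, high, seed):
    """Yield n pseudo-random integers in [low, high], one at a time."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    for _ in range(n):
        yield rng.randint(low, high)


def count_occurrences(numbers, target):
    """Count how often target appears in an iterable, in one pass."""
    return sum(1 for n in numbers if n == target)


# A small stand-in for the 4 GB data set: 100,000 numbers in [0, 99].
stream = random_numbers(100_000, 0, 99, seed=42)
print(count_occurrences(stream, 1))
```

Because the generator yields one number at a time, memory use stays constant regardless of how many numbers are produced; only the running time grows with the input size.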

4. How to Install Pig & Hive on Linux Mint VM

In this article, you'll find step-by-step instructions on how to install Apache Pig, a data analysis platform, and Apache Hive, a data warehouse built on top of Hadoop, on a Linux Mint VM.

5. Time Series Data in R

There is no shortage of time series data available on the web for use in student projects, for self-learning, or for testing new forecasting algorithms. It is now relatively easy to access these data sets directly in R. In this article, you'll find a variety of resources for time series data in R.

