The Best of the Week (Nov. 1): Big Data Zone


Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Nov. 1 to Nov. 7). Here they are, in order of popularity:

1. Hunk: A New Data Analytics Tool for Hadoop

Hadoop users might be interested in Hunk, an analytics tool recently released by Splunk that allows users to analyze and visualize data in Hadoop. Big data isn't worth much, after all, unless some sense can be made of it.

2. Apache Mesos: The Datacenter is the Computer

The data center is the computer, and the pendulum is swinging: traditional cloud- and virtualization-level resource management in the data center is no longer enough to meet the growing demand for computing services. The answer to this challenge lies in solutions such as Apache Mesos and YARN.

3. 4 Methods for Structured Big Data Computation

This article is an overall analysis of four methods for processing structured big data. Each method has its unique advantages, and which one to choose depends on the characteristics of the project.

4. Topic Modeling in Python and R: The Enron Email Corpus, Part 2

[Be sure to read part 1 first!]

After posting his analysis of the Enron email corpus, the author realized that the regex patterns he had set up to capture and filter out the cautionary/privacy messages at the bottom of people's emails were not working. Let's have a look at his revised Python code for processing the corpus, and some new results.
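The kind of footer-stripping described above can be sketched with a couple of regex substitutions. The patterns below are illustrative assumptions, not the author's actual code; real corporate footers vary widely and usually need more patterns tuned against the corpus.

```python
import re

# Hypothetical patterns (for illustration only) matching common
# confidentiality/privacy footers appended to corporate emails.
FOOTER_PATTERNS = [
    # A run of asterisks followed by a confidentiality notice.
    re.compile(r"\*{5,}.*?confidential.*", re.IGNORECASE | re.DOTALL),
    # "This e-mail (and any attachments) is confidential/intended ..."
    re.compile(
        r"this e-?mail (?:message )?(?:and any attachments? )?"
        r"(?:is|are) (?:intended|confidential).*",
        re.IGNORECASE | re.DOTALL,
    ),
]

def strip_footer(body: str) -> str:
    """Remove a cautionary/privacy footer from an email body, if present."""
    for pattern in FOOTER_PATTERNS:
        body = pattern.sub("", body)
    return body.strip()

msg = ("Meet at 3pm.\n"
       "***** This email is confidential and intended solely "
       "for the addressee.")
print(strip_footer(msg))  # -> Meet at 3pm.
```

Because the notices sit at the end of a message, `re.DOTALL` lets `.*` consume everything after the trigger phrase, including line breaks, so the whole footer is dropped in one substitution.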

5. How Can a Mac Mini Outperform a 1,636-Node Hadoop Cluster?

This recent article describes a hands-on performance comparison between GraphChi and a 1,636-node Hadoop cluster. The task set for both was to process a Twitter graph with 1.5 billion edges, and the result, surprisingly enough, was a significantly quicker processing time for GraphChi. How does that work?


