The Best of the Week: Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Oct. 11 to Oct. 17). Here they are, in order of popularity:

1. Big Data Analytics Beyond Hadoop

This article outlines the need to look beyond Hadoop for some big data analytics. From a batch analytics perspective, Spark is ideal for iterative machine learning algorithms. From a real-time analytics perspective, Storm is preferable. From a specialized data structures perspective, GraphLab is an ideal paradigm for processing large graphs.

2. Bayesian Modeling for the Perfect Pizza

How do you check if your pizza’s done? You look at it. But what if you’re mass producing pizza? In "Multivariate Bayesian cognitive modeling for unsupervised quality control of baked pizzas," the authors propose using computers specifically trained to look at a pizza and say, “yup, perfect! Done.”

3. The Rise of Big Data

While helping a MongoDB user with a sharding issue - his chunks weren't splitting - the author learned an important lesson about big data and tactfulness.

4. Apache Releases Hadoop 2.0: MapReduce, YARN and Big Changes for Big Data

Hadoop 2.0 is here, and with it come some big changes. The most notable, as detailed by a recent article from InfoWorld, is MapReduce 2.0, which is now incorporated into a larger system called YARN (Yet Another Resource Negotiator). Take a look and see what may be in store for Big Data. 

5. Detecting Reddit Voting Rings Using This Weird Little Data Trick

The HyperLogLog counter is a probabilistic data structure that estimates the number of unique elements in a list. What if we created a HyperLogLog counter for each Reddit user and, for every upvote, updated the corresponding counter with the voter's ID? Given this setup, here’s how we detect the voting ring.
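
The setup described above can be sketched roughly as follows. This is a minimal illustration rather than the linked article's code: the `HyperLogLog` class, the `upvoters` mapping, and the `record_upvote` helper are hypothetical names, and the register count and SHA-1-based hash are assumptions. It covers only the per-user counters, not the ring-detection step itself.

```python
import hashlib
import math
from collections import defaultdict


class HyperLogLog:
    """Minimal HyperLogLog counter: estimates how many distinct items were added."""

    def __init__(self, p=10):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers (1024 by default)
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # 64-bit hash taken from SHA-1 (assumption: any well-mixed hash would do).
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                  # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit within the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Hypothetical bookkeeping: one counter per Reddit user, fed with voter IDs.
upvoters = defaultdict(HyperLogLog)


def record_upvote(author_id, voter_id):
    """Update the author's counter with the ID of the user who upvoted."""
    upvoters[author_id].add(voter_id)
```

Calling `upvoters[author_id].count()` then gives an approximate number of distinct voters for that user. Repeated votes from the same ID leave the estimate unchanged, which is what makes the structure useful for counting unique voters in a fixed amount of memory.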

