The Best of the Week (Dec 5): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (December 05 - December 12). Here they are, in order of popularity:

1. How well does sample range estimate range?

  • I’ve been doing some work with Focused Objective lately, and today the following question came up in our discussion. If you’re sampling from a uniform distribution, how many samples do you need before your sample range has an even chance of covering 90% of the population range?
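The question in this blurb can be worked out directly. The sketch below is not taken from the linked post; it assumes a continuous uniform population on an interval and uses the standard result that the range of n uniform samples, as a fraction of the population range, follows a Beta(n-1, 2) distribution, so that P(range <= r) = n·r^(n-1) - (n-1)·r^n. The function names `prob_range_covers` and `min_samples` are made up for illustration.

```python
def prob_range_covers(n, fraction=0.9):
    """Probability that the range of n uniform samples spans at least
    `fraction` of the population range.

    Uses the CDF of the sample range for a continuous uniform
    distribution: P(R <= r) = n*r**(n-1) - (n-1)*r**n.
    """
    r = fraction
    return 1.0 - n * r ** (n - 1) + (n - 1) * r ** n


def min_samples(fraction=0.9, target=0.5):
    """Smallest n whose sample range covers `fraction` of the
    population range with probability at least `target`."""
    n = 2  # a range needs at least two samples
    while prob_range_covers(n, fraction) < target:
        n += 1
    return n


if __name__ == "__main__":
    n = min_samples()
    print(n, round(prob_range_covers(n), 3))  # prints: 17 0.518
```

Under these assumptions, 16 samples fall just short of an even chance (about 0.485) and 17 samples cross it (about 0.518).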

2. After a coin comes up heads 10 times

  • Suppose you’ve seen a coin come up heads 10 times in a row. What do you believe is likely to happen next? The answer has to do with the concept of Levels of Uncertainty.

3. Data Science workshop at data2day

  • Giving a one-day tutorial on data science is something I’ve considered in different contexts from time to time, but for various reasons it never really happened. Finally, last Friday, the tutorial took place as a workshop at the data2day conference, and I think it went pretty well. In this post I’d like to talk a bit about our approach and our experiences.

4. Getting Started with Machine Learning

  • 'Machine learning' is a mystical term. Most developers don’t need it at all in their daily work, and the only details we know about it come from some university course five years ago.

5. Spark: Write to CSV file with header using saveAsFile

  • In my last blog post I showed how to write to a single CSV file using Spark and Hadoop, and the next thing I wanted to do was add a header row to the resulting file. Hadoop’s FileUtil#copyMerge function does take a String parameter, but it adds this text to the end of each partition file, which isn’t quite what we want. However, if we copy that function into our own FileUtil class, we can restructure it to do what we want.
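The idea in the last item, writing the header once and then appending every partition file, can be illustrated without Hadoop. The sketch below is a local-filesystem analogue in Python, not the author's restructured FileUtil class and not the Hadoop API; it assumes the partition files sit in one directory and are named `part-*`, and the function name `merge_with_header` is hypothetical.

```python
import shutil
from pathlib import Path


def merge_with_header(src_dir, dst_file, header):
    """Concatenate partition files into one CSV, writing `header` once
    at the top instead of after each partition.

    Assumes partition files are named part-* inside src_dir; dst_file
    should not itself match that pattern.
    Returns the number of partition files merged.
    """
    # Sort so partitions are appended in a stable, predictable order.
    parts = sorted(Path(src_dir).glob("part-*"))
    with open(dst_file, "wb") as out:
        out.write((header + "\n").encode("utf-8"))
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream bytes, no full read
    return len(parts)
```

A call such as `merge_with_header("output_dir", "merged.csv", "id,name")` would then produce a single file whose first line is the header, followed by the partition contents in name order.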
