Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (December 05 - December 12). Here they are, in order of popularity:
I’ve been doing some work with Focused Objective lately, and today the following question came up in our discussion. If you’re sampling from a uniform distribution, how many samples do you need before your sample range has an even chance of covering 90% of the population range?
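A quick way to get a feel for the answer is to compute it directly. The sketch below assumes independent draws from a continuous uniform distribution and uses the standard result that the range of n Uniform(0, 1) samples has CDF P(R ≤ r) = n·r^(n−1) − (n−1)·r^n. The function names are made up for illustration and are not taken from the original post.

```python
def prob_range_exceeds(n: int, r: float) -> float:
    """Probability that the range of n Uniform(0, 1) draws exceeds r.

    Assumes independent draws. Rescaling to Uniform(a, b) does not change
    the result, because r is a fraction of the population range.
    """
    # CDF of the sample range: P(R <= r) = n * r^(n-1) - (n-1) * r^n
    cdf = n * r ** (n - 1) - (n - 1) * r ** n
    return 1.0 - cdf


def min_samples(r: float = 0.9, target: float = 0.5) -> int:
    """Smallest n for which P(range > r) is at least `target`."""
    n = 2  # a range needs at least two samples
    while prob_range_exceeds(n, r) < target:
        n += 1
    return n


if __name__ == "__main__":
    n = min_samples()
    print(n, round(prob_range_exceeds(n, 0.9), 4))
```

Under these assumptions, 16 samples fall just short of an even chance and 17 samples clear it.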
Suppose you’ve seen a coin come up heads 10 times in a row. What do you believe is likely to happen next? The answer has to do with the concept of Levels of Uncertainty.
Giving a one-day tutorial on data science is something I’ve considered in different contexts from time to time, but for various reasons it never really happened. Finally, last Friday, the tutorial took place as a workshop at the data2day conference, and I think it went pretty well. In this post I’d like to talk a bit about our approach and our experiences.
'Machine learning' is a mystical term. Most of us developers don’t need it at all in our daily work, and the only details we know about it come from some university course five years ago.
In my last blog post I showed how to write to a single CSV file using Spark and Hadoop, and the next thing I wanted to do was add a header row to the resulting file. Hadoop’s FileUtil#copyMerge function does take a String parameter, but it adds this text to the end of each partition file, which isn’t quite what we want. However, if we copy that function into our own FileUtil class, we can restructure it to do what we want:
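The post's own code is not part of this excerpt. As a rough illustration of the idea only, here is a local-filesystem analogue in Python, not the Hadoop API: write the header once, then append each partition file in order. The function name `merge_with_header` and the `part-` file prefix are assumptions made for this sketch.

```python
import shutil
from pathlib import Path


def merge_with_header(part_dir: str, dest: str, header: str) -> None:
    """Concatenate partition files into `dest`, writing `header` once at the top.

    Hypothetical helper for illustration; it works on local files only and
    assumes partition files are named with a "part-" prefix.
    """
    # Sort by name so partitions are appended in order (part-00000, part-00001, ...).
    parts = sorted(
        p for p in Path(part_dir).iterdir()
        if p.is_file() and p.name.startswith("part-")
    )
    with open(dest, "w", encoding="utf-8", newline="") as out:
        # The header goes in exactly once, before any partition data.
        out.write(header + "\n")
        for part in parts:
            with open(part, "r", encoding="utf-8", newline="") as src:
                shutil.copyfileobj(src, out)
```

A call such as `merge_with_header("output_dir", "merged.csv", "id,name")` would produce one file whose first line is the header, followed by the rows of every partition file.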