The Best of the Week (Mar 20): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (March 20 - March 27). Here they are, in order of popularity:

1. Why Isn't Everything Normally Distributed?

  • Why aren’t more phenomena normally distributed? Someone asked me this morning specifically about phenotypes with many genetic inputs. In this article, I will answer this question.

2. Geek Reading March 25, 2015

  • Today we have a few top stories for you. First, Apple announced their acquisition of FoundationDB. I am curious what this purchase is really about, but we will have to wait and see. Both Sides of the Table brings us an interview with Fred Wilson that is very informative. Lastly, Seth Godin talks about how some of the harder things are more worthwhile. It is definitely a good, short read.

3. Machine Learning and Magic

  • When I first heard about a lie detector as a child, I was puzzled. How could a machine detect lies? If it could, why couldn’t you use it to predict the future? For example, you could say “IBM stock will go up tomorrow” and let the machine tell you whether you’re lying. I saw a presentation of a machine learning package the other day. Some of the questions implied that the audience had a magical understanding of machine learning, as if an algorithm could extract answers from data that do not contain the answer.

4. Python: Detecting the Speaker in HIMYM Using Parts of Speech (POS) Tagging

  • Over the last couple of weeks I’ve been experimenting with different classifiers to detect speakers in HIMYM transcripts, and in all my attempts so far the only features I’ve used have been words. This led to classifiers that overfitted the training data, so I wanted to generalise them by introducing the parts of speech of the words in sentences, which are more generic.

5. Python: scikit-learn/lda: Extracting Topics from QCon Talk Abstracts

  • Following on from Rik van Bruggen’s blog post on a QCon graph he’s created ahead of this week’s conference, I was curious whether we could extract any interesting relationships between talks based on their abstracts. I therefore wanted to extract topics and connect each talk to the topic that describes it best.
