Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (March 20 - March 27). Here they are, in order of popularity:
Why aren’t more phenomena normally distributed? Someone asked me this morning specifically about phenotypes with many genetic inputs. In this article, I will answer this question.
Today we have a few top stories for you. First, Apple announced their acquisition of FoundationDB. I am curious what this purchase is really about, but we will have to wait and see. Both Sides of the Table brings us an interview with Fred Wilson that is very informative. Lastly, Seth Godin talks about how some of the harder things are more worthwhile. It is definitely a good, short read.
When I first heard about a lie detector as a child, I was puzzled. How could a machine detect lies? If it could, why couldn’t you use it to predict the future? For example, you could say “IBM stock will go up tomorrow” and let the machine tell you whether you’re lying. I saw a presentation of a machine learning package the other day. Some of the questions implied that the audience had a magical understanding of machine learning, as if an algorithm could extract answers from data that do not contain the answer.
Over the last couple of weeks I’ve been experimenting with different classifiers to detect speakers in HIMYM transcripts, and in all my attempts so far the only features I’ve used have been words. This led to classifiers that were overfitted to the training data, so I wanted to generalise them by introducing the parts of speech of the words in each sentence, which are more generic.
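As a rough illustration of the idea in that teaser, the sketch below builds a feature dictionary that mixes word features with part-of-speech features. It is not the author's code: the `TOY_POS` lexicon, the tag names, and the `features` function are all hypothetical, and the lookup table only stands in for a real part-of-speech tagger.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_POS = {
    "i": "PRON", "you": "PRON", "we": "PRON",
    "love": "VERB", "want": "VERB", "is": "VERB",
    "the": "DET", "a": "DET",
    "suit": "NOUN", "sandwich": "NOUN", "story": "NOUN",
    "legendary": "ADJ", "awesome": "ADJ",
}


def tag(tokens):
    """Pair each token with a coarse tag; unknown words get 'X'."""
    return [(token, TOY_POS.get(token, "X")) for token in tokens]


def features(sentence):
    """Count word features and part-of-speech features for one sentence."""
    tagged = tag(sentence.lower().split())
    feats = Counter()
    for word, pos in tagged:
        feats["word=" + word] += 1   # vocabulary-specific, prone to overfitting
        feats["pos=" + pos] += 1     # generic, shared across vocabularies
    # Tag bigrams describe sentence shape independently of the words used.
    for (_, first), (_, second) in zip(tagged, tagged[1:]):
        feats["pos_bigram=" + first + "_" + second] += 1
    return feats
```

Two sentences with different words but the same grammatical shape, such as "I love the suit" and "We want a sandwich", share every `pos` feature here, which is the kind of generalisation the excerpt is after.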
Following on from Rik van Bruggen’s blog post on a QCon graph he’s created ahead of this week’s conference, I was curious whether we could extract any interesting relationships between talks based on their abstracts. I therefore wanted to extract topics and connect each talk to the topic that describes it best.
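The last step mentioned there, connecting each talk to the topic that describes it best, could look something like the sketch below. It assumes the topics have already been extracted by some topic model and are available as word weights. The `TOPICS` table, its weights, and the `best_topic` function are made up for illustration and do not come from the original post.

```python
import re
from collections import Counter

# Hypothetical topics: each maps words to weights, as a topic model might produce.
TOPICS = {
    "microservices": {"service": 0.5, "deploy": 0.3, "docker": 0.2},
    "data": {"data": 0.5, "stream": 0.3, "graph": 0.2},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def best_topic(abstract, topics=TOPICS):
    """Return the name of the highest-scoring topic, or None if nothing matches."""
    counts = Counter(tokenize(abstract))
    scores = {
        name: sum(weight * counts[word] for word, weight in weights.items())
        for name, weights in topics.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Each (talk, topic) pair returned this way could then be stored as a relationship in the graph, which is what makes talks with a shared topic easy to find.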