Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Mar. 7 to Mar. 13). Here they are, in order of popularity:
In this post, we're going to write a MapReduce program that consumes Avro input data and also produces its output in Avro format. As an example, we'll write a program to calculate the average of student marks.
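The core map/shuffle/reduce flow behind that averaging job can be sketched without Hadoop or Avro at all. The following is a minimal, self-contained Python illustration of the logic only: the `(name, mark)` tuples and the in-memory shuffle are stand-ins for what would really be Avro records flowing through a Hadoop job.

```python
from collections import defaultdict

# Hypothetical student records: (student_name, mark). In the original post
# these would be Avro records read by a Hadoop MapReduce job; plain tuples
# keep this sketch self-contained.
records = [
    ("alice", 80), ("bob", 70), ("alice", 90), ("bob", 60), ("alice", 85),
]

def mapper(record):
    # Emit one (key, value) pair per input record.
    name, mark = record
    yield name, mark

def reducer(name, marks):
    # Average all marks seen for one student.
    return name, sum(marks) / len(marks)

# Shuffle phase: group mapped values by key, as the framework would.
groups = defaultdict(list)
for record in records:
    for key, value in mapper(record):
        groups[key].append(value)

averages = dict(reducer(name, marks) for name, marks in groups.items())
print(averages)  # {'alice': 85.0, 'bob': 65.0}
```

In the real job, the mapper would emit an Avro key/value pair and the reducer's output schema would describe the `(student, average)` record; the grouping step here is done by hand only because there is no framework underneath.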
There has been a lot of talk recently about the possible negative consequences of Big Data, machine learning, and the erosion of data privacy. Facebook, for example, holds data on an enormous number of people, and Mark Zuckerberg attended the annual NIPS conference and its panels on deep learning. So what does it all mean?
You may remember the author's first blog post describing how the Lucene developers eat their own dog food by using a Lucene search application to find their Jira issues. He's since made some further progress, so he wanted to give an update.
With Avro, the context and the values are separated: the schema describing the structure of the data is stored once, rather than being stored or streamed again and again with every single record.
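A rough way to see why that matters is to compare a self-describing format like JSON, which repeats every field name in every record, with a schema-once encoding. The sketch below is not real Avro (Avro uses variable-length zig-zag encoding, not fixed-size ints) and the schema layout is invented for illustration, but it shows the size effect of storing the structure separately from the values.

```python
import json
import struct

# Hypothetical records: field names appear in every JSON object,
# but only once in the separate "schema" of the packed encoding.
records = [{"id": i, "score": i * 2} for i in range(100)]

json_bytes = json.dumps(records).encode("utf-8")

# Store the structure once, then only the raw values, Avro-style.
# (Fixed-size little-endian ints stand in for Avro's real encoding.)
schema = json.dumps({"fields": ["id", "score"], "types": ["int", "int"]}).encode("utf-8")
packed_values = b"".join(struct.pack("<ii", r["id"], r["score"]) for r in records)

print(len(json_bytes))                   # field names repeated 100 times
print(len(schema) + len(packed_values))  # schema stored once, values packed
```

Even at 100 small records, the schema-once encoding is a fraction of the JSON size, and the gap widens as the record count grows, which is exactly the saving the summary above is pointing at.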
There’s a lot of hype about “big data” and a general trend to try to apply Hadoop to almost every problem. However, sometimes it turns out that you can get much better results by writing an old-fashioned, but optimised, single-node version of your algorithm.