Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Mar. 28 to Apr. 03). Here they are, in order of popularity:
Today the Apache Lucene and Solr PMC announced a new version of the Apache Lucene library and the Apache Solr search server, numbered 4.7.1.
Both Hive and Pig require approximately the same number of lines to set up the log parsing, mostly because it involves setting up each field label and data type individually and then writing a regex to parse the fields out of the input files. If you have a deserializer UDF, this is made much easier in either case.
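To give a rough feel for the setup described here, the sketch below declares each field label and type individually and then uses one regex to pull the fields out of a line. It is written in plain Python rather than Hive or Pig, and it assumes an Apache-style access log; the field names, the `LOG_PATTERN` regex, and the `parse_line` helper are all hypothetical and are not taken from the original article.

```python
import re
from typing import Optional

# Hypothetical schema: one (label, type) pair per field, declared individually,
# much like the column definitions a Hive table or Pig schema would need.
FIELDS = [
    ("host", str),
    ("timestamp", str),
    ("method", str),
    ("path", str),
    ("status", int),
    ("size", int),
]

# Assumed input format: Apache-style access log lines. One capture group per field.
LOG_PATTERN = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)'
)


def parse_line(line: str) -> Optional[dict]:
    """Return a dict of typed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = {}
    for (label, cast), raw in zip(FIELDS, match.groups()):
        # A "-" in a numeric field (for example, no response body) is stored as 0.
        record[label] = 0 if (cast is int and raw == "-") else cast(raw)
    return record


sample = '127.0.0.1 - - [28/Mar/2014:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043'
record = parse_line(sample)
```

Most of the lines go to the schema and the regex rather than to the parsing logic itself, which is the same pattern the blurb describes for both tools.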
ASF is the home for the majority of open source big data projects and ApacheCon is a must-attend event if you care about big data. Being able to converse with many members of various Apache project communities is invaluable.
The author wanted to know a lot more about exactly how Lucene stores data on disk. They knew the general stuff about segments and files, but wanted to see the actual bits and bytes. So they started tracing into Lucene, trying to figure out what it was doing.
This installment of Arthur Charpentier's regular collection of data science-related links includes problems with Google's data-based flu tracker, "Simplifying Data Analysis & Making Sense of Big Data," and more.