Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Feb. 28 to Mar. 6). Here they are, in order of popularity:
This week, Apache Hadoop 2.3.0 was released. It includes many bug fixes and small changes (Apache's release notes list them all), but there are also some bigger changes, such as in-memory caching for HDFS and a heterogeneous storage hierarchy in HDFS.
Before anyone freaks out: the author is talking about a technology collapse, not a market collapse or the steep downhill slope of a hype curve. Market demands are pushing our systems to ingest ever larger amounts of data in less time, while also making that data available to an increasing variety of queries.
The author has started work on the second edition of his book, which will bring the existing coverage up to date and add new chapters on topics such as YARN, running Storm on YARN, pulling data out of Kafka into HDFS, and using Spark for in-memory, iterative data processing.
Python ships with a large standard library of modules. One of them, the csv module, gives Python programmers the ability to parse CSV (comma-separated values) files.
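As a rough illustration of what the csv module handles, here is a minimal sketch that parses a small in-memory sample with both `csv.reader` (rows as lists) and `csv.DictReader` (rows as dicts keyed by the header). The sample data and the helper names `read_rows` and `read_dicts` are hypothetical, chosen only for this example; with a real file you would typically pass `open("data.csv", newline="")` instead of wrapping a string in `io.StringIO`.

```python
import csv
import io

# Hypothetical sample data; note the quoted field that itself contains a comma.
SAMPLE = 'name,city,visits\nAlice,"Portland, OR",3\nBob,Austin,5\n'


def read_rows(text):
    """Return every CSV record (header included) as a list of strings."""
    return list(csv.reader(io.StringIO(text)))


def read_dicts(text):
    """Return each data row as a dict keyed by the header row's field names."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    rows = read_rows(SAMPLE)
    print(rows[1])  # the quoted "Portland, OR" stays a single field
    for record in read_dicts(SAMPLE):
        # All values come back as strings, so convert numeric columns explicitly.
        print(record["name"], int(record["visits"]))
```

Note that the module handles quoting for you: splitting each line on commas by hand would break the `"Portland, OR"` field into two.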
The current version of Oozie (4.0.0) doesn’t build correctly when you try to target Hadoop 2.2. The Oozie team has a fix going into release 4.0.1 (see OOZIE-1551), but until then you can modify the Maven build files to get it working with 4.0.0.