Make sure you didn't miss anything with this list of the Best of the
Week in the Big Data Zone (Oct. 18 to Oct. 24). Here they are, in order
Recently, I wanted to migrate an app to SQLite as its data backend. In fact, I wanted it to work with PostgreSQL and SQLite interchangeably (but not at the same time) and to switch between the two databases easily without changing any code. In this tutorial, see how I overcame various challenges along the way.
Hadoop has been advancing at an amazing pace, with new features and capabilities appearing almost daily. Some changes are small, some are still in progress, and some are cool, but in the author's opinion, the most important change is the introduction of YARN in Hadoop 2.0.
Anybody interested in machine learning, and particularly in more accessible ways to implement it, should take a look at PredictionIO, an open-source machine learning server built on scalable frameworks (Hadoop and Cascading, for example) and intended to help developers create predictive features in their software.
On Monday, Hortonworks' HBase team released HBase 0.96. According to the team, over 2,000 issues were resolved in this update, and a lot of big improvements have been made.
So, you want to harvest a massive amount of data from the internet? What better storage mechanism than Cassandra? This is easy to do with Nutch.