Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Dec. 13 to Dec. 19). Here they are, in order of popularity:
HBase is a database that provides real-time, random read and write access to tables meant to store billions of rows and millions of columns. It is designed to run on a cluster of commodity servers and to automatically scale as more servers are added, while retaining the same performance.
Lucene's facet module recently added support for dynamic range faceting, which counts how many hits fall into each of a dynamic set of ranges. In this article, you'll see how a segment tree can replace the O(N) linear search generally used to find the ranges a value matches.
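To give a feel for the idea, here is a minimal Python sketch; it is not Lucene's code. It assumes half-open ranges `[lo, hi)` and uses a simplified stand-in for a segment tree: the range endpoints split the number line into elementary intervals, each value is placed in its interval with a binary search, and the interval counts are rolled up into per-range counts at the end. The function names `count_linear` and `count_elementary` are made up for this illustration.

```python
from bisect import bisect_right


def count_linear(values, ranges):
    """Baseline: test every range for every value, O(N) work per value."""
    counts = [0] * len(ranges)
    for v in values:
        for i, (lo, hi) in enumerate(ranges):
            if lo <= v < hi:  # assumption: ranges are half-open [lo, hi)
                counts[i] += 1
    return counts


def count_elementary(values, ranges):
    """Binary search into elementary intervals, then roll up per range."""
    # Sorted unique endpoints cut the number line into elementary intervals.
    bounds = sorted({b for r in ranges for b in r})
    # slots[k] counts values in [bounds[k-1], bounds[k]);
    # slots[0] is below every range, slots[-1] is at or above the last endpoint.
    slots = [0] * (len(bounds) + 1)
    for v in values:
        slots[bisect_right(bounds, v)] += 1  # O(log N) per value
    # Each range is a contiguous run of elementary intervals.
    counts = []
    for lo, hi in ranges:
        start = bisect_right(bounds, lo)
        end = bisect_right(bounds, hi)
        counts.append(sum(slots[start:end]))
    return counts


if __name__ == "__main__":
    ranges = [(0, 10), (5, 20), (10, 100)]  # overlapping ranges are fine
    values = [1, 7, 10, 15, 50, 100, -3]
    print(count_linear(values, ranges))
    print(count_elementary(values, ranges))
```

Both functions return the same counts; the second does a binary search per value instead of a scan over every range, which is where the savings come from when there are many ranges.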
This article looks at the recent mud-slinging (if you can call it that) going on between Hortonworks and Cloudera. It's got to be good news for Hadoop, at least, and it highlights the widespread influence of the open-source Big Data framework.
This installment of Arthur Charpentier's regular collection of data science-related links includes Bayesian statistics in Python, contagious diseases in the United States from 1888 (in R), a "24 days of R" advent calendar, and more.
This installment of Arthur Charpentier's data science-related links includes the "Programming with Big Data in R" project, an analysis of matches and mismatches in picture recognition software, and a free ebook on data mining applications with R.