Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Jan. 10 to Jan. 16). Here they are, in order of popularity:
Elasticsearch is an open-source, distributed, and highly scalable search engine built on top of Apache Lucene. After struggling to set it up, the author assembled this tutorial to save time for developers who want to get started with Elasticsearch and try it out.
Hadoop users, or anybody interested in Big Data, may want to read this recent Salon article about the nefarious uses of Hadoop. A significant portion of the article explains what Hadoop is, but it then goes further and casts Hadoop as the central tool of Big Brother.
Last week, the author was asked to write a Java program that splits a single 30GB XML file into smaller parts of configurable size. The file is consumed by a middleware application that struggles with XML of that size. In this article, you'll learn how to split large XML files in Java.
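The linked article has its own implementation; what follows is only a minimal sketch of one common approach, streaming the file with the JDK's StAX API (`javax.xml.stream`) so that just one record is held in memory at a time. It assumes the file looks like `<root><record>...</record>...</root>`, where every direct child of the root is an independent record. Root attributes and namespace declarations are not copied into the parts, and the size limit counts the UTF-8 bytes of the records (not the small root wrapper). The class name `XmlSplitter`, the method `split`, and the `.partN.xml` naming scheme are hypothetical.

```java
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.function.IntFunction;

// Hypothetical splitter (a sketch, not the linked article's code). Assumes every direct
// child of the root element is one independent record. Root attributes and namespace
// declarations are NOT copied into the output parts.
public class XmlSplitter {

    /** Streams `in`, writing records into parts of roughly maxBytes each; returns the part count. */
    public static int split(InputStream in, long maxBytes, IntFunction<OutputStream> openPart)
            throws XMLStreamException, IOException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        String rootName = null;
        Writer part = null;      // current output part, or null if none is open
        long partBytes = 0;      // UTF-8 bytes of records written to the current part
        int partCount = 0;
        int depth = 0;           // element nesting depth; the root element is depth 1
        StringWriter recordBuf = null;
        XMLEventWriter recordWriter = null;

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                depth++;
                if (depth == 1) { // remember the root's name but don't copy the root tag itself
                    rootName = event.asStartElement().getName().getLocalPart();
                    continue;
                }
                if (depth == 2) { // a new record starts: buffer only this record in memory
                    recordBuf = new StringWriter();
                    recordWriter = outFactory.createXMLEventWriter(recordBuf);
                }
            }
            if (recordWriter != null) {
                recordWriter.add(event); // copy every event belonging to the current record
            }
            if (event.isEndElement()) {
                depth--;
                if (depth == 1 && recordWriter != null) { // the current record just ended
                    recordWriter.flush();
                    recordWriter.close();
                    recordWriter = null;
                    String record = recordBuf.toString();
                    long size = record.getBytes(StandardCharsets.UTF_8).length;
                    // Start a new part if this record would push the current one past the limit;
                    // a record larger than maxBytes still gets a part of its own.
                    if (part != null && partBytes + size > maxBytes) {
                        closePart(part, rootName);
                        part = null;
                    }
                    if (part == null) {
                        part = new OutputStreamWriter(openPart.apply(partCount++), StandardCharsets.UTF_8);
                        part.write("<" + rootName + ">");
                        partBytes = 0;
                    }
                    part.write(record);
                    partBytes += size;
                }
            }
        }
        if (part != null) {
            closePart(part, rootName);
        }
        reader.close();
        return partCount;
    }

    private static void closePart(Writer part, String rootName) throws IOException {
        part.write("</" + rootName + ">");
        part.close();
    }

    // Usage: java XmlSplitter big.xml 104857600  (writes big.xml.part0.xml, big.xml.part1.xml, ...)
    public static void main(String[] args) throws Exception {
        String input = args[0];
        long maxBytes = Long.parseLong(args[1]);
        try (InputStream in = new BufferedInputStream(new FileInputStream(input))) {
            int parts = split(in, maxBytes, i -> {
                try {
                    return new BufferedOutputStream(new FileOutputStream(input + ".part" + i + ".xml"));
                } catch (FileNotFoundException e) {
                    throw new UncheckedIOException(e);
                }
            });
            System.out.println("Wrote " + parts + " parts");
        }
    }
}
```

Passing an `IntFunction<OutputStream>` rather than hard-coding file names keeps the splitting logic independent of where the parts go, so the same method can write to files, to in-memory buffers in a test, or to some other sink. Because only the current record is buffered, memory use depends on the largest record, not on the 30GB total.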
In this article, the author offers a range of insights on Big Data, including explanations and comparisons of OLTP and OLAP, data sharding, MPP, vertical and horizontal scaling, the CAP theorem, and databases such as Greenplum and HBase, along with a detailed table comparing data storage methodologies.
In the past, data structures and business logic were simple enough that a single SQL statement could accomplish what users needed. With the rapid growth of the information industry, however, users increasingly find that they must achieve far more complex goals, which is why stored procedures were introduced.