The Best of the Week (Jan. 10): Big Data Zone


Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Jan. 10 to Jan. 16). Here they are, in order of popularity:

1. Getting Started with ElasticSearch

ElasticSearch is an open-source, distributed, highly scalable search engine built on top of Lucene. After struggling to set up ElasticSearch, the author assembled this tutorial to save time for developers who want to get started and try it out.

2. Hadoop: The NSA-Fueled Privacy Invasion Machine

Hadoop users, and anyone following Big Data, may want to read this recent Salon article about the nefarious uses of Hadoop. A significant portion of the article explains Hadoop itself, but it goes further and casts Hadoop as the central tool of Big Brother.

3. Splitting Large XML Files in Java

Last week the author was asked to write a Java program that splits a single 30GB XML file into smaller parts of configurable size. The consumer of the file is a middleware application that struggles with large XML files. In this article, you'll learn how to split large XML files in Java.
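
To give a flavor of the task, here is a minimal sketch of one possible approach using the JDK's streaming StAX API. It is not taken from the linked article. The class name `XmlSplitter`, the method `split`, and the `maxChars` threshold are hypothetical, and the sketch assumes the file is a single root element wrapping many record elements. It also collects each part in memory for brevity, whereas a real 30GB job would stream each part to its own file.

```java
import java.io.Reader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, for illustration only.
class XmlSplitter {

    // Splits the children of the root element into parts. A part is closed
    // once its buffered text reaches maxChars, always on a record boundary,
    // and every part is re-wrapped in a copy of the original root element.
    static List<String> split(Reader input, int maxChars) {
        List<String> parts = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(input);
            XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
            XMLEventFactory events = XMLEventFactory.newInstance();

            StartElement root = null;
            StringWriter buffer = null;
            XMLEventWriter writer = null;
            int depth = 0;

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();

                if (event.isStartElement()) {
                    depth++;
                    if (depth == 1) {
                        // Remember the root so it can be replayed in each part.
                        root = event.asStartElement();
                        continue;
                    }
                    if (depth == 2 && writer == null) {
                        // First record of a new part: open it with the root tag.
                        buffer = new StringWriter();
                        writer = outFactory.createXMLEventWriter(buffer);
                        writer.add(root);
                    }
                }

                // Copy everything that belongs to a record (depth 2 or deeper).
                if (writer != null && depth >= 2) {
                    writer.add(event);
                }

                if (event.isEndElement()) {
                    depth--;
                    if (writer != null) {
                        writer.flush();
                        boolean recordDone = depth == 1 && buffer.getBuffer().length() >= maxChars;
                        boolean rootDone = depth == 0;
                        if (recordDone || rootDone) {
                            QName name = root.getName();
                            writer.add(events.createEndElement(
                                    name.getPrefix(), name.getNamespaceURI(), name.getLocalPart()));
                            writer.close();
                            parts.add(buffer.toString());
                            writer = null;
                        }
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse or write XML", e);
        }
        return parts;
    }
}
```

Calling `XmlSplitter.split(new java.io.StringReader(xml), 1_000_000)` would return parts of roughly one million characters each. Because the reader pulls events one at a time, the input is never loaded whole, which is the property that matters for files too large for a DOM parser.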

4. Sharding, Scaling, Data Storage Methodologies, and More: Insights on Big Data

In this article, the author provides a variety of insights on Big Data, including explanations and comparisons of OLTP and OLAP, data sharding, MPP, vertical and horizontal scaling, the CAP theorem, databases such as Greenplum and HBase, and a detailed table comparing data storage methodologies.

5. Alternative to Difficult Stored Procedures in Big Data Computation

In the past, data structures and business logic were simple enough that one SQL statement could achieve the user's goals. However, with the rapid growth of the information industry, users often find that they need to achieve increasingly complex goals. This is why the stored procedure was introduced.


