The Best of the Week (Jan. 10): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Jan. 10 to Jan. 16). Here they are, in order of popularity:

1. Getting Started with ElasticSearch

ElasticSearch is an open-source, distributed, highly scalable search engine built on top of Lucene. After struggling to set up ElasticSearch, the author assembled this tutorial to save time for developers who want to get started and try it out.

2. Hadoop: The NSA-Fueled Privacy Invasion Machine

Hadoop users, or anybody interested in Big Data, may want to read this recent Salon article about the nefarious uses of Hadoop. A significant portion of the article explains Hadoop itself, and the rest casts Hadoop as the central tool of Big Brother.

3. Splitting Large XML Files in Java

Last week the author was asked to write something in Java that can split a single 30GB XML file into smaller parts of a configurable size. The consumer of the file is a middleware application that has problems with the large size of the XML. In this article, you'll learn how to split large XML files in Java.

4. Sharding, Scaling, Data Storage Methodologies, and More: Insights on Big Data

In this article, the author provides a variety of insights on Big Data, including explanations and comparisons of OLTP and OLAP, data sharding, MPP, vertical and horizontal scaling, the CAP theorem, databases such as Greenplum and HBase, and a detailed table comparing data storage methodologies.

5. Alternative to Difficult Stored Procedures in Big Data Computation

In the past, data structures and business logic were simple enough that one SQL statement could achieve the user's goals. However, with the rapid growth of the information industry, users often find that they need to achieve increasingly complex goals. This is why the stored procedure was introduced.

