
The Best of the Week (Jan. 17): Big Data Zone


Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Jan. 17 to Jan. 23). Here they are, in order of popularity:

1. Apache Spark: The Next Big Data Thing?

Apache Spark is generating some buzz right now. Databricks, the company founded to support Spark, raised $14M from Andreessen Horowitz; Cloudera has decided to fully support Spark; and others say it’s the next big thing. So the author thought it was time to get an understanding of what the buzz is about.

2. Big Data Search, Part 1

The author got tired of the old questions he was asking candidates, so he decided to add a new one. Imagine a pretty trivial CSV file, and assume it is a small sample of a CSV file that is 15 TB in size. The requirement is to be able to query that file.

3. Big Data Search, Part 3: Binary Search of Textual Data

The index the author created for his previous exercise is just a text file, sorted by the indexed key. That makes it easy for a human to search and work with, much easier than a binary file, and it also helps with debugging. However, it does make running a binary search on the data a bit harder.
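
To see why a sorted text file complicates binary search, here is a minimal Python sketch. It is not the author's implementation. It assumes a hypothetical index layout of one `key,value` entry per line, sorted by the raw bytes of the key; the names `find_line` and `_line_at_or_after` are invented for this illustration. Because lines vary in length, a byte offset chosen by the search usually lands mid-line, so the code first skips ahead to the next line boundary before comparing keys.

```python
import os


def _line_at_or_after(f, offset):
    """Return the first complete line starting at or after `offset` (b"" at EOF)."""
    if offset == 0:
        f.seek(0)
    else:
        # Step back one byte so that an offset sitting exactly on a line
        # start is kept, then discard the remainder of the previous line.
        f.seek(offset - 1)
        f.readline()
    return f.readline()


def _key(line, sep):
    """Extract the key field (the bytes before `sep`) from one line."""
    return line.rstrip(b"\r\n").split(sep, 1)[0]


def find_line(path, key, sep=b","):
    """Binary-search a key-sorted text file; return the matching line or None.

    Assumption: lines are sorted by the byte order of their key field.
    """
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        # Find the smallest offset whose next complete line has a key >= `key`.
        while lo < hi:
            mid = (lo + hi) // 2
            line = _line_at_or_after(f, mid)
            if not line or _key(line, sep) >= key:
                hi = mid
            else:
                lo = mid + 1
        line = _line_at_or_after(f, lo)
    if line and _key(line, sep) == key:
        return line.rstrip(b"\r\n")
    return None
```

Calling `find_line("index.csv", b"banana")` on a file containing the lines `apple,1`, `banana,2`, and `cherry,3` would return the `banana,2` entry, and a key that is absent returns `None`. Each probe costs one seek plus at most two line reads, so the number of reads grows with the logarithm of the file size rather than with the number of lines.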

4. How to Set Up a Multi-Node Hadoop Cluster on Amazon EC2, Part 1

After spending some time playing around on a single-node pseudo-distributed cluster, it's time to get into real-world Hadoop. There are multiple ways to achieve this; the author covers how to set up a multi-node Hadoop cluster on Amazon EC2.

5. Data News: Data Mining Reveals Big Problems for MOOCs, and More

This installment of Arthur Charpentier's regular collection of data science-related links includes R as a second language, the problems of MOOCs exposed by data mining, and the reality of the computer code you see in movies.


