This week: a collection of useful articles and tutorials on tools for ETL, ELT, ingesting, manipulating, and preparing data for your data lake.
Start developing on Hadoop ASAP with a quick-start Docker container from Cloudera.
Some great tips on using Flume.
Working with Cell-level Security in HBase.
Getting started with Apache Zeppelin (Spark).
Avoiding the Mess in Hadoop Clusters.
Real-time Stock Prediction with an Open Source Data Stack.
A deep-dive presentation on Spark SQL.
Information on the upcoming Apache Spark 1.6.
Learn more about Project Tungsten (Apache Spark).
Integrating MongoDB with Apache Spark.
An Overview of DataFrames with Apache Spark and Scala.
Tips and Tricks for Scaling Apache Spark.
Introduction to Apache Spark with Python.