There are a few newer projects of interest, including Apache CarbonData, a new columnar data format for fast queries.
Apache CarbonData is yet another columnar format aimed at sub-second query response, available on GitHub and in the Apache Incubator. Check out this summary of what CarbonData is. It comes from Huawei, which is doing some really interesting Big Data and Spark work.
Astro, another SQL interface to HBase (on Spark), also comes from Huawei. It is another one to watch, and I am hoping to run a benchmark to see how it compares to Apache Phoenix. I like Phoenix because it works over regular ODBC/JDBC and doesn't require Spark. Then again, a day without Spark is not a happy one: if I don't have some Spark and NiFi daily, I think my pipeline is broken. I would like to see Astro become more generic, perhaps built on Apache Beam.
If you want to elastically scale a small Hadoop/Spark cluster on Amazon, check out this awesome free tool to quickly run an open source Hadoop distribution on AWS.
This week's most interesting articles span NiFi, SAP HANA, Heron, Kafka, and Hadoop.
- SAP HANA Integration with Hadoop
- ALOJA Big Data Benchmarking Platform
- Apache NiFi 1.0.0 Cluster Setup
- Twitter Heron Overview
- Microservices in the Kafka Ecosystem