
This Week in Hadoop and More: Apache Calcite, Kylin, NiFi, and Trafodion


From Apache NiFi to Kylin to Apache Calcite and everything in between, take a look at the latest news and trends in the world of Hadoop.


Apache Kylin has a new release with a Spark cube build engine, HA distributed jobs, and Calcite and Avatica updates.

Trafodion has a new release with hashing, Top N Sort, and RegEx support.

Apache Calcite is everywhere you want to be. It is used by Hive, Kylin, Flink, Drill, Storm, and more.


CarbonData, a fast Hadoop file format, is now a full Apache project and provides another option alongside Apache ORC and Parquet. It works with Spark and seems interesting.

Hail, a Python and Spark tool for analyzing genetic data, is gaining some traction.

You can use Apache NiFi to monitor its own disk usage and send email alerts; the approach is well documented.

There is updated Hadoop and Spark Data Science Documentation for the open-source platform HDP 2.6.

Cool Apache NiFi Project of the Week

Why didn't I think of that? Cool hardware you can buy on Amazon and plug in and start thinking right into Hadoop. If loading data into Hadoop is too hard, perhaps you are thinking too much.

Creating Hive Tables From ORC

Hive table generation happens auto-magically in NiFi; just send this to PutHiveQL:

 ${hive.ddl} LOCATION '${absolute.hdfs.path}' 
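To make the statement above concrete, here is a minimal Python sketch of how the two attribute references expand into a single HiveQL statement. It only mimics plain `${name}` lookups and is not NiFi's Expression Language. The helper `expand_attributes` and the sample attribute values (table name, columns, HDFS path) are hypothetical and chosen for illustration; in a real flow the attributes would already be set on the flow file by upstream processors.

```python
import re


def expand_attributes(template: str, attributes: dict) -> str:
    """Replace each ${name} in the template with the matching attribute value.

    Hypothetical helper: handles plain attribute lookups only.
    """
    def lookup(match):
        return attributes[match.group(1)]

    return re.sub(r"\$\{([^}]+)\}", lookup, template)


# Hypothetical attribute values, for illustration only.
attributes = {
    "hive.ddl": (
        "CREATE EXTERNAL TABLE IF NOT EXISTS sensors "
        "(id INT, temp DOUBLE) STORED AS ORC"
    ),
    "absolute.hdfs.path": "/data/sensors",
}

statement = expand_attributes(
    "${hive.ddl} LOCATION '${absolute.hdfs.path}'", attributes
)
print(statement)
```

The result is an ordinary `CREATE EXTERNAL TABLE ... STORED AS ORC LOCATION '...'` statement, which is what PutHiveQL would then execute against Hive.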

Working With Apache ORC Files

ORC File Dump:

hive --orcfiledump [-j] [-p] [-d] [-t] [--rowindex <col_ids>] [--recover] [--skip-dump] [--backup-path <new-path>] <location-of-orc-file-or-directory>
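As a rough illustration of how a few of these options combine, the following Python sketch assembles an ORC file dump command line. The function `build_orcfiledump_command` and the sample file path are hypothetical; only the `-j`, `-p`, `-d`, and `--rowindex` options from the usage line above are covered, and the path to the ORC file is passed as the final argument.

```python
import shlex


def build_orcfiledump_command(orc_path, json_output=False, pretty=False,
                              dump_data=False, rowindex_columns=None):
    """Build the argument list for `hive --orcfiledump` (hypothetical helper)."""
    command = ["hive", "--orcfiledump"]
    if json_output:
        command.append("-j")  # print the metadata as JSON
    if pretty:
        command.append("-p")  # pretty-print the JSON output
    if dump_data:
        command.append("-d")  # dump the row data instead of the metadata
    if rowindex_columns:
        # Column ids are passed as one comma-separated value.
        ids = ",".join(str(column) for column in rowindex_columns)
        command += ["--rowindex", ids]
    command.append(orc_path)
    return command


# Hypothetical ORC file location, for illustration only.
command = build_orcfiledump_command(
    "/data/sensors/part-0.orc", json_output=True, pretty=True
)
print(shlex.join(command))
```

On a machine with the Hive client installed, the resulting list could be handed to `subprocess.run(command, check=True)`; the sketch stops at building the command so it runs anywhere.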


ORC Tools (GitHub): git clone https://github.com/apache/orc.git 

On OSX, you will need JDK 8, Maven, CMake, and standard OSX build tools. This will build both C++ and Java tools for ORC files.


Topics:
calcite, big data, spark, storm, hive, hadoop
