This Week in Hadoop and More: Apache Calcite, Kylin, NiFi, and Trafodion

From Apache NiFi to Kylin to Apache Calcite and everything in between, take a look at the latest news and trends in the world of Hadoop.

Apache Kylin has a new release with a Spark cube build engine, HA distributed jobs, and Calcite and Avatica updates.

Trafodion has a new release with hashing, Top N Sort, and RegEx support.

Apache Calcite: it's everywhere you want to be. It is used by Hive, Kylin, Flink, Drill, Storm, and more.

CarbonData, a fast Hadoop file format, is now a full Apache project and provides another option alongside Apache ORC and Parquet. It works with Spark and looks interesting.

Hail, a Python and Spark framework for analyzing genetic data, is gaining some traction.

You can use Apache NiFi to monitor its own disk usage and send email alerts, as is well documented here.

There is updated Hadoop and Spark Data Science Documentation for the open-source platform HDP 2.6.

Cool Apache NiFi Project of the Week

Why didn't I think of that? Cool hardware you can buy on Amazon, plug in, and use to start thinking right into Hadoop. If loading data into Hadoop is too hard, perhaps you are thinking too much.

Creating Hive Tables From ORC

Hive table generation happens auto-magically in NiFi; just send this to PutHiveQL:

 ${hive.ddl} LOCATION '${absolute.hdfs.path}' 
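To see roughly what PutHiveQL ends up receiving, here is a minimal sketch in plain Python (not NiFi code) that mimics how the two attribute references resolve. It assumes the `hive.ddl` and `absolute.hdfs.path` attributes were already set on the flow file by upstream processors; the `resolve` helper, the table definition, and the HDFS directory are all made up for illustration.

```python
import re


def resolve(template: str, attributes: dict) -> str:
    """Replace each ${name} reference with the matching attribute value.

    Only plain ${attribute} references are handled here; NiFi's real
    Expression Language also supports functions, which this sketch ignores.
    """
    return re.sub(r"\$\{([^}]+)\}", lambda m: attributes[m.group(1)], template)


# Hypothetical flow file attributes (values invented for this example).
attributes = {
    "hive.ddl": (
        "CREATE EXTERNAL TABLE IF NOT EXISTS sensors "
        "(id INT, temp DOUBLE) STORED AS ORC"
    ),
    "absolute.hdfs.path": "/data/sensors",
}

# The same template shown in the article.
template = "${hive.ddl} LOCATION '${absolute.hdfs.path}'"

statement = resolve(template, attributes)
print(statement)
```

The point is that the DDL carried in `hive.ddl` has no storage location on its own, so the `LOCATION` clause appended here is what ties the table to the directory holding the ORC files.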

Working With Apache ORC Files

ORC File Dump:

hive --orcfiledump [-j] [-p] [-d] [-t] [--rowindex <col_ids>] [--recover] [--skip-dump] [--backup-path <new-path>] <location-of-orc-file-or-directory>
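If you want to call the dump utility from a script, a small Python sketch like the following can assemble the argument list. The flag meanings used here (`-j` for JSON output, `-p` to pretty-print it, `-d` to dump the data, `--rowindex` taking comma-separated column ids) reflect my reading of the Hive documentation; the helper name `orcfiledump_cmd` and the file path are hypothetical, and the sketch assumes the `hive` launcher is on your PATH.

```python
def orcfiledump_cmd(path, as_json=False, pretty=False, dump_data=False, rowindex=None):
    """Build the argv list for `hive --orcfiledump` (hypothetical helper)."""
    cmd = ["hive", "--orcfiledump"]
    if as_json:
        cmd.append("-j")  # emit metadata as JSON
    if pretty:
        cmd.append("-p")  # pretty-print the JSON
    if dump_data:
        cmd.append("-d")  # dump row data instead of metadata
    if rowindex:
        # Column ids are passed as one comma-separated value.
        cmd += ["--rowindex", ",".join(str(c) for c in rowindex)]
    cmd.append(path)  # the ORC file or directory goes last
    return cmd


# Made-up path, for illustration only.
cmd = orcfiledump_cmd("/data/sensors/part-00000.orc", as_json=True, pretty=True)
print(" ".join(cmd))

# On a machine with Hive installed, the command could then be run with:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building a list rather than a single string avoids shell quoting problems when the path contains spaces.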


ORC Tools (GitHub): git clone https://github.com/apache/orc.git 

On OSX, you will need JDK 8, Maven, CMake, and standard OSX build tools. This will build both C++ and Java tools for ORC files.

Topics:
calcite, big data, spark, storm, hive, hadoop
