This Week in Hadoop and More: Apache Calcite, Kylin, NiFi, and Trafodion

From Apache NiFi to Kylin to Apache Calcite and everything in between, take a look at the latest news and trends in the world of Hadoop.

Apache Kylin has a new release with a Spark cube build engine, highly available distributed job scheduling, and Calcite and Avatica upgrades.

Apache Trafodion has a new release that adds hashing, Top N sort, and regular expression support.

Apache Calcite: it's everywhere you want to be. Hive, Kylin, Flink, Drill, Storm, and more all build on it for SQL parsing and query optimization.

CarbonData, a fast columnar file format for Hadoop, is now a top-level Apache project, giving you another option alongside Apache ORC and Apache Parquet. It works with Spark and looks promising.

Hail, a Python and Spark library for analyzing genetic data, is gaining some traction.

You can use Apache NiFi to monitor its own disk usage and send email alerts, as documented well here.
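
The underlying logic (measure usage, compare it to a threshold, compose an alert) is easy to sketch outside NiFi. The following minimal Python version is only an illustration of that logic, not the NiFi flow referenced above; the 80% default threshold, the function names, and the email addresses are hypothetical.

```python
import shutil
from email.message import EmailMessage
from typing import Optional

# Hypothetical default threshold; the original NiFi flow's settings are not shown.
DEFAULT_THRESHOLD_PERCENT = 80.0


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used / total * 100.0


def build_alert(path: str, pct: float, threshold: float,
                sender: str, recipient: str) -> EmailMessage:
    """Compose (but do not send) a disk-usage alert email."""
    msg = EmailMessage()
    msg["Subject"] = f"Disk usage alert: {path} at {pct:.1f}%"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Disk usage for {path} is {pct:.1f}%, above the {threshold:.1f}% threshold."
    )
    return msg


def check_path(path: str, sender: str, recipient: str,
               threshold: float = DEFAULT_THRESHOLD_PERCENT) -> Optional[EmailMessage]:
    """Return an alert for `path` if its usage exceeds `threshold`, else None."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = percent_used(usage.total, usage.used)
    if pct > threshold:
        return build_alert(path, pct, threshold, sender, recipient)
    return None


if __name__ == "__main__":
    alert = check_path(".", "nifi@example.com", "ops@example.com")
    print(alert["Subject"] if alert else "Disk usage is below the threshold.")
```

To actually deliver the alert, pass the message to `smtplib.SMTP(host).send_message(msg)`; the SMTP host depends on your environment.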

Updated Hadoop and Spark data science documentation is available for HDP 2.6, the open-source Hortonworks Data Platform.

Cool Apache NiFi Project of the Week

Why didn't I think of that? Cool hardware you can buy on Amazon, plug in, and use to send your thoughts right into Hadoop. If loading data into Hadoop is too hard, perhaps you are thinking too much.

Creating Hive Tables From ORC

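Hive table generation happens auto-magically in NiFi: ConvertAvroToORC writes a partial CREATE TABLE statement to the hive.ddl attribute, and PutHDFS records where the file landed in absolute.hdfs.path. Set the flow file content to the following (for example, with ReplaceText) and send it to PutHiveQL: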

 ${hive.ddl} LOCATION '${absolute.hdfs.path}' 
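
To see what PutHiveQL actually receives, here is a minimal Python sketch of the attribute substitution. It assumes only plain ${name} references (real NiFi Expression Language also has functions, which this ignores), and the attribute values in the example are hypothetical, shaped like what ConvertAvroToORC and PutHDFS would write.

```python
import re
from typing import Mapping

# Matches plain ${attribute.name} references only. NiFi Expression Language
# also supports functions such as ${filename:toUpper()}, which this sketch
# does not handle.
_ATTR_REF = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def expand(template: str, attributes: Mapping[str, str]) -> str:
    """Replace each ${name} in `template` with attributes[name].

    Raises KeyError for a missing attribute so typos surface early.
    """
    return _ATTR_REF.sub(lambda m: attributes[m.group(1)], template)


if __name__ == "__main__":
    # Hypothetical attribute values for illustration only.
    attrs = {
        "hive.ddl": "CREATE EXTERNAL TABLE IF NOT EXISTS users "
                    "(id INT, name STRING) STORED AS ORC",
        "absolute.hdfs.path": "/data/users",
    }
    print(expand("${hive.ddl} LOCATION '${absolute.hdfs.path}'", attrs))
```

For these sample values it prints `CREATE EXTERNAL TABLE IF NOT EXISTS users (id INT, name STRING) STORED AS ORC LOCATION '/data/users'`, a complete DDL statement that points the table at the HDFS location.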

Working With Apache ORC Files

ORC File Dump:

hive --orcfiledump [-j] [-p] [-d] [-t] [--rowindex <col_ids>] [--recover] [--skip-dump] [--backup-path <new-path>] <location-of-orc-file-or-directory>
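
If you call the dump tool from a script, building the argument list explicitly avoids shell-quoting surprises. Below is a minimal Python sketch that maps the options above onto an argv list suitable for `subprocess.run`. The function name `orcfiledump_cmd` and its keyword names are hypothetical; the flag descriptions in the comments follow the Hive ORC documentation.

```python
import shlex
from typing import List, Optional, Sequence


def orcfiledump_cmd(
    location: str,
    *,
    json_output: bool = False,                 # -j: print metadata as JSON
    pretty: bool = False,                      # -p: pretty-print the JSON (use with -j)
    dump_data: bool = False,                   # -d: dump row data instead of metadata
    timezone: bool = False,                    # -t: print the writer's time zone ID
    rowindex: Optional[Sequence[int]] = None,  # --rowindex: column ids to show row indexes for
    recover: bool = False,                     # --recover: recover a corrupted file
    skip_dump: bool = False,                   # --skip-dump: recover without dumping metadata
    backup_path: Optional[str] = None,         # --backup-path: where corrupted files are moved
) -> List[str]:
    """Build the argv list for `hive --orcfiledump` without running it."""
    cmd = ["hive", "--orcfiledump"]
    if json_output:
        cmd.append("-j")
    if pretty:
        cmd.append("-p")
    if dump_data:
        cmd.append("-d")
    if timezone:
        cmd.append("-t")
    if rowindex:
        # Column ids are passed as one comma-separated argument.
        cmd += ["--rowindex", ",".join(str(c) for c in rowindex)]
    if recover:
        cmd.append("--recover")
    if skip_dump:
        cmd.append("--skip-dump")
    if backup_path is not None:
        cmd += ["--backup-path", backup_path]
    cmd.append(location)
    return cmd


if __name__ == "__main__":
    # Print the command for a hypothetical file path; on a machine with Hive
    # installed you could run it with subprocess.run(cmd, check=True).
    cmd = orcfiledump_cmd("/tmp/example.orc", json_output=True, pretty=True)
    print(shlex.join(cmd))
```

Keeping the builder separate from execution makes it easy to log or test the exact command before it touches a cluster.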

ORC Tools (GitHub): git clone https://github.com/apache/orc.git 

On OS X, you will need JDK 8, Maven, CMake, and the standard OS X build tools. Building the cloned repository produces both the C++ and Java tools for ORC files.

Opinions expressed by DZone contributors are their own.
