This Week in Hadoop and More: Apache Calcite, Kylin, NiFi, and Trafodion

From Apache NiFi to Kylin to Apache Calcite and everything in between, take a look at the latest news and trends in the world of Hadoop.

Apache Kylin has a new release with a Spark cube build engine, HA distributed jobs, and Calcite and Avatica updates.

Trafodion has a new release with hashing, Top N Sort, and RegEx support.

Apache Calcite: it's everywhere you want to be. Projects that use it include Hive, Kylin, Flink, Drill, Storm, and more.

CarbonData, a fast Hadoop file format, is now a top-level Apache project and offers another option alongside Apache ORC and Parquet. It works with Spark and seems interesting.

Hail, a Python and Spark tool for analyzing genetic data, is gaining some traction.

You can use Apache NiFi to monitor its own disk usage and send email alerts.

There is updated Hadoop and Spark Data Science Documentation for the open-source platform HDP 2.6.

Cool Apache NiFi Project of the Week

Why didn't I think of that? Cool hardware you can buy on Amazon, plug in, and use to start thinking right into Hadoop. If loading data into Hadoop is too hard, perhaps you are thinking too much.

Creating Hive Tables From ORC

Hive table generation happens auto-magically in NiFi; just send this to PutHiveQL:

 ${hive.ddl} LOCATION '${absolute.hdfs.path}' 
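To see what PutHiveQL ends up receiving, the expression above can be expanded by hand. This is a rough sketch in plain shell, not NiFi itself. It assumes `hive.ddl` holds a partial CREATE EXTERNAL TABLE statement (which, as far as I know, is what ConvertAvroToORC writes) and `absolute.hdfs.path` holds an HDFS location. The table name, columns, and path are made up for illustration, and underscores stand in for the dots because shell variable names cannot contain them.

```shell
#!/bin/sh
# Hypothetical attribute values standing in for what a NiFi flow file might carry.
hive_ddl="CREATE EXTERNAL TABLE IF NOT EXISTS example_table (id INT, name STRING) STORED AS ORC"
absolute_hdfs_path="/data/example_table"

# Mirrors the expression: ${hive.ddl} LOCATION '${absolute.hdfs.path}'
statement="${hive_ddl} LOCATION '${absolute_hdfs_path}'"

# Print the complete statement that would be sent on as HiveQL.
printf '%s\n' "$statement"
```

The point is only that the partial DDL plus a LOCATION clause is a complete statement Hive can run.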

Working With Apache ORC Files

ORC File Dump:

hive --orcfiledump [-j] [-p] [-d] [-t] [--rowindex <col_ids>] [--recover] [--skip-dump] [--backup-path <new-path>] <location-of-orc-file-or-directory>
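As a concrete case, here is a small sketch that dumps one file's metadata as pretty-printed JSON using the `-j` and `-p` flags. The file path and the `orc_dump_json` helper name are hypothetical; swap in a real ORC file location. The call is guarded so the script does nothing harmful on a machine without the hive CLI.

```shell
#!/bin/sh
# Hypothetical ORC file location; replace with a real path.
orc_file="/data/example_table/000000_0"

# Print ORC file metadata: -j emits JSON, -p pretty-prints that JSON.
orc_dump_json() {
  hive --orcfiledump -j -p "$1"
}

# Run the dump only where the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  orc_dump_json "$orc_file"
else
  echo "hive CLI not found; skipping dump of $orc_file"
fi
```

Adding `-d` instead dumps the row data rather than the metadata.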


ORC Tools (GitHub): git clone https://github.com/apache/orc.git 

On macOS, you will need JDK 8, Maven, CMake, and the standard build tools. Building from the cloned repository produces both the C++ and Java tools for ORC files.
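The build itself can be sketched as follows. The steps follow the build instructions in the ORC repository README as I remember them, so check the README in your checkout if anything differs. The `build_orc_tools` function name is my own, and nothing runs until you call it.

```shell
#!/bin/sh
# Clone Apache ORC and build the C++ and Java tools.
# Runs in a subshell so the caller's working directory is left unchanged.
build_orc_tools() (
  set -e                      # stop at the first failing step
  git clone https://github.com/apache/orc.git
  cd orc
  mkdir build
  cd build
  cmake ..                    # configure the build
  make package                # compile and package the tools
  make test-out               # run the test suite
)
```

Call `build_orc_tools` from an empty working directory once git, CMake, Maven, and JDK 8 are installed.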
