This Week in Hadoop and More: Apache Calcite, Kylin, NiFi, and Trafodion

From Apache NiFi to Kylin to Apache Calcite and everything in between, take a look at the latest news and trends in the world of Hadoop.

Apache Kylin has a new release with a Spark cube build engine, HA for distributed jobs, and Calcite and Avatica updates.

Trafodion has a new release with hashing, Top N Sort, and RegEx support.

Apache Calcite: it's everywhere you want to be. It is used by Hive, Kylin, Flink, Drill, Storm, and more.

The fast Hadoop file format CarbonData is now a full Apache project, giving you another option alongside Apache ORC and Parquet. It works with Spark and seems interesting.

Hail, a Python and Spark framework for analyzing genetic data, is gaining some traction.

You can use Apache NiFi to monitor its own disk usage and send email alerts, as documented here.

There is updated Hadoop and Spark Data Science Documentation for the open-source platform HDP 2.6.

Cool Apache NiFi Project of the Week

Why didn't I think of that? Cool hardware you can buy on Amazon and plug in and start thinking right into Hadoop. If loading data into Hadoop is too hard, perhaps you are thinking too much.

Creating Hive Tables From ORC

Hive table generation happens auto-magically in NiFi; just send this to PutHiveQL:

 ${hive.ddl} LOCATION '${absolute.hdfs.path}' 
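To make the substitution concrete, here is a minimal shell sketch of what that expression expands to. It assumes the two flow file attributes were set upstream (for example, `hive.ddl` by ConvertAvroToORC and `absolute.hdfs.path` by PutHDFS). The table definition and path are made-up sample values, and the shell variable names use underscores only because shell does not allow dots in names.

```shell
# Sample values standing in for the NiFi flow file attributes (hypothetical).
hive_ddl='CREATE EXTERNAL TABLE IF NOT EXISTS weather (city STRING, temp DOUBLE) STORED AS ORC'
absolute_hdfs_path='/data/weather'

# Same shape as the NiFi expression: the generated DDL plus a LOCATION clause.
statement="${hive_ddl} LOCATION '${absolute_hdfs_path}'"

# This is the text that would be handed to PutHiveQL.
printf '%s\n' "$statement"
```

The point is that the generated DDL carries no storage location of its own, so appending the LOCATION clause is what ties the external table to the directory where the ORC files were written.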

Working With Apache ORC Files

ORC File Dump:

hive --orcfiledump [-j] [-p] [-d] [-t] [--rowindex <col_ids>] [--recover] [--skip-dump] [--backup-path <new-path>] <location-of-orc-file-or-directory>
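A few common flag combinations, sketched as a dry run: the script only builds and prints the command lines, so it is safe to try without a cluster. The ORC path is a hypothetical placeholder; swap in a real file or directory before running the printed commands.

```shell
# Hypothetical location; replace with a real ORC file or directory.
orc_path='/tmp/example/data.orc'

# -j prints the file metadata as JSON; -p pretty-prints that JSON.
metadata_cmd="hive --orcfiledump -j -p ${orc_path}"

# -d dumps the row data instead of the metadata.
data_cmd="hive --orcfiledump -d ${orc_path}"

# --rowindex takes a comma-separated list of column ids.
index_cmd="hive --orcfiledump --rowindex 1,2 ${orc_path}"

# Dry run: show the commands rather than executing them.
printf '%s\n' "$metadata_cmd" "$data_cmd" "$index_cmd"
```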


ORC Tools (GitHub): git clone https://github.com/apache/orc.git 

On OSX, you will need JDK 8, Maven, CMake, and standard OSX build tools. This will build both C++ and Java tools for ORC files.
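Before kicking off the build, a quick check that the prerequisites are on your PATH can save some time. This is a small sketch; `missing_tools` is a helper name invented here, not part of the ORC project, and the tool list simply mirrors the requirements above.

```shell
# Print the name of every requested tool that is not found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools the ORC build relies on; prints nothing if all are present.
missing_tools java mvn cmake make
```

If nothing is reported missing, the usual CMake routine inside the cloned repository should apply (create a build directory, run `cmake ..`, then `make package`), but check the project's README for the current steps.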
