Converting CSV Files to Apache Hive Tables With Apache ORC Files

Once you have an Apache Avro file, it's easy to convert it to Apache ORC and then store it in HDFS. Here's how to do it.

I received some CSV files of data to load into Apache Hive. There are many ways to do this, but I wanted to see how easy it was to do in Apache NiFi with zero code.

I read the CSV files from a directory. Then I convert the CSV to Apache Avro directly with ConvertRecord.

ConvertRecord needs a schema, so I use the settings below for InferAvroSchema to generate one from the data. If every file has a different layout, the schema has to be inferred again for each file.
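
In the article, InferAvroSchema does this inference inside NiFi with no code. To make the idea concrete, here is a minimal Python sketch of header-based schema inference under a simplified rule set of my own (not the processor's actual algorithm): integer cells become `long`, other numeric cells `double`, everything else `string`, and a column with any empty cell becomes a `["null", T]` union. The function `infer_avro_schema` and the record name `Scores` are hypothetical.

```python
import csv
import io
import json
import re
from typing import List, Optional


def _avro_name(raw: str) -> str:
    """Make a header usable as an Avro field name ([A-Za-z_][A-Za-z0-9_]*)."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", raw.strip())
    if not name or name[0].isdigit():
        name = "_" + name
    return name


def _cell_type(cell: str) -> str:
    """Classify one non-empty CSV cell as an Avro primitive type name."""
    try:
        int(cell)
        return "long"
    except ValueError:
        pass
    try:
        float(cell)
        return "double"
    except ValueError:
        return "string"


def _widen(a: str, b: str) -> str:
    """Return the narrowest of long/double/string that can hold both types."""
    if a == b:
        return a
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"


def infer_avro_schema(csv_text: str, record_name: str = "Record") -> dict:
    """Infer a flat Avro record schema from CSV text whose first line is a header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    types: List[Optional[str]] = [None] * len(header)
    nullable = [False] * len(header)
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Pad short rows with empty cells and drop any extra trailing cells.
        cells = (row + [""] * len(header))[: len(header)]
        for i, cell in enumerate(cells):
            if cell == "":
                nullable[i] = True  # an empty cell means the field may be null
                continue
            t = _cell_type(cell)
            types[i] = t if types[i] is None else _widen(types[i], t)
    fields = []
    for raw, t, is_null in zip(header, types, nullable):
        t = t or "string"  # no non-empty samples: fall back to string
        fields.append({"name": _avro_name(raw), "type": ["null", t] if is_null else t})
    return {"type": "record", "name": record_name, "fields": fields}


if __name__ == "__main__":
    sample = "id,name,score\n1,alice,9.5\n2,bob,\n"
    print(json.dumps(infer_avro_schema(sample, "Scores"), indent=2))
```

For the sample above, `id` is inferred as `long`, `name` as `string`, and `score` as `["null", "double"]` because one row leaves it empty. Sampling every row is a deliberate simplification; a real pipeline would typically infer from a limited sample or supply a fixed schema.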

CSV reader:

I use the Jackson CSV parser, which works very well. The first line of each CSV file is a header, so the reader can work out the field names from it.
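
A header-aware reader turns each data row into a record keyed by the header's column names. The sketch below uses Python's standard `csv` module as a stand-in for the Jackson parser (it is not NiFi's CSV reader), mainly to show why a real CSV parser matters: quoted values can contain commas and escaped quotes that a naive split on `,` breaks apart. The function names `csv_fields` and `read_records` are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, List


def csv_fields(csv_text: str) -> List[str]:
    """Return the field names taken from the CSV header line."""
    return next(csv.reader(io.StringIO(csv_text)))


def read_records(csv_text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, keyed by the header's field names."""
    # DictReader consumes the first line as the header, like a CSV reader
    # configured to treat its first line as a header.
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield dict(row)


if __name__ == "__main__":
    text = 'id,city,note\n1,"Portland, OR","said ""hi"""\n'
    print(csv_fields(text))  # ['id', 'city', 'note']
    print(list(read_records(text)))
    # A naive split breaks the quoted value "Portland, OR" into two pieces.
    print(text.splitlines()[1].split(","))
```

The proper parser yields three values for the data row (`1`, `Portland, OR`, and `said "hi"`), while the naive split yields four fragments, which would no longer line up with the three header fields.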

Once I have an Apache Avro file, it's easy to convert it to Apache ORC and then store it in HDFS.
