Converting CSV Files to Apache Hive Tables With Apache ORC Files

Once you have an Apache Avro file, it's easy to convert it to Apache ORC and then store it in HDFS. Here's how to do it.

I received some CSV files of data to load into Apache Hive. There are many ways to do this, but I wanted to see how easy it was to do in Apache NiFi with zero code.

I read the CSV files from a directory, then convert them to Avro directly with ConvertRecord.

I will need a schema, so I use InferAvroSchema to infer one from the data. If every file has a different layout, you will need to infer a schema for each file.
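To make the idea of schema inference concrete, here is a minimal Python sketch of what inferring an Avro schema from a CSV header and a few sample rows can look like. It is not how InferAvroSchema is implemented; the function names, the record name `csv_record`, the sample size, and the choice to make every field nullable are all assumptions made for illustration.

```python
import csv
import io
import json


def infer_type(values):
    """Pick the narrowest Avro primitive type that fits every sample value."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if not values:
        return "string"  # no samples to look at, so fall back to string
    if all_parse(int):
        return "long"
    if all_parse(float):
        return "double"
    return "string"


def infer_avro_schema(csv_text, record_name="csv_record", sample_rows=10):
    """Build an Avro record schema from a header line plus a few sample rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first line supplies the field names
    samples = [row for _, row in zip(range(sample_rows), reader)]
    fields = []
    for index, name in enumerate(header):
        column = [row[index] for row in samples
                  if index < len(row) and row[index] != ""]
        # Assumption: every field is nullable, expressed as a union with "null".
        fields.append({"name": name.strip(),
                       "type": ["null", infer_type(column)]})
    return {"type": "record", "name": record_name, "fields": fields}


# Hypothetical sample data, not taken from the files described in this article.
sample = "id,name,price\n1,widget,9.99\n2,gadget,12.50\n"
print(json.dumps(infer_avro_schema(sample), indent=2))
```

Because the types come from sampled values only, a file whose later rows contain different kinds of values would need a larger sample or a hand-written schema.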

For the CSV reader, I use the Jackson CSV parser, which works very well. The first line of each CSV file is a header, and the reader works out the field names from it.
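The header-driven behavior can be illustrated outside NiFi as well. The sketch below uses Python's standard `csv.DictReader`, not the Jackson parser, to show field names being taken from the first line; the function name `read_records` and the sample data are hypothetical.

```python
import csv
import io


def read_records(csv_text):
    """Yield one dict per data row, keyed by the names in the header line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield dict(row)


# Hypothetical sample: the quoted value contains a comma and stays one field.
sample = 'id,name\n1,"widget, large"\n2,gadget\n'
for record in read_records(sample):
    print(record)
```

Every value comes back as a string here; assigning types is the job of the schema, which is why the reader and the inferred schema are used together.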

Once I have an Apache Avro file, it's easy to convert to Apache ORC and then store in HDFS.
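Once the ORC files are in HDFS, Hive still needs a table definition that points at them. As a rough sketch, the Python below maps a flat Avro record schema to a `CREATE EXTERNAL TABLE ... STORED AS ORC` statement. The function names, the table name `products`, and the HDFS path are hypothetical, and only primitive Avro types and nullable unions are handled.

```python
# Mapping from Avro primitive types to Hive column types.
AVRO_TO_HIVE = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "bytes": "BINARY",
}


def hive_type(avro_type):
    """Translate one Avro field type into a Hive column type."""
    # A union such as ["null", "long"] marks a nullable field;
    # use its single non-null branch.
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError(f"unsupported Avro union: {avro_type!r}")
        avro_type = non_null[0]
    try:
        return AVRO_TO_HIVE[avro_type]
    except (KeyError, TypeError):
        # Nested records, arrays, and maps are outside this sketch.
        raise ValueError(f"unsupported Avro type: {avro_type!r}") from None


def hive_ddl(schema, table, location):
    """Build an external-table DDL statement over ORC files at `location`."""
    columns = ",\n".join(
        f"  `{field['name']}` {hive_type(field['type'])}"
        for field in schema["fields"]
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n{columns}\n)\n"
        f"STORED AS ORC\n"
        f"LOCATION '{location}'"
    )


# Hypothetical schema, table name, and HDFS directory for illustration only.
schema = {
    "type": "record",
    "name": "csv_record",
    "fields": [
        {"name": "id", "type": ["null", "long"]},
        {"name": "name", "type": ["null", "string"]},
        {"name": "price", "type": ["null", "double"]},
    ],
}
print(hive_ddl(schema, "products", "/tmp/orc/products"))
```

An external table is used in this sketch so that dropping the table in Hive leaves the ORC files in HDFS untouched.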

Topics:
apache hive, apache orc, csv, big data, tutorial, file conversion
