
Converting CSV Files to Apache Hive Tables With Apache ORC Files


Once you have an Apache Avro file, it's easy to convert it to Apache ORC and then store it in HDFS. Here's how to do it.


I received some CSV files of data to load into Apache Hive. There are many ways to do this, but I wanted to see how easy it was to do in Apache NiFi with zero code.

I read the CSV files from a directory. Then I convert the CSV to Avro directly with ConvertRecord.

I need a schema for the conversion, so I use InferAvroSchema to generate one. If every file has a different layout, you will need to infer the schema again for each file.

CSV reader:

I use the Jackson CSV parser, which works very well. The first line of each CSV file is a header, and the reader can work out the field names from it.
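To make the header-driven approach concrete, here is a minimal sketch in plain Python of deriving an Avro record schema from a CSV header row. It is not what NiFi runs internally; the function name `infer_avro_schema`, the record name `csv_record`, and the sample data are all hypothetical, and every field is assumed to be a nullable string rather than a type inferred from the values.

```python
import csv
import io
import json


def infer_avro_schema(csv_text, record_name="csv_record"):
    """Build an Avro record schema whose field names come from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first line is assumed to be the header
    fields = [
        # Assumption: treat every column as an optional string.
        {"name": column.strip(), "type": ["null", "string"], "default": None}
        for column in header
    ]
    return {"type": "record", "name": record_name, "fields": fields}


# Hypothetical sample data, for illustration only.
sample = "id,name,city\n1,Ann,Oslo\n2,Bob,Rome\n"
schema = infer_avro_schema(sample)
print(json.dumps(schema, indent=2))
```

A real flow would likely want numeric and date types as well; treating everything as a string simply keeps the sketch short.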

Once I have an Apache Avro file, it's easy to convert it to Apache ORC and then store it in HDFS.
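Once the ORC files are in HDFS, Hive still needs a table definition that points at them. The sketch below builds such a statement as a string, assuming an external table over a single HDFS directory. The helper `hive_orc_ddl`, the table name `csv_data`, the column list, and the path `/tmp/csv_data_orc` are all hypothetical placeholders, not values from this flow.

```python
def hive_orc_ddl(table, columns, location):
    """Return a CREATE EXTERNAL TABLE statement for ORC files at an HDFS location."""
    column_defs = ",\n  ".join(
        f"`{name}` {hive_type}" for name, hive_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n"
        f"  {column_defs}\n"
        f")\n"
        f"STORED AS ORC\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table name, columns, and HDFS path.
ddl = hive_orc_ddl(
    "csv_data",
    [("id", "STRING"), ("name", "STRING"), ("city", "STRING")],
    "/tmp/csv_data_orc",
)
print(ddl)
```

An external table is used here so that dropping the table in Hive leaves the ORC files in HDFS untouched.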


Topics:
apache hive, apache orc, csv, big data, tutorial, file conversion
