Converting CSV Files to Apache Hive Tables With Apache ORC Files

Once you have an Apache Avro file, it's easy to convert it to Apache ORC and then store it in HDFS. Here's how to do it.

I received some CSV files of data to load into Apache Hive. There are many ways to do this, but I wanted to see how easy it was to do in Apache NiFi with zero code.

I read the CSV files from a directory, then convert them to Apache Avro directly with ConvertRecord.

The conversion needs a schema, so I use InferAvroSchema to generate one. If every file has a different layout, you will need to infer a schema for each file.
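For readers who want to see what schema inference amounts to outside NiFi, here is a minimal Python sketch. It is not how InferAvroSchema is implemented; it only illustrates the idea of reading the header for field names and sampling rows to guess types. The function names `infer_type` and `infer_avro_schema`, the record name `csv_record`, and the sample data are all hypothetical.

```python
import csv
import io
import json


def infer_type(values):
    """Pick the narrowest Avro primitive that fits every non-empty sample value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    # Try integer first, then floating point; fall back to string.
    for avro_type, cast in (("long", int), ("double", float)):
        try:
            for v in non_empty:
                cast(v)
            return avro_type
        except ValueError:
            continue
    return "string"


def infer_avro_schema(csv_text, record_name="csv_record", sample_rows=100):
    """Build an Avro record schema (as a dict) from a CSV with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first line holds the field names
    rows = [row for _, row in zip(range(sample_rows), reader)]
    fields = []
    for i, name in enumerate(header):
        column = [row[i] for row in rows if i < len(row)]
        # Every field is made nullable because CSV cells may be empty.
        fields.append({
            "name": name.strip(),
            "type": ["null", infer_type(column)],
            "default": None,
        })
    return {"type": "record", "name": record_name, "fields": fields}


# Hypothetical sample data, for illustration only.
sample = "id,name,price\n1,widget,9.99\n2,gadget,\n"
schema = infer_avro_schema(sample)
print(json.dumps(schema, indent=2))
```

Because the guess depends on the sampled rows, files with different columns or value formats produce different schemas, which is why the inference has to be repeated whenever the layout changes.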

CSV reader:

I use the Jackson CSV parser, which works very well. The first line of each CSV file is a header row, and the reader derives the field names from it.

Once I have an Apache Avro file, it's easy to convert it to Apache ORC and then store it in HDFS.
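With the ORC files in HDFS, the remaining step is a Hive table that points at them. The sketch below shows one way to derive an external-table DDL statement from an Avro-style schema. It is an illustration under stated assumptions, not part of the NiFi flow: the type mapping covers only Avro primitives, and the table name `sales`, the location `/tmp/sales_orc`, and the example schema are hypothetical.

```python
# Assumed mapping from Avro primitive types to Hive column types.
AVRO_TO_HIVE = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "bytes": "BINARY",
}


def hive_type(avro_type):
    """Map an Avro field type to a Hive type, unwrapping nullable unions."""
    if isinstance(avro_type, list):
        # A union such as ["null", "long"]: use the first non-null branch.
        branches = [t for t in avro_type if t != "null"]
        avro_type = branches[0] if branches else "string"
    if not isinstance(avro_type, str):
        # Complex types (records, arrays, maps) are out of scope for this sketch.
        return "STRING"
    return AVRO_TO_HIVE.get(avro_type, "STRING")


def hive_ddl(schema, table, location):
    """Render a CREATE EXTERNAL TABLE statement for ORC files at `location`."""
    columns = ", ".join(
        f"`{field['name']}` {hive_type(field['type'])}" for field in schema["fields"]
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` ({columns}) "
        f"STORED AS ORC LOCATION '{location}'"
    )


# Hypothetical schema, table name, and HDFS directory.
example_schema = {
    "type": "record",
    "name": "csv_record",
    "fields": [
        {"name": "id", "type": ["null", "long"]},
        {"name": "name", "type": ["null", "string"]},
        {"name": "price", "type": ["null", "double"]},
    ],
}
ddl = hive_ddl(example_schema, "sales", "/tmp/sales_orc")
print(ddl)
```

An external table is used here so that Hive reads the ORC files where they already sit in HDFS instead of moving them into its warehouse directory.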

Topics:
apache hive, apache orc, csv, big data, tutorial, file conversion
