
Hive Incremental Update Using Sqoop [Snippet]


Read on to learn how to import, update, and append data records using Apache Hive and the Sqoop command-line interface application.



We can use the Sqoop incremental import command with the "--merge-key" option to update the records in an already imported Hive table.

--incremental lastmodified imports the new and updated records from the RDBMS (MySQL) database, based on the latest value of emp_timestamp already present in Hive.

--merge-key employee_id "flattens" the two datasets into one, keeping the newest available record for each primary key (employee_id).
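As a minimal sketch, the two options can be combined in a single import like the one below. The JDBC URL, user name, password file, target directory, and last value are hypothetical placeholders, and the sketch assumes the Hive table is backed by the HDFS directory passed to --target-dir; only employee_id and emp_timestamp come from the description above. The script prints the command for review and runs it only when RUN_SQOOP=1 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: adjust to your own environment.
JDBC_URL="jdbc:mysql://db.example.com:3306/hr"
DB_USER="sqoop_user"
PASSWORD_FILE="/user/sqoop/.mysql-password"
TARGET_DIR="/user/hive/warehouse/employee"   # assumed location of the Hive table data
LAST_VALUE="2018-01-01 00:00:00"             # latest emp_timestamp already imported

sqoop_args=(
  import
  --connect "$JDBC_URL"
  --username "$DB_USER"
  --password-file "$PASSWORD_FILE"
  --table employee
  --target-dir "$TARGET_DIR"
  --incremental lastmodified     # pick up rows changed since the last value
  --check-column emp_timestamp   # column compared against the last value
  --last-value "$LAST_VALUE"
  --merge-key employee_id        # merge imported rows into existing data by key
)

# Dry run: print the command so it can be reviewed first.
printf 'sqoop'
printf ' %q' "${sqoop_args[@]}"
printf '\n'

# Execute only on a host where Sqoop is installed and RUN_SQOOP=1 is set.
if [[ "${RUN_SQOOP:-0}" == "1" ]]; then
  sqoop "${sqoop_args[@]}"
fi
```

After a successful run, the value passed as --last-value needs to move forward for the next import; a saved Sqoop job can track it for us.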

Let's assume a Hive table holds 500 thousand records. An incremental load then brings in 100 thousand new employee records and 50 thousand updated employee records.

With these options, Sqoop imports 150 thousand records and then uses the merge tool to append the 100 thousand new records and update the 50 thousand changed records, matching on the primary key (employee_id). The table ends up with 600 thousand records.
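The arithmetic can be checked with a toy simulation that does not involve Sqoop at all. The sketch below assumes employee IDs 1 to 500000 for the existing table and treats IDs 450001 to 500000 as the updated ones; these ID ranges and the "old"/"new" tags are made up for illustration. It mimics the merge rule with awk: when a key appears in both datasets, the row from the incremental import wins.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Existing table: employee_id 1..500000, tagged "old".
# Incremental import: 50,000 updated rows (450001..500000) plus
# 100,000 new rows (500001..600000), tagged "new".
merged="$(
  {
    seq 1 500000 | sed 's/$/,old/'
    seq 450001 600000 | sed 's/$/,new/'
  } | awk -F, '
    { row[$1] = $0 }                       # later rows overwrite earlier ones per key
    END { for (id in row) print row[id] }  # one row per employee_id
  '
)"

total="$(printf '%s\n' "$merged" | wc -l | tr -d ' ')"
from_import="$(printf '%s\n' "$merged" | grep -c ',new$')"

echo "rows after merge: $total"
echo "rows taken from the incremental import: $from_import"
```

The first count is 600000 (500 thousand existing plus 100 thousand appended) and the second is 150000 (100 thousand appended plus 50 thousand updated).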

