Hive Incremental Update Using Sqoop [Snippet]
Read on to learn how to import, update, and append data records using Apache Hive and the Sqoop command-line interface application.
We can use the Sqoop incremental import command with the “--merge-key” option to update the records in an already imported Hive table.
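A hedged sketch of what such an import command might look like, assembled (not executed) in Python. The flags (--connect, --table, --incremental, --check-column, --last-value, --merge-key, --target-dir) are standard Sqoop 1 import options, but the JDBC URL, the table name employee, the --last-value timestamp and the target directory are hypothetical placeholders. Pointing --target-dir at the HDFS directory that backs the Hive table is an assumption about the setup.

```python
from __future__ import annotations

import shlex


def build_incremental_import(last_value: str) -> list[str]:
    """Assemble a Sqoop incremental import command line (nothing is run).

    Connection string, table name and paths are hypothetical placeholders.
    """
    return [
        "sqoop", "import",
        "--connect", "jdbc:mysql://db.example.com/hr",    # hypothetical MySQL database
        "--table", "employee",                            # hypothetical source table
        "--incremental", "lastmodified",                  # import new and updated rows
        "--check-column", "emp_timestamp",                # timestamp column to compare
        "--last-value", last_value,                       # newest timestamp already imported
        "--merge-key", "employee_id",                     # newer row wins per employee_id
        "--target-dir", "/user/hive/warehouse/employee",  # assumed Hive table location
    ]


if __name__ == "__main__":
    # Print a shell-quoted version that could be pasted into a terminal.
    print(shlex.join(build_incremental_import("2017-01-01 00:00:00")))
```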
--incremental lastmodified imports both new and updated records from the RDBMS (MySQL) database, based on the latest emp_timestamp value already imported into Hive.
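The selection rule can be illustrated with a small in-memory sketch. It assumes each row is a dict with employee_id and emp_timestamp keys (hypothetical sample data) and keeps only rows whose timestamp is newer than the last recorded value. Following the description above, the sketch takes the newest emp_timestamp in the batch as the starting point for the next run; Sqoop itself runs the filter as a query against MySQL, and its own bookkeeping of --last-value may differ.

```python
from datetime import datetime


def select_changed(rows, last_value):
    """Return rows modified after last_value, plus the value to start from next time."""
    changed = [r for r in rows if r["emp_timestamp"] > last_value]
    # The next run starts from the newest timestamp seen in this batch.
    new_last = max((r["emp_timestamp"] for r in changed), default=last_value)
    return changed, new_last


if __name__ == "__main__":
    last_value = datetime(2017, 1, 1)
    source = [  # hypothetical rows in the MySQL table
        {"employee_id": 1, "name": "Ann", "emp_timestamp": datetime(2016, 12, 30)},  # unchanged
        {"employee_id": 2, "name": "Bob", "emp_timestamp": datetime(2017, 1, 5)},    # updated
        {"employee_id": 3, "name": "Cy", "emp_timestamp": datetime(2017, 1, 7)},     # new
    ]
    changed, new_last = select_changed(source, last_value)
    print([r["employee_id"] for r in changed], new_last)  # [2, 3] 2017-01-07 00:00:00
```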
--merge-key employee_id "flattens" the two datasets (the data already imported and the newly imported records) into one, keeping the newest available record for each primary key (employee_id).
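The "flatten" step can be pictured as a dictionary keyed on employee_id: rows from the newer (incremental) dataset overwrite rows with the same key in the older one, and keys that appear on only one side are kept. This is a minimal sketch of the merge semantics under that assumption, with hypothetical salary data, not Sqoop's MapReduce-based merge implementation.

```python
def merge_on_key(old_rows, new_rows, key="employee_id"):
    """Flatten two datasets into one, keeping the newer row for each key."""
    merged = {row[key]: row for row in old_rows}
    # Rows from the incremental import replace older rows with the same key
    # and add rows for keys that did not exist before.
    for row in new_rows:
        merged[row[key]] = row
    return list(merged.values())


if __name__ == "__main__":
    hive_rows = [{"employee_id": 1, "salary": 100}, {"employee_id": 2, "salary": 200}]
    increment = [{"employee_id": 2, "salary": 250}, {"employee_id": 3, "salary": 300}]
    for row in merge_on_key(hive_rows, increment):
        print(row)
```

Employee 1 is untouched, employee 2 is updated to the newer salary, and employee 3 is appended.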
Let’s assume there are 500 thousand records in a given Hive table. In an incremental load, we get 100 thousand new employee records and 50 thousand updated employee records.
The Sqoop import described above will import 150 thousand records and, using the Merge tool, will append the 100 thousand new records and update the 50 thousand existing records based on the primary key (employee_id), leaving 600 thousand records in the table.
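The record counts in this example can be checked with the same key-based merge in plain Python. The integer IDs are hypothetical (existing employees 0 to 499,999, updates to the first 50,000 of them, and 100,000 new IDs), and each record is reduced to a version tag.

```python
EXISTING = 500_000
NEW = 100_000
UPDATED = 50_000

# Hive table before the load: employee_id -> version tag (hypothetical data).
table = {emp_id: "v1" for emp_id in range(EXISTING)}

# Incremental batch: 50k changed existing IDs plus 100k brand-new IDs.
batch = {emp_id: "v2" for emp_id in range(UPDATED)}
batch.update({emp_id: "v1" for emp_id in range(EXISTING, EXISTING + NEW)})

# Merge: batch rows overwrite matching keys and append the rest.
table.update(batch)

print(len(batch))                                   # 150000 records imported
print(len(table))                                   # 600000 records after the merge
print(sum(1 for v in table.values() if v == "v2"))  # 50000 updated records
```

Updates replace rows in place, so only the 100 thousand new records change the table size.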
Opinions expressed by DZone contributors are their own.