
Hive Incremental Update Using Sqoop [Snippet]


Read on to learn how to import, update, and append data records in Apache Hive using Sqoop, a command-line application for transferring data between relational databases and Hadoop.


We can use the Sqoop incremental import command with the `--merge-key` option to update the records in an already imported Hive table.

 --incremental lastmodified will import the updated and new records from the RDBMS (MySQL) database, based on the last saved value of the emp_timestamp column in Hive.

 --merge-key employee_id will "flatten" the two datasets into one, taking the newest available record for each primary key (employee_id).
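Putting the two options together, a full invocation might look like the following sketch. The connection string, credentials, table name, and paths are illustrative assumptions, not taken from the original article:

```shell
# Hypothetical example: incrementally import new and updated employee rows
# from MySQL and merge them into the existing imported dataset.
# Host, database, user, and directory names are assumptions for illustration.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/hr \
  --username sqoop_user -P \
  --table employee \
  --incremental lastmodified \
  --check-column emp_timestamp \
  --last-value "2018-01-01 00:00:00" \
  --merge-key employee_id \
  --target-dir /user/hive/warehouse/employee
```

With `--incremental lastmodified`, Sqoop requires `--check-column` and `--last-value` to decide which rows have changed; supplying `--merge-key` tells Sqoop to run a merge after the import instead of failing when the target directory already exists.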

Let’s assume there are 500,000 records in a given Hive table. In an incremental load, we get 100,000 new employee records, and 50,000 existing employee records are updated.

The Sqoop command above will import the 150,000 changed records and, using the merge tool, append the 100,000 new records and update the 50,000 existing records based on the primary key (employee_id).


Topics:
apache sqoop, apache hive, sqoop, big data

Opinions expressed by DZone contributors are their own.
