Migration of Hive Metastore To Azure
This article describes two recommended approaches for migrating Hive metadata from an on-premises cluster to Azure HDInsight.
While moving a Hadoop workload from an on-premises CDH cluster to Azure, we also had to move the existing on-premises Hive metastore. The two methods below cover that metadata migration, followed by a way to validate the result.
Method 1: Hive Metastore Migration Using DB Replication
Set up database replication between the on-premises Hive metastore DB and the HDInsight Hive metastore DB. Once the metadata has been replicated, the following command can be used to point the replicated table locations at Azure storage:
./hive --service metatool -updateLocation wasb://<container_name>@<storage_account_name>.blob.core.windows.net/ hdfs://<namenode>:8020/
The ‘hive metatool’ command above does not copy any data. It rewrites the file system root recorded in the metastore, replacing the given HDFS URL with the target WASB/ADLS/ABFS URL. Note that -updateLocation takes the new location first and the old location second.
Recommendation: This approach is recommended when the source and target metadata DBs are identical, or when you are setting up or migrating existing applications.
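Because the location update changes metastore records in place, it can be worth previewing it first. The following is a minimal sketch, not the author's tooling: the helper names `wasb_uri` and `metatool_cmd`, the container, storage account, and namenode names are all hypothetical placeholders. It only prints the command for review and relies on the `-dryRun` option of the Hive metatool, which displays the changes without persisting them.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and sample values are illustrative, not part of Hive.

# Build the WASB root URI from a container name and a storage account name.
wasb_uri() {
  local container="$1" account="$2"
  printf 'wasb://%s@%s.blob.core.windows.net/\n' "$container" "$account"
}

# Print (do not run) the metatool command that rewrites old_root to new_root.
# metatool expects the NEW location first and the OLD location second.
# Pass "dry" as the third argument to append -dryRun (preview only).
metatool_cmd() {
  local new_root="$1" old_root="$2" mode="${3:-}"
  local cmd="hive --service metatool -updateLocation $new_root $old_root"
  if [ "$mode" = "dry" ]; then
    cmd="$cmd -dryRun"
  fi
  printf '%s\n' "$cmd"
}

# Example with placeholder names: print the preview command for review.
new_root="$(wasb_uri mycontainer mystorageaccount)"
metatool_cmd "$new_root" "hdfs://namenode.example.com:8020/" dry
```

Running `hive --service metatool -listFSRoot` before and after the update is one way to confirm which file system roots the metastore currently records.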
Method 2: Hive Metastore Migration Using Scripts
- Generate the Hive DDLs from the on-premises Hive metastore (myTable is used as the example table) with a shell script saved as hive_table_dd.sh.
- Run the above shell script by using ‘metastoreDB’ as a parameter:
bash hive_table_dd.sh metastoreDB
- Edit the generated DDL in HiveTableDDL.hql and replace each HDFS URL with the corresponding WASB/ADLS/ABFS URL.
- Run the updated DDL against the target Hive metastore DB used by the HDInsight cluster.
Ensure that the Hive metastore versions of the on-premises and Azure HDInsight Hive instances are compatible.
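The steps above can be sketched as two small shell functions. This is an illustration under stated assumptions, not the original hive_table_dd.sh: the function names `generate_ddl` and `rewrite_locations` and the `HIVE_CMD` variable are hypothetical, the `hive` CLI is assumed to be on the PATH of the source cluster, and the script simply dumps `SHOW CREATE TABLE` output for every table in one database.

```shell
#!/usr/bin/env bash
# Sketch only: function names and HIVE_CMD are illustrative assumptions.

HIVE_CMD="${HIVE_CMD:-hive}"   # command used to query Hive; can be overridden

# Write the DDL for every table in database $1 to stdout.
generate_ddl() {
  local db="$1" table
  echo "CREATE DATABASE IF NOT EXISTS $db;"
  echo "USE $db;"
  # -S keeps the CLI quiet so that only query results are printed.
  for table in $("$HIVE_CMD" -S --database "$db" -e "SHOW TABLES"); do
    "$HIVE_CMD" -S --database "$db" -e "SHOW CREATE TABLE $table"
    echo ";"
  done
}

# Replace the HDFS root ($1) with the cloud storage root ($2) in DDL read from stdin.
rewrite_locations() {
  local old_root="$1" new_root="$2"
  # '|' as the sed delimiter avoids escaping the slashes in the URIs.
  # The old root is treated as a regular expression, so a '.' in the host
  # name matches any character; that is usually harmless for this purpose.
  sed "s|${old_root}|${new_root}|g"
}
```

A possible invocation is `generate_ddl metastoreDB | rewrite_locations "hdfs://<namenode>:8020/" "wasb://<container_name>@<storage_account_name>.blob.core.windows.net/" > HiveTableDDL.hql`. The resulting file could then be applied on the HDInsight cluster with `hive -f HiveTableDDL.hql`, after reviewing it by hand.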
Recommendation: This approach is recommended when the source and target metadata DBs are not identical, or when you are setting up a new environment.
Validation: To validate that the Hive metastore has been migrated completely, run the bash script from step 1 against both metastore DBs (source and target) to print all Hive tables and their data locations.
Compare the outputs generated on the on-premises cluster and on Azure HDInsight to verify that no tables are missing from the new metastore DB.
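The comparison can be automated with standard text tools. The sketch below assumes, purely for illustration, that each side's listing has been saved as a file with one `db.table<TAB>location` line per table; that file layout and the function names `missing_tables` and `stale_locations` are hypothetical and not part of the original scripts.

```shell
#!/usr/bin/env bash
# Sketch only: assumes each listing file holds "db.table<TAB>location" lines.

# Print tables that appear in the source listing ($1) but not in the target ($2).
missing_tables() {
  local src="$1" tgt="$2"
  local src_names tgt_names
  src_names="$(mktemp)"
  tgt_names="$(mktemp)"
  # Keep only the table names and sort both lists the same way for comm.
  cut -f1 "$src" | LC_ALL=C sort -u > "$src_names"
  cut -f1 "$tgt" | LC_ALL=C sort -u > "$tgt_names"
  # comm -23 keeps lines that occur only in the first (source) list.
  LC_ALL=C comm -23 "$src_names" "$tgt_names"
  rm -f "$src_names" "$tgt_names"
}

# Print target rows ($1) whose location still points at HDFS.
stale_locations() {
  awk -F'\t' '$2 ~ /^hdfs:\/\//' "$1"
}
```

Empty output from both functions suggests that every source table exists in the target metastore and that no target location still refers to the old HDFS cluster.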