Pivotal Knowledge Base

Follow

How to migrate Hive from one Hadoop cluster to another

Environment

Pivotal HD/HDP: All supported versions

Purpose 

This article shows how to migrate Hive database from one cluster to another. Hive stores its metadata in its Metastore database. Metastore can be Oracle, MySQL, Derby, or PostgreSQL. Here, the migration of Hive with the Metastore database in MySQL is explained.

Procedure

Following are the high-level steps involved in a Hive migration

1. Install Hive on the new cluster and make sure both the source and destination clusters are identical

2. Transfer the data present in the Hive warehouse directory (/user/hive/warehouse) to the new Hadoop cluster

        hadoop distcp <src> <dst> 

3. Take a backup of the Hive Metastore

        mysqldump hive > /tmp/mydir/backup_hive.sql

 4. Install MySQL on the new Hadoop cluster

 5. Open Hive MySQL-Metastore dump file and replace the source Namenode hostname with the destination hostname

        hdfs://ip-address-old-namenode:port  ---> hdfs://ip-address-new-namenode:port 

 6. Restore the edited MySQL dump into the MySQL of new the Hadoop cluster

        mysql hive < /tmp/mydir/backup_hive.sql

 7. Configure Hive as normal and perform the Hive schema upgrade if needed

Impact 

Hive metadata contains the information about the database objects, and the contents are stored in the HDFS. Metadata contains HDFS URI and other details. So if we migrate Hive from one cluster to another cluster, we have to point the metadata to the HDFS of the new cluster. If we don't do this, it will point
to the HDFS of the older cluster and the migration will fail. In case of any failure, initialize Hive Metastore of the destination cluster and resume the migration following the correct steps

/bin/schematool -initSchema -dbType mysql

Frequently Faced Problem

During data validation, the counts might not match between source and target databases. This often occurs due to the absence of latest statistics. Please use  analyze table <table_name> compute statistics on both source and target tables before comparing the counts.

Additional Information

Please refer to the links below to know more about the Hive tools that can address other issues with migration

 

 

Comments

Powered by Zendesk