Pivotal Knowledge Base

Follow

Dropping default location database in hive deletes everything

Environment

PHD x.x.x

Summary

When ever you create a database explicitly to the default location, dropping the database would delete all the databases and data associated with the tables.

Problem

Show databases in hive:

hive> show databases;
OK
default
sample1
Time taken: 0.024 seconds, Fetched: 2 row(s)

Create a database explicitly with the default location:

hive> create database sample location '/hive/gphd/warehouse';
OK
Time taken: 0.059 seconds

 Check for the location of the database in metastore UI(Postgres or MYSQL): You see that there is no sample.db directory is not created, as this created under default directory.

metastore=# select * from "DBS";
DB_ID | DESC | DB_LOCATION_URI | NAME
-------+-----------------------+--------------------------------------------+---------
1 | Default Hive database | hdfs://hulk/hive/gphd/warehouse | default
22 | | hdfs://hulk/hive/gphd/warehouse | sample
23 | | hdfs://hulk/hive/gphd/warehouse/sample1.db | sample1
(3 rows)

 Check the new database created and the tables underneath:

[root@hdm1 ~]# hadoop fs -ls /hive/gphd/warehouse/
Found 1 items
drwxr-xr-x - hive hadoop 0 2015-02-26 13:46 /hive/gphd/warehouse/sample1.db

Now drop the database sample: cascade removes all the tables underneath the database

hive> drop database sample cascade;
OK
Time taken: 0.06 seconds

 Check the data in hdfs: It removes all the databases and it's tables from HDFS.

[root@hdm1 ~]# hadoop fs -ls /hive/gphd/warehouse/
ls: `/hive/gphd/warehouse/': No such file or directory

 Now, if you check your metastore the metastore still contains the metadata for the deleted databases(which you need to remove manually):

metastore=# select * from "DBS";
DB_ID | DESC | DB_LOCATION_URI | NAME
-------+-----------------------+--------------------------------------------+---------
1 | Default Hive database | hdfs://hulk/hive/gphd/warehouse | default
23 | | hdfs://hulk/hive/gphd/warehouse/sample1.db | sample1
(2 rows)

 Solution

In hive you cannot alter the database location nor update the existing properties. Never explicitly create a database in the default hive location. A workaround for this problem is to update the value of the database location in the postgresql metastore db as followed

[root@hdm1 ~]# psql -p 10432 -U postgres
psql (8.4.20)
Type "help" for help. postgres=# \c metastore
psql (8.4.20)
You are now connected to database "metastore".
postgres=# UPDATE "DBS" SET DB_LOCATION_URI = '/hive/gphd/warehouse/<db_name>.db' WHERE NAME = '<db_name>';

Comments

Powered by Zendesk