Pivotal Knowledge Base

Follow

Querying an external table fails with "java.io.IOException: Not a file" in Pivotal HDB

Environment

Product Version
Pivotal HAWQ/HDB 1.x / 2.0.x
OS RHEL 6.x

Symptom

An external table was set up in Pivotal HDB to read data from the Hive table staging.user_name_data:

CREATE EXTERNAL table ext_user_name_data ( 
user_id varchar(256),
gender varchar(50),
first_name varchar(100),
last_name varchar(100),
middle_name varchar(100)
)
LOCATION ('pxf://hawq20:51200/staging.user_name_data?PROFILE=Hive')
FORMAT 'custom' (formatter='pxfwritable_import');

When querying the table using Pivotal Extension Framework (PXF) in Pivotal HDB, the following error was seen:

gpadmin=# select count(*) from ext_user_name_data ;
ERROR: remote component error (500) from '192.168.177.170:51200': type Exception report message java.io.IOException: Not a file: hdfs://hawq20.lab:8020/apps/hive/warehouse/staging.db/user_name_data/base description The server encountered an internal error that prevented it from fulfilling this request. exception javax.servlet.ServletException: java.io.IOException: Not a file: hdfs://hawq20.lab:8020/apps/hive/warehouse/staging.db/user_name_data/base (libchurl.c:878)
gpadmin=#

Cause 

This error is caused by the Hive data files being stored in a subdirectory under the table name. For example, the table "staging.user_name_data" above the files is in fact, stored in the subdirectory "base": 

[hdfs@hawq20 ~]$ hdfs dfs -ls /apps/hive/warehouse/staging.db/user_name_data/base
Found 4 items
-rwxrwxrwx 3 hive hdfs 1910 2016-06-03 18:13 /apps/hive/warehouse/staging.db/user_name_data/base/000000_0
-rwxrwxrwx 3 hive hdfs 19265 2016-06-03 18:22 /apps/hive/warehouse/staging.db/user_name_data/base/000000_0_copy_1
-rwxrwxrwx 3 hive hdfs 19265 2016-06-03 19:50 /apps/hive/warehouse/staging.db/user_name_data/base/000000_0_copy_2
-rwxrwxrwx 3 hive hdfs 19265 2016-06-03 19:51 /apps/hive/warehouse/staging.db/user_name_data/base/000000_0_copy_3
[hdfs@hawq20 ~]$

Hive is likely configured to read the files correctly from the subdirectory, but Pivotal HDB is not. 

Resolution

1.In Ambari, under MAPREDUCE / Custom mapred-site / add in two custom values:

hive.mapred.supports.subdirectories=true
mapred.input.dir.recursive=true

2. Restart all services as requested by Ambari (YARN, MAPREDUCE, HIVE).

3. Rerun the query to confirm that the issue is resolved:

gpadmin=# select count(*) from ext_user_name_data;
count
-------
646
(1 row) gpadmin=# 

 

Comments

Powered by Zendesk