Pivotal Knowledge Base

GPHDFS external table query fails with error: "SQL State: 38000"

Environment

Pivotal Greenplum: 4.3.x

OS: RHEL 6.x

Symptom

Querying a GPHDFS external table fails with SQL state 38000, but the error message returned to the client gives no clear indication of the root cause:

ERROR: external table gphdfs protocol command ended with error. 17/06/26 15:49:10 INFO security.UserGroupInformation: Login successful for user hdfs_test@HADOOP.COM using keytab file hdfs_test.keytab (seg91 slice1 MPP-DN-025:40003 pid=120646) 
SQL State: 38000
Details: 17/06/26 15:49:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/26 15:49:11 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
17/06/26 15:49:11 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token
Command: 'gphdfs://hacluster/user/ldapuser/process-temp-data/DATAFLOW_SUB_45121251/H002/*_0DATA.greenplum'
External table DATAFLOW_SUB_45121251_0, file gphdfs://hacluster/user/ldapuser/process-temp-data/DATAFLOW_SUB_45121251/*_0DATA.greenplum
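For reference, a minimal sketch of the kind of external table definition that can hit this error. The table name, columns, database name, and delimiter below are illustrative, not taken from the affected system; only the gphdfs location pattern comes from the error above.

psql -d postgres <<'SQL'
-- Hypothetical readable external table over a gphdfs wildcard pattern
-- (table name, columns, and delimiter are illustrative).
CREATE EXTERNAL TABLE dataflow_sub_demo (id int, data text)
LOCATION ('gphdfs://hacluster/user/ldapuser/process-temp-data/DATAFLOW_SUB_45121251/*_0DATA.greenplum')
FORMAT 'TEXT' (DELIMITER '|');

-- Any query against the table invokes the gphdfs connector, which
-- aborts with SQL state 38000 when the pattern matches no files.
SELECT count(*) FROM dataflow_sub_demo;
SQL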

Cause

The logs on the master host contain no useful information. The logs on the segment that reported the error, however, show that no data files on HDFS match the input pattern specified in the external table definition:

2017-06-26 14:49:29.820907 CST,"bigdata","user1",p155761,th-945256672,"10.17.2.40","46156",2017-06-26 14:49:26 CST,127800508,con407236,cmd7,seg28,slice1,dx261750,x127800508,sx1,"LOG","00000","read err msg from pipe, len:1456 msg:17/06/26 14:49:28 INFO security.UserGroupInformation: Login successful for user hdfs_test@HADOOP.COM using keytab file hdfs_test.keytab
17/06/26 14:49:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/26 14:49:29 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
17/06/26 14:49:29 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 89150 for hdfs_test on ha-hdfs:hacluster
17/06/26 14:49:29 INFO security.TokenCache: Got dt for hdfs://hacluster; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster
Exception in thread ""main"" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern hdfs://hacluster/user/ldapuser/process-temp-data/DATAFLOW_SUB_45121251/*_0DATA.greenplum matches 0 files
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
at com.emc.greenplum.gpdb.hdfsconnector.HDFSReader.assignSplits(HDFSReader.java:245)
at com.emc.greenplum.gpdb.hdfsconnector.HDFSReader.doRead(HDFSReader.java:157)
at com.emc.greenplum.gpdb.hdfsconnector.HDFSReader.main(HDFSReader.java:258)
",,,,,,,0,,,,

Resolution

This is not an issue with Greenplum itself. The user running the query should check the data source on HDFS to determine why no files match the input pattern.
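As a starting point, the pattern from the external table definition can be tested directly against HDFS. A sketch, run as a user with access to the Hadoop cluster; the pattern is quoted so the local shell does not expand the wildcard:

# Does anything match the exact pattern the external table uses?
hdfs dfs -ls 'hdfs://hacluster/user/ldapuser/process-temp-data/DATAFLOW_SUB_45121251/*_0DATA.greenplum'

# List the parent directory to see which files actually exist there.
hdfs dfs -ls 'hdfs://hacluster/user/ldapuser/process-temp-data/DATAFLOW_SUB_45121251/'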
