Pivotal Knowledge Base

HDFS client commands fail with "Unexpected error reading responses on connection"

Environment

Product Version
Pivotal HDB 1.x / 2.0
Pivotal HD 3.0.1
Isilon 8.x / 7.x

Symptom

When the Hadoop Distributed File System (HDFS) is stored on Isilon and hdfs commands are issued by non-superuser accounts, errors such as the following may be seen:

[gpadmin@HAWQMASTER ~]$ hdfs dfs -ls /
16/08/01 21:48:39 WARN ipc.Client: Unexpected error reading responses on connection Thread[IPC Client (2008879874) connection to isi-sc.lab.com/10.110.110.209:8020 from gpadmin,5,main]
java.lang.NullPointerException
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1125)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)
ls: Failed on local exception: java.io.IOException: Error reading responses; Host Details : local host is: "hdw1.gphd.local/10.120.140.10"; destination host is: "isi-sc.lab.com":8020;
[gpadmin@HAWQMASTER ~]$

The same command issued by root works correctly:

[root@HAWQMASTER ~]# hdfs dfs -ls /
Found 10 items
drwxrwxrwx - yarn hadoop 0 2016-07-01 19:34 /app-logs
drwxr-xr-x - hdfs hdfs 0 2016-06-21 03:55 /apps
drwxr-xr-x - yarn hadoop 0 2016-06-20 13:34 /ats
drwxr-xr-x - gpadmin gpadmin 0 2016-06-20 14:02 /hawq_default
drwxr-xr-x - hdfs hdfs 0 2016-06-20 13:34 /hdp
drwxr-xr-x - mapred hdfs 0 2016-06-20 13:34 /mapred
drwxrwxrwx - mapred hadoop 0 2016-06-20 13:34 /mr-history
drwxr-xr-x - gpadmin gpadmin 0 2016-06-20 15:22 /pxf_data
drwxrwxrwx - hdfs hdfs 0 2016-07-15 03:02 /tmp
drwxr-xr-x - hdfs hdfs 0 2016-07-02 06:11 /user
[root@HAWQMASTER ~]#

When the TCP traffic on port 8020 is captured and analysed on the client side, STATUS_ACCESS_DENIED messages are seen in the packet contents:

tcpdump -i any -w /root/gpadmin.trc "tcp port 8020"
tcpdump -XX -n -r /root/gpadmin.trc | less
<...>
15:11:37.471439 IP 10.110.110.209.isi-sc.lab.com >  hdw1.gphd.local.46219: Flags [P.], seq 1:165, ack 198, win 2058, options [nop,nop,TS val 3591022876 ecr 3339133289], length 164
        0x0000:  0000 0001 0006 000e 1ea6 2280 0000 8100  ..........".....
        0x0010:  03eb 0800 4500 00d8 8f39 4000 4006 762b  ....E....9@.@.v+
        0x0020:  0ab2 8fcf 0ab2 8f88 1f54 b48b 8d46 4378  .........T...FCx
        0x0030:  8481 dc73 8018 080a 7303 0000 0101 080a  ...s....s.......
        0x0040:  d60a a91c c707 2169 0000 00a0 9e01 08fd  ......!i........
        0x0050:  ffff ff0f 1001 1809 2213 6a61 7661 2e69  ........".java.i
        0x0060:  6f2e 494f 4578 6365 7074 696f 6e2a 6720  o.IOException*g.
        0x0070:  7374 6174 7573 3a20 5354 4154 5553 5f41  status:.STATUS_A
        0x0080:  4343 4553 535f 4445 4e49 4544 203d 2030  CCESS_DENIED.=.0
        0x0090:  7843 3030 3030 3032 3220 5061 7468 3a20  xC0000022.Path:.
        0x00a0:  2f6f 6e65 6673 5f68 6466 732f 6966 732f  /onefs_hdfs/ifs/
        0x00b0:  6461 7461 2f43 6c75 7374 6572 4275 6363  data/ClusterBucc
        0x00c0:  696e 5959 5959 5959 6f6e 652d 4443 412f  xxxxxx/Zone-DCA/
        0x00d0:  6861 646f 6f70 3004 3a10 8283 0dee 1fcd  hadoop0.:.......
        0x00e0:  4a2f 9cd4 862d c588 c0a9 4001            J/...-....@.
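In the -XX hex dump above, the status string is split across lines ("STATUS_A" / "CCESS_DENIED"), which makes it awkward to search for. As a rough sketch, the hypothetical shell function below filters tcpdump -A output, which usually keeps the printable payload of a packet together, and prints each NT status string it finds. It assumes a grep that supports -o (GNU and BSD grep do; it is not POSIX). For reference, 0xC0000022 is the Windows NTSTATUS value for STATUS_ACCESS_DENIED.

```shell
#!/bin/sh
# Hypothetical helper: read `tcpdump -A` text on stdin and print each
# NT status string found in the payload, one match per line.
# "[ .]" accepts a space, or the "." that tcpdump's hex/ASCII views
# print in place of a space.
extract_nt_status() {
    grep -o 'STATUS_[A-Z_]*[ .]=[ .]0x[0-9A-Fa-f]*'
}
```

For example, against the capture file written above: tcpdump -A -n -r /root/gpadmin.trc | extract_nt_status | sort | uniq -c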

Queries in HAWQ may also fail with an error such as the following:

ERROR: Append-Only Storage Read could not open segment file 'hdfs://isi-sc.lab.com:8020/hawq_data/gpseg16/16385/16596/635983.1' for relation 'loaded_data' (seg16 slice1 hdw1.gphd.local:40000 pid=792583) 
(Detail HdfsRpcException: RPC channel to "isi-sc.lab.com:8020" got protocol mismatch: RPC channel cannot find pending call: id = -3.;Line 1574;Routine cdbdisp_finishCommand;). [nQSError: 16015] SQL statement execution failed. (HY000)

Cause

The Hadoop client receives a nonstandard reply from the Isilon HDFS service, which causes the NullPointerException (NPE) seen by the HDFS client.

As the tcpdump output shows, the nonstandard reply indicates that HDFS access is denied for the given user (STATUS_ACCESS_DENIED = 0xC0000022). The denial is caused by file system permission and ACL issues on the Isilon side: on Isilon, Hadoop requires at least read permission on every directory from the root directory down to the Isilon directory where the HDFS files are located.
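A first pass at checking every level of the path can be scripted. The sketch below is a hypothetical helper, assuming a POSIX shell on the Isilon cluster and an absolute HDFS root path (the /ifs/data/example/hadoop path in the usage line is made up). It lists every directory from / down to that path and runs ls -ld on each, so owner, group and mode bits can be checked level by level. Mode bits do not show Isilon ACLs, so a directory that looks readable here can still deny access; the ACLs need to be reviewed as well.

```shell
#!/bin/sh
# Hypothetical helper: print "/" and then every prefix of an absolute path,
# e.g. /ifs, /ifs/data, ... The body runs in a subshell "( )" so the IFS
# and "set -f" changes do not leak into the caller.
path_prefixes() (
    case $1 in
        /*) ;;
        *) echo "path_prefixes: need an absolute path" >&2; exit 1 ;;
    esac
    printf '%s\n' /
    prefix=
    IFS=/
    set -f                           # split on "/" without glob expansion
    for part in $1; do
        [ -n "$part" ] || continue   # skip empty fields from "/" and "//"
        prefix="$prefix/$part"
        printf '%s\n' "$prefix"
    done
)

# Show owner, group and mode bits (not ACLs) for each level of the path.
show_path_permissions() {
    path_prefixes "$1" | while IFS= read -r dir; do
        ls -ld "$dir"
    done
}
```

Example: show_path_permissions /ifs/data/example/hadoop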

Resolution

Review permissions and ACLs on the Isilon side; Isilon support may need to be contacted to help with this. In the most recent case, the issue was resolved by making sure that all Hadoop users had at least read permission on every directory from the root directory down to the directory containing the HDFS data on Isilon. This can be achieved by changing permissions or by applying an ACL. For example, the following command grants group 507 directory read and execute rights on the current directory:

chmod +a group 507 allow dir_gen_read,dir_gen_execute . 
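Because the ACL has to be present on each directory along the path, it can help to generate the commands first and review them before running anything. The sketch below is hypothetical: it prints, but does not run, the chmod +a command shown above for the given directory and each parent up to, but not including, /. The group value and the path in the usage line are placeholders, and whether directories above the HDFS access zone also need the change is worth confirming with Isilon support.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the OneFS ACL command
# for a directory and each of its parents, stopping before "/".
# Assumes paths contain no whitespace, since they are printed unquoted.
# Usage: print_acl_commands <absolute-dir> <group>
print_acl_commands() (
    dir=$1
    group=$2
    case $dir in
        /*) ;;
        *) echo "print_acl_commands: need an absolute path" >&2; exit 1 ;;
    esac
    while :; do
        case $dir in
            /|//) break ;;   # reached the root; dirname may return "//"
        esac
        printf 'chmod +a group %s allow dir_gen_read,dir_gen_execute %s\n' \
            "$group" "$dir"
        dir=$(dirname "$dir")
    done
)
```

Example: print_acl_commands /ifs/data/example/hadoop 507 > acl_commands.txt, then review the file before running any of the commands on the cluster.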

 

 
