Pivotal Knowledge Base


Namenode HA Failover Error: "Exception Encountered While Connecting to the Server"

Environment

 Product        Version
 Pivotal HD     3.x
 Pivotal HDP    2.3 / 2.4

Symptom

When checking the Namenode HA status or attempting a failover, an error is displayed indicating a connection issue.

Error Message:

hdfs@hdm1~$ hdfs haadmin -getServiceState hdm1.gphd.local 
16/12/12 05:15:28 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs/hdm1.gphd.local@KRB.KERB.LAB.COM (auth:KERBEROS) cause:java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.10.10.101:18094 remote=hdm1.gphd.local/10.10.10.101:8020]
16/12/12 05:15:28 WARN ipc.Client: Exception encountered while connecting to the server : java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.10.10.101:18094 remote=hdm1.gphd.local/10.10.10.101:8020]
16/12/12 05:15:28 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs/hdm1.gphd.local@KRB.KERB.LAB.COM (auth:KERBEROS) cause:java.io.IOException: java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.10.10.101:18094 remote=hdm1.gphd.local/10.10.10.101:8020]
Operation failed: Failed on local exception: java.io.IOException: java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.10.10.101:18094 remote=hdm1.gphd.local/10.10.10.101:8020]; Host Details : local host is: "hdm1.gphd.local/10.10.10.101"; destination host is: "hdm1.gphd.local":8020;

Cause

The "hdfs haadmin" command is unable to connect because the Namenode process is not responding. This can happen when the Standby Namenode is very busy applying transaction (edit log) changes, or when it is still synchronizing during startup and therefore remains in safe mode.

The Standby Namenode logs may show messages like this: 

2016-12-12 06:40:27,938 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON. 
The reported blocks 1594649 needs additional 70046605 blocks to reach the threshold 0.9990 of total blocks 71712966.
Safe mode will be turned off automatically
2016-12-12 06:45:03,834 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON.
The reported blocks 2948303 needs additional 68692951 blocks to reach the threshold 0.9990 of total blocks 71712966.
Safe mode will be turned off automatically
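While the Standby Namenode is in this state, its safe mode status can be checked directly. A minimal sketch (the hostname matches the example above; adjust for your cluster):

```shell
# Check whether the local Namenode is still in safe mode
# (run as the hdfs user)
hdfs dfsadmin -safemode get

# On an HA cluster, query a specific Namenode by its RPC address
hdfs dfsadmin -fs hdfs://hdm1.gphd.local:8020 -safemode get
```

Once the reported block count crosses the 0.9990 threshold shown in the log, safe mode turns off automatically and the Namenode should start answering RPC calls again.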

Resolution

  1. Check the log file /var/log/hadoop/hdfs/*namenode*.out to confirm that the Namenode process is not running into any problems during startup.
  2. If no problems are found, wait for the Namenode to finish starting up (or for its load to subside), then retry the command.
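The steps above can be sketched as commands; this is an illustrative sequence, not part of the original article, and assumes the same hostnames and log path used in the examples:

```shell
# 1. Scan the Namenode startup log for errors
grep -iE "ERROR|FATAL" /var/log/hadoop/hdfs/*namenode*.out

# 2. Wait for safe mode to end; this call blocks until the
#    Namenode reports safe mode is off
hdfs dfsadmin -safemode wait

# 3. Retry the state query once the Namenode is responsive
hdfs haadmin -getServiceState hdm1.gphd.local
```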
