Pivotal Knowledge Base

Follow

In secure environment, HBase fails with error ConnectionLoss for /hbase

Environment:

PHD 1.x

Symptom:

HBase-master will fail to start in a secured cluster if it will not be able to connect to zookeeper, and will report an error stating ConnectionLoss for znode defined by zookeeper.znode.parent parameter in hbase-site.xml. If zookeeper.znode.parent is not specified, default /hbase is considered.

2014-04-17 08:29:02,752 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=hbase
2014-04-17 08:29:02,752 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/hbase
2014-04-17 08:29:02,752 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/
2014-04-17 08:29:02,754 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=sjc1ibdhdm3.research.ge.com:2181,sjc1ibdhdm2
.research.ge.com:2181,sjc1ibdhdm4.research.ge.com:2181 sessionTimeout=180000 watcher=master:60000
2014-04-17 08:29:02,858 INFO org.apache.zookeeper.Login: successfully logged in.
2014-04-17 08:29:02,860 INFO org.apache.zookeeper.Login: TGT refresh thread started.
2014-04-17 08:29:02,864 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2014-04-17 08:29:02,876 INFO org.apache.zookeeper.Login: TGT valid starting at:        Thu Apr 17 08:29:02 PDT 2014
2014-04-17 08:29:02,876 INFO org.apache.zookeeper.Login: TGT expires:                  Fri Apr 18 08:29:02 PDT 2014
2014-04-17 08:29:02,876 INFO org.apache.zookeeper.Login: TGT refresh sleeping until: Fri Apr 18 04:32:04 PDT 2014
2014-04-17 08:29:02,923 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 39659@sjc1ibdhdm3.research.ge.
com
2014-04-17 08:29:02,947 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server sjc1ibdhdm2.research.ge.com/3.39.89.37:2181. Will a
ttempt to SASL-authenticate using Login Context section 'Client'
2014-04-17 08:29:02,950 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to sjc1ibdhdm2.research.ge.com/3.39.89.37:2181, initiating session
2014-04-17 08:29:03,109 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server sjc1ibdhdm2.research.ge.com/3.39.89.37:2181, sessionid = 0x2456dd2f02d0005, negotiated timeout = 40000
2014-04-17 08:29:03,126 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x2456dd2f02d0005, likely server has closed socket, closing socket connection and attempting reconnect
2014-04-17 08:29:03,241 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
2014-04-17 08:29:03,241 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1..

Error message indicating connection loss to the znode might indicate failure while connecting to zookeeper and there can be various reason for this. If you had a working non-secured cluster and are running into issues after securing it, the primary reason can be missing configuration / envrionmental issues.

In this case, zookeeper logs were reporting the below exceptions when hbase-master was started. It indicated that there are issues with the AES encryption. JDK 7 was installed on the server, however /usr/bin/java was pointing to alternatives which was pointing to JDK 6, because of which zookeeper was starting with JDK 6 resulting in kerberos authentication failures for hbase. It primarily indicates that zookeeper was not allowing connections to clients if AES-256 was used.

 2014-04-17 11:23:40,685 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 0x14570e7551d0000
2014-04-17 11:23:40,687 - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x14570e7551d0000 with negotiated timeou
t 40000 for client /3.39.89.38:45482
2014-04-17 11:23:40,692 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@940] - Client failed to SASL authenticate: javax.security
.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS
mode with HMAC SHA1-96 is not supported/enabled)]
2014-04-17 11:23:40,692 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@946] - Closing client connection due to SASL authentication failure.
2014-04-17 11:23:40,693 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@997] - Closed socket connection for client /3.39.89.38:45482 which had sessionid 0x14570e7551d0000
2014-04-17 11:23:40,693 - ERROR [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1072)
        ..
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:234)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:662)
2014-04-17 11:23:40,694 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0

Solution:
Zookeeper sources it environment using /usr/lib/gphd/zookeeper/bin/zkEnv.sh script, which sets up JAVA_HOME by sourcing /etc/gphd/zookeeper/conf/java.env. So, in order to avoid any conflicts with java envrionment variable, one must set JAVA_HOME in java.env file.

[gpadmin@sjc1ibdhdm3 conf]$ cat java.env
export JVMFLAGS="-Djava.security.auth.login.config=/etc/gphd/zookeeper/conf/jaas.conf"
export JAVA_HOME="/usr/java/default" 

Miscellaneous:

Pivotal internal employee reference: HD-9882

Comments

Powered by Zendesk