Pivotal Knowledge Base


HBase application hangs indefinitely connecting to ZooKeeper

Versions: PHD 1.0.1 and GPHD 1.2

Java source code example

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class testhbase {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // ZooKeeper quorum the HBase client connects to
        conf.set("hbase.zookeeper.quorum", "danl-gphd2,danl-gphd3,danl-gphd4");
        // Retry and timeout settings that control how long the client keeps trying
        conf.set("hbase.client.retries.number", Integer.toString(1));
        conf.set("zookeeper.session.timeout", Integer.toString(60000));
        conf.set("zookeeper.recovery.retry", Integer.toString(1));
        HBaseAdmin hbase = null;
        try {
            // Connects to the cluster through ZooKeeper; blocks while the client retries
            hbase = new HBaseAdmin(conf);
        } catch (Exception exp) {
            // Printed once HBaseAdmin gives up; this is the "did work" line in the test runs
            System.out.println("did work");
        }
    }
}

 

Compile the code

1. Create the output directory: mkdir myclasses

2. Compile the class and package it into a jar:
GPHD 1.2
[gpadmin@danl-gphd ~]$ javac -classpath /usr/lib/gphd/hadoop/hadoop-core-1.0.3-gphd-1.2.0.0.jar:/usr/lib/gphd/hbase/hbase-0.92.1-gphd-1.2.0.0.jar -d myclasses/ testhbase.java 
[gpadmin@danl-gphd ~]$ jar -cvf testhbase.jar -C myclasses/ .

 
PHD 1.0.1
[gpadmin@pivhdsne ~]$ javac -classpath /usr/lib/gphd/hadoop/hadoop-common-2.0.2-alpha-gphd-2.0.1.0.jar:/usr/lib/gphd/hbase/hbase-0.94.2-gphd-2.0.1.0.jar -d myclasses/ testhbase.java
[gpadmin@pivhdsne ~]$ jar -cvf testhbase.jar -C myclasses/ .

Run the code

GPHD 1.2
[gpadmin@danl-gphd ~]$ java -classpath /usr/lib/gphd/hadoop/hadoop-core-1.0.3-gphd-1.2.0.0.jar:/usr/lib/gphd/hbase/hbase-0.92.1-gphd-1.2.0.0.jar:/usr/lib/gphd/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/lib/gphd/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/gphd/hadoop/lib/commons-lang-2.4.jar:/usr/lib/gphd/zookeeper/zookeeper-3.3.5-gphd-1.2.0.0.jar:/usr/lib/gphd/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/gphd/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/gphd/hadoop/lib/log4j-1.2.15.jar:/usr/lib/gphd/hadoop/lib/core-3.1.1.jar:/usr/lib/gphd/hadoop/lib/commons-io-2.1.jar:/usr/lib/gphd/hadoop/lib/commons-httpclient-3.0.1.jar:testhbase.jar testhbase
 
PHD 1.0.1
[gpadmin@pivhdsne ~]$ java -classpath /usr/lib/gphd/hadoop/hadoop-common-2.0.2-alpha-gphd-2.0.1.0.jar:/usr/lib/gphd/hbase/hbase-0.94.2-gphd-2.0.1.0.jar:/usr/lib/gphd/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/gphd/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/gphd/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/gphd/hadoop-2.0.2_alpha_gphd_2_0_1_0/lib/guava-11.0.2.jar:/usr/lib/gphd/hadoop/lib/commons-lang-2.5.jar:/usr/lib/gphd/hadoop/hadoop-auth-2.0.2-alpha-gphd-2.0.1.0.jar:/usr/lib/gphd/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/gphd/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/gphd/hadoop/lib/log4j-1.2.17.jar:/usr/lib/gphd/zookeeper/zookeeper-3.4.5-gphd-2.0.1.0.jar:/etc/gphd/hbase/conf:testhbase.jar testhbase

Symptom

The HBase application appears to retry its ZooKeeper connection indefinitely and never exits:

Jul 13, 2013 10:13:05 PM org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation abort
SEVERE: Unexpected exception during initialization, aborting
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:815)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:583)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:569)
	at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:186)
	at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:98)
	at testhbase.main(testhbase.java:17)

 

Reason and Behavior



  • The Java application can appear hung when there is a ZooKeeper problem, depending on the retry parameters. In a production environment it is often preferable for the application to exit quickly when the cluster has a problem. With the default retry parameters (and the 60-second zookeeper.session.timeout used in this example), the application might hang for 20 minutes or more before exiting: (20 * 2) * 3 * 10 = 1200 seconds (20 minutes).
    • hbase.client.retries.number: default 10
    • zookeeper.recovery.retry: default 3
  • Reducing these parameters makes the application fail sooner. Setting zookeeper.recovery.retry to 0 (with hbase.client.retries.number also set to 0, as in the first test below) still results in two ZooKeeper connection attempts, and the application fails in under a minute if ZooKeeper connectivity is lost during execution.
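The 20-minute estimate above can be reproduced with a small calculation. This is a sketch, assuming (as the roughly 20-second spacing of the "Opening socket connection" lines in the test runs suggests) that each ZooKeeper connection attempt waits about zookeeper.session.timeout divided by the number of quorum servers (60000 ms / 3 = 20 seconds), and reusing the same multipliers as the formula above. The class and method names are hypothetical, and the result is only a rough upper bound, not a model of the exact HBase retry logic.

```java
public class HangEstimate {
    // Hypothetical helper (assumption): seconds a single ZooKeeper connection
    // attempt waits, taken as session timeout / number of quorum servers.
    static long perAttemptSeconds(long sessionTimeoutMs, int quorumSize) {
        return sessionTimeoutMs / quorumSize / 1000;
    }

    // Rough worst case using the article's multipliers:
    // (perAttempt * 2) * zookeeper.recovery.retry * hbase.client.retries.number
    static long estimateSeconds(long sessionTimeoutMs, int quorumSize,
                                int recoveryRetry, int clientRetries) {
        return perAttemptSeconds(sessionTimeoutMs, quorumSize) * 2L * recoveryRetry * clientRetries;
    }

    public static void main(String[] args) {
        // 60000 ms session timeout, 3 quorum servers, default retries 3 and 10
        long total = estimateSeconds(60000, 3, 3, 10);
        System.out.println(total + " seconds (" + total / 60 + " minutes)"); // 1200 seconds (20 minutes)
    }
}
```

The formula only makes sense for non-zero retry values: it yields 0 when either retry is 0, whereas the runs below show the client still making two connection attempts with both set to 0.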

 

#############################################################################################
Using these parameters in the Java code:
                Configuration conf = HBaseConfiguration.create();
                conf.set("hbase.zookeeper.quorum", "danl-gphd2,danl-gphd3,danl-gphd4");
                conf.set("hbase.client.retries.number", Integer.toString(0));
                conf.set("zookeeper.session.timeout", Integer.toString(60000));
                conf.set("zookeeper.recovery.retry", Integer.toString(0));

[gpadmin@danl-gphd hbase_test]$ time java testhbase  2> /tmp/aaa
did work

real	0m41.101s
user	0m0.680s
sys	0m0.397s

[gpadmin@danl-gphd hbase_test]$ cat /tmp/aaa | egrep "Opening "
13/07/14 22:28:48 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd2/10.10.10.10:2181
13/07/14 22:29:09 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd4/10.10.10.12:2181

This run took about 40 seconds to fail because the ZK client made two connection attempts, each of which timed out after about 20 seconds.
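To check the per-attempt timeout from the captured stderr (/tmp/aaa in these runs), you can compute the gaps between consecutive "Opening socket connection" lines. This is a minimal sketch, assuming the timestamp format shown in the log excerpts (yy/MM/dd HH:mm:ss at the start of each line); the class name AttemptGaps is hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class AttemptGaps {
    // Matches the "13/07/14 22:28:48" prefix of each log line (assumed format).
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yy/MM/dd HH:mm:ss");

    // Returns the seconds elapsed between consecutive "Opening " lines.
    static List<Long> gapsSeconds(List<String> lines) {
        List<Long> gaps = new ArrayList<>();
        LocalDateTime prev = null;
        for (String line : lines) {
            if (!line.contains("Opening ")) {
                continue; // ignore every other log line
            }
            // The timestamp occupies the first 17 characters.
            LocalDateTime ts = LocalDateTime.parse(line.substring(0, 17), TS);
            if (prev != null) {
                gaps.add(Duration.between(prev, ts).getSeconds());
            }
            prev = ts;
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "13/07/14 22:28:48 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd2/10.10.10.10:2181",
            "13/07/14 22:29:09 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd4/10.10.10.12:2181");
        System.out.println(gapsSeconds(log)); // [21]
    }
}
```

For a whole run, the lines could be read with java.nio.file.Files.readAllLines(java.nio.file.Paths.get("/tmp/aaa")) and passed to gapsSeconds.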
#############################################################################################

Using these parameters in the Java code:
                Configuration conf = HBaseConfiguration.create();
                conf.set("hbase.zookeeper.quorum", "danl-gphd2,danl-gphd3,danl-gphd4");
                conf.set("hbase.client.retries.number", Integer.toString(0));
                conf.set("zookeeper.session.timeout", Integer.toString(60000));
                conf.set("zookeeper.recovery.retry", Integer.toString(1));

[gpadmin@danl-gphd hbase_test]$ time java testhbase  2> /tmp/aaa
did work

real	1m21.213s
user	0m0.761s
sys	0m0.273s

[gpadmin@danl-gphd hbase_test]$ cat /tmp/aaa | egrep "Opening "
13/07/14 22:30:42 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd4/10.10.10.12:2181
13/07/14 22:31:02 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd2/10.10.10.10:2181
13/07/14 22:31:23 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd3/10.10.10.11:2181
13/07/14 22:31:43 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd4/10.10.10.12:2181
#############################################################################################

Using these parameters in the Java code:
                Configuration conf = HBaseConfiguration.create();
                conf.set("hbase.zookeeper.quorum", "danl-gphd2,danl-gphd3,danl-gphd4");
                conf.set("hbase.client.retries.number", Integer.toString(1));
                conf.set("zookeeper.session.timeout", Integer.toString(60000));
                conf.set("zookeeper.recovery.retry", Integer.toString(1));

[gpadmin@danl-gphd hbase_test]$ time java testhbase  2> /tmp/aaa
did work

real 3m23.068s
user	0m0.947s
sys	0m0.267s

[gpadmin@danl-gphd hbase_test]$ cat /tmp/aaa | egrep "Opening "
13/07/14 22:33:34 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd4/10.10.10.12:2181
13/07/14 22:33:55 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd2/10.10.10.10:2181
13/07/14 22:34:15 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd3/10.10.10.11:2181
13/07/14 22:34:36 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd4/10.10.10.12:2181
13/07/14 22:34:55 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd2/10.10.10.10:2181
13/07/14 22:35:15 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd3/10.10.10.11:2181
13/07/14 22:35:35 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd4/10.10.10.12:2181
13/07/14 22:35:36 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd4/10.10.10.12:2181
13/07/14 22:35:55 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd2/10.10.10.10:2181
13/07/14 22:35:57 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd2/10.10.10.10:2181
13/07/14 22:36:16 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd3/10.10.10.11:2181
13/07/14 22:36:18 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd3/10.10.10.11:2181
13/07/14 22:36:37 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd4/10.10.10.12:2181
13/07/14 22:36:40 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd4/10.10.10.12:2181
13/07/14 22:36:55 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd2/10.10.10.10:2181
#############################################################################################

These settings may be a little more convenient from an application perspective, especially when you want the application to exit quickly:
                Configuration conf = HBaseConfiguration.create();
                conf.set("hbase.zookeeper.quorum", "danl-gphd2,danl-gphd3,danl-gphd4");
                conf.set("hbase.client.retries.number", Integer.toString(1));
                conf.set("zookeeper.session.timeout", Integer.toString(60000));
                conf.set("zookeeper.recovery.retry", Integer.toString(0));

[gpadmin@danl-gphd hbase_test]$ time java testhbase 2> /tmp/aaa
did work

real 1m42.746s
user	0m0.707s
sys	0m0.446s

[gpadmin@danl-gphd hbase_test]$ cat /tmp/aaa | egrep "Opening "
13/07/14 22:13:21 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd2/10.10.10.10:2181
13/07/14 22:13:42 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd3/10.10.10.11:2181
13/07/14 22:14:02 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd4/10.10.10.12:2181
13/07/14 22:14:22 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd4/10.10.10.12:2181
13/07/14 22:14:23 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd2/10.10.10.10:2181
13/07/14 22:14:42 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd2/10.10.10.10:2181
13/07/14 22:14:44 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd3/10.10.10.11:2181
13/07/14 22:15:02 INFO zookeeper.ClientCnxn: Opening socket connection to server danl-gphd3/10.10.10.11:2181
#############################################################################################
