Pivotal Knowledge Base

NameNode fails to start after enabling Kerberos if QJM-related Kerberos parameters are missing

Environment

Product          Version
Pivotal Hadoop   1.1.x
OS               RHEL 6.x

Symptom

The NameNode fails to start after Kerberos is enabled if the Kerberos properties for the Quorum Journal Manager (QJM) JournalNodes are not set in hdfs-site.xml. The NameNode log shows errors similar to the following:

2014-01-26 00:46:35,631 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6002 ms (timeout=20000 ms) for a response for selectInputStreams
..
2014-01-26 00:46:39,783 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input streams from QJM to [192.165.100.25:8485, 192.165.100.26:8485, 192.165.100.27:8485]. Skipping.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
..
192.165.100.27:8485: Call From phd11-nn.saturn.local/192.165.100.41 to phd11-snn.saturn.local:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:455)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:249)
..
..         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1235)

Notes

In an HA configuration that uses QJM, the NameNode tries to connect to all of the JournalNodes. If the connection attempts fail for a majority of them, the NameNode shuts itself down. When HA is enabled on a PHD cluster secured with Kerberos, the QJM-related Kerberos parameters must be added to hdfs-site.xml; otherwise the NameNode cannot log in to the JournalNodes.
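The majority rule can be illustrated with a short Python sketch. The function names are hypothetical and this is not Hadoop's implementation; it only shows why, with the three JournalNodes in the log above, at least two must respond (the "2/3" in the exception message).

```python
from typing import Sequence


def quorum_size(journal_nodes: int) -> int:
    """Smallest number of JournalNodes that forms a strict majority."""
    return journal_nodes // 2 + 1


def has_quorum(responses: Sequence[bool]) -> bool:
    """True if enough JournalNodes answered successfully.

    Each entry in `responses` is True for a JournalNode that answered
    and False for one that failed (for example, with an authentication
    or connection error).
    """
    return sum(responses) >= quorum_size(len(responses))
```

With three JournalNodes, one failure is tolerated, but three failures, as in the log above, leave the NameNode without a quorum.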

Resolution

Add the following properties to /etc/gphd/hadoop/conf/hdfs-site.xml on all nodes, adjusting these values for your environment:

  • /etc/security/hdfs.keytab - Path to the HDFS keytab file
  • REALM - Replace with the realm of your Kerberos server, e.g. SATURN.LOCAL
  • _HOST - Use this special variable if "hostname -f" returns the fully qualified domain name (FQDN) and the Kerberos principals were registered using the FQDN. Otherwise, use the hostname that was used to create the Kerberos principals.

<property>
  <name>dfs.journalnode.keytab.file</name>
  <value>/etc/security/hdfs.keytab</value>
</property>

<property>
  <name>dfs.journalnode.kerberos.principal</name>
  <value>hdfs/_HOST@REALM</value>
</property>

<property>
  <name>dfs.journalnode.kerberos.internal.spnego.principal</name>
  <value>HTTP/_HOST@REALM</value>
</property> 
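As a quick sanity check before restarting, a script along the following lines could report which of the three properties are still missing from a node's hdfs-site.xml. This is a minimal sketch, not a PHD tool: the function name missing_qjm_properties is hypothetical, and it only checks that each property is present with a non-empty value, not that the keytab or principals are valid.

```python
import xml.etree.ElementTree as ET

# The three QJM-related Kerberos properties listed in this article.
REQUIRED = (
    "dfs.journalnode.keytab.file",
    "dfs.journalnode.kerberos.principal",
    "dfs.journalnode.kerberos.internal.spnego.principal",
)


def missing_qjm_properties(xml_text):
    """Return the required property names that are absent or empty.

    `xml_text` is the content of hdfs-site.xml (hypothetical helper;
    it does not verify the keytab file or the principals themselves).
    """
    root = ET.fromstring(xml_text)
    present = set()
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name and value and value.strip():
            present.add(name.strip())
    return [name for name in REQUIRED if name not in present]


# Example use on a node (path taken from this article):
# with open("/etc/gphd/hadoop/conf/hdfs-site.xml", "rb") as f:
#     print(missing_qjm_properties(f.read()))
```

An empty list means all three properties are set on that node; the check would need to be run on every node, since the file must be updated everywhere.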

After making these changes, restart the cluster. The NameNode should start without errors caused by QJM authentication issues.

Internal Comments

Note: This step is not mentioned in the PHD 1.1.x Stack and Tool Reference Guide. It will be added to the documentation for the next release.

  • Pivotal internal employees reference JIRA : HD-863
