Pivotal Knowledge Base

On secure HDFS, HAWQ initialization may fail with "authentication failed: GSSAPI error in client while negotiating security context in gss_init_sec_context() in SASL library"

Problem:

On a secure HDFS cluster, HAWQ initialization may fail with the message "[WARN]:-Failed to create dfs filespace".

On further review, the HAWQ database logs under $MASTER_DATA_DIRECTORY/pg_log may reveal an error like the one below:

2014-01-03 21:16:36.293756 EST,,,p480707,th1440433952,,,,0,,,seg-1,,,,,"LOG","00000","3rd party error log:
E0103 21:16:36.293704 480943 Hdfs.cpp:27] authentication failed: GSSAPI error in client while negotiating security context in gss_init_sec_context() in SASL library.  This is most likely due insufficient credentials or malicious interactions.
    @     0x7f7e55692145  Hdfs::Internal::SaslClient::connect()
    @     0x7f7e55681b22  Hdfs::Internal::RpcChannel::connect()
    @     0x7f7e556866d5  Hdfs::Internal::RpcChannel::invoke()
    @     0x7f7e5568be7e  Hdfs::Internal::RpcClient::call()
    @     0x7f7e5568d2b6  Hdfs::Internal::Invoker::CallMethod()
    @     0x7f7e5569505a  Hdfs::Internal::ClientNamenodeProtocolTranslator::getFileInfo()
    @     0x7f7e555a2810 

Background:

Kerberos clients can perform DNS lookups to canonicalize service principal names. This can cause difficulties when setting up Kerberos application servers, especially when the client's name for the service differs from what the service thinks its name is. By default, the Kerberos client performs a reverse DNS lookup. If the hostname it retrieves differs from the name with which the principals were set up, the client requests a ticket for a principal that does not exist in the KDC, and authentication fails as shown above.
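To see whether a host is affected, the forward and reverse lookups can be compared by hand. The following is a minimal sketch in Python, not the Kerberos library's own logic: it only approximates the canonicalization step with `socket.gethostbyname` and `socket.gethostbyaddr`. The function name `reverse_dns_mismatch` and its injectable `forward` and `reverse` parameters are assumptions made for illustration.

```python
import socket


def reverse_dns_mismatch(hostname,
                         forward=socket.gethostbyname,
                         reverse=socket.gethostbyaddr):
    """Return (ip, reverse_name, differs) for hostname.

    forward and reverse are injectable (hypothetical parameters) so the
    comparison can be exercised without touching real DNS.
    """
    ip = forward(hostname)
    # gethostbyaddr returns (primary_name, alias_list, address_list)
    reverse_name = reverse(ip)[0]
    # Compare case-insensitively and ignore a trailing root dot
    differs = reverse_name.rstrip(".").lower() != hostname.rstrip(".").lower()
    return ip, reverse_name, differs


if __name__ == "__main__":
    import sys

    # Pass the hostnames used in the service principals as arguments
    for host in sys.argv[1:]:
        ip, name, differs = reverse_dns_mismatch(host)
        status = "MISMATCH" if differs else "ok"
        print(f"{host} -> {ip} -> {name} [{status}]")
```

Run it on the master and on each segment node with the hostnames that were used when the principals were created. A MISMATCH suggests the reverse lookup returns a different name than the one in the principal.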

In the log snippet below, note the hostnames hdw1.gphd.local and hdw2.gphd.local, which are the local hostnames of the servers. The principals, however, were created in the KDC using the external hostnames.

[373286] 1390440596.652217: Requesting tickets for host/hdw2.gphd.local@PHD.DEV.VSA.COM, referrals on
[373286] 1390440596.652263: Generated subkey for TGS request: aes256-cts/1F40
[373286] 1390440596.652360: etypes requested in TGS request: aes256-cts, aes128-cts, des3-cbc-sha1, rc4-hmac, camellia128-cts, camellia256-cts
[373286] 1390440596.652557: Encoding request body and padata into FAST request
[373292] 1390440596.666230: Received answer from dgram 10.181.22.129:88
[373292] 1390440596.666282: Response was not from master KDC
[373292] 1390440596.666312: Decoding FAST response
[373292] 1390440596.666431: TGS request result: -1765328377/Server krbtgt/GPHD.LOCAL@PHD.DEV.VSA.COM not found in Kerberos database
[373292] 1390440596.667000: Convert service host (service with host as instance) on host hdw1 to principal
[373292] 1390440596.668733: Remote host after forward canonicalization: hdw1.gphd.local
[373292] 1390440596.669084: Remote host after reverse DNS processing: hdw1.gphd.local
[373292] 1390440596.669090: Get host realm for hdw1.gphd.local
[373292] 1390440596.669094: Use local host hdw1.gphd.local to get host realm
[373292] 1390440596.669097: Look up hdw1.gphd.local in the domain_realm map
[373292] 1390440596.669101: Look up .gphd.local in the domain_realm map
[373292] 1390440596.669104: Look up gphd.local in the domain_realm map
[373292] 1390440596.669108: Look up .local in the domain_realm map
[373292] 1390440596.669111: Look up local in the domain_realm map
[373292] 1390440596.669115: Got realm for host hdw1.gphd.local
[373292] 1390440596.669120: Got service principal host/hdw1.gphd.local@
[373292] 1390440596.669277: ccselect can't find appropriate cache for server principal host/hdw1.gphd.local@
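When the trace output is long, the relevant lines can be pulled out programmatically. The sketch below assumes the trace lines have the same wording as the snippet above ("Remote host after reverse DNS processing: ..." and "Got service principal ..."). The function name `summarize_trace` is hypothetical.

```python
import re

# Patterns match the wording seen in the trace snippet above
_REVERSE = re.compile(r"Remote host after reverse DNS processing: (\S+)")
_PRINCIPAL = re.compile(r"Got service principal (\S+)")


def summarize_trace(lines):
    """Collect reverse-resolved hostnames and service principals from trace lines."""
    reverse_hosts, principals = [], []
    for line in lines:
        match = _REVERSE.search(line)
        if match:
            reverse_hosts.append(match.group(1))
        match = _PRINCIPAL.search(line)
        if match:
            principals.append(match.group(1))
    return reverse_hosts, principals
```

Comparing the returned principals with the principals that exist in the KDC shows whether the client is asking for a name the KDC does not know.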

Solution:

Note: To enable debugging, refer to the article "Enable kerberos debugging logs with HAWQ". The log snippets in this article are taken from the Kerberos client debugging output.

In /etc/krb5.conf, add the setting below to the [libdefaults] section to turn off reverse DNS lookup, and place the updated file on the master and segment nodes. Then initialize the database again; it should succeed.

Note: Make sure you have deleted the HAWQ master and segment data directories created during the last failed initialization attempt; otherwise, initialization will fail.

[libdefaults] 
    rdns = false 
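
Because the updated file has to be present on every node, it can help to verify the setting after copying it. The following is a minimal sketch, assuming a simple line-based read of krb5.conf is sufficient. It does not follow include directives or handle nested braces, and the function name `rdns_disabled` is hypothetical.

```python
def rdns_disabled(krb5_conf_text):
    """Return True if the [libdefaults] section sets rdns to a false value."""
    section = None
    disabled = False
    for raw in krb5_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith(("#", ";")):
            continue
        # Track the current [section] header
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "libdefaults" and "=" in line:
            key, _, value = line.partition("=")
            if key.strip().lower() == "rdns":
                # Assumption: only these spellings are treated as false here
                disabled = value.strip().lower() in ("false", "no", "off", "0")
    return disabled
```

For example, `rdns_disabled(open("/etc/krb5.conf").read())` can be run on the master and on each segment node before retrying the initialization.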

The log snippet below indicates that the external hostname (for example, sl73vsainthdm01q.domain.com) can now be retrieved.

[625829] 1390449858.962265: Convert service hdfs (service with host as instance) on host sl73vsainthdm01q.domain.com to principal
[625829] 1390449858.963968: Remote host after forward canonicalization: sl73vsainthdm01q.domain.com
[625829] 1390449858.964029: Remote host after reverse DNS processing: sl73vsainthdm01q.domain.com
[625829] 1390449858.964048: Get host realm for sl73vsainthdm01q.domain.com
[625829] 1390449858.964068: Use local host sl73vsainthdm01q.domain.com to get host realm
[625829] 1390449858.964077: Look up sl73vsainthdm01q.domain.com in the domain_realm map
[625829] 1390449858.964087: Look up .domain.com in the domain_realm map
[625829] 1390449858.964096: Look up domain.com in the domain_realm map
[625829] 1390449858.964105: Look up .com in the domain_realm map
[625829] 1390449858.964143: Look up com in the domain_realm map
[625829] 1390449858.964155: Got realm for host sl73vsainthdm01q.domain.com
[625829] 1390449858.964186: Got service principal hdfs/sl73vsainthdm01q.domain.com@
[625829] 1390449858.964950: ccselect can't find appropriate cache for server principal hdfs/sl73vsainthdm01q.domain.com@
[625829] 1390449858.965170: Getting credentials postgres@PHD.DEV.VSA.COM -> hdfs/sl73vsainthdm01q.domain.com@ using ccache FILE:/tmp/postgres.ccname
