Pivotal Knowledge Base

gpssh to hosts sometimes prints the message "[ERROR] unable to login to <hostname>"

Problem

gpssh sometimes fails to connect to one or more hosts and prints the message "[ERROR] unable to login to <hostname>".

For example:

[gpadmin@mdw ~]$ gpssh -f ~/gpconfigs/hostfile "hostname"
[ERROR] unable to login to sdw2
[ERROR] unable to login to sdw8
hint: use gpssh-exkeys to setup public-key authentication between hosts
[sdw4] sdw4
[sdw5] sdw5
[sdw6] sdw6
[sdw7] sdw7
[sdw1] sdw1
[sdw3] sdw3
[smdw] smdw
[ mdw] mdw
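
When only a few hosts fail, you may want to retry just those hosts. The following is a minimal sketch. It assumes Python 3 and the exact error-line format shown above. It extracts the failing hostnames so you can write them to a retry hostfile for `gpssh -f`. `failed_hosts` and `LOGIN_ERROR` are hypothetical names and are not part of Greenplum.

```python
from __future__ import annotations

import re

# Matches lines such as "[ERROR] unable to login to sdw2" (format assumed from
# the gpssh output shown in this article).
LOGIN_ERROR = re.compile(r"^\[ERROR\] unable to login to (\S+)$")


def failed_hosts(gpssh_output: str) -> list[str]:
    """Return the hostnames gpssh could not log in to, in output order."""
    hosts = []
    for line in gpssh_output.splitlines():
        match = LOGIN_ERROR.match(line.strip())
        if match:
            hosts.append(match.group(1))
    return hosts


if __name__ == "__main__":
    sample = (
        "[ERROR] unable to login to sdw2\n"
        "[ERROR] unable to login to sdw8\n"
        "hint: use gpssh-exkeys to setup public-key authentication between hosts\n"
        "[sdw4] sdw4\n"
    )
    # One hostname per line, suitable for a retry hostfile passed to gpssh -f.
    print("\n".join(failed_hosts(sample)))  # prints sdw2 and sdw8, one per line
```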

Cause

gpssh runs an ssh command with a login timeout of 10 seconds. It waits up to 10 seconds for ssh to respond. If no response arrives in that time, it terminates the connection and prints the message shown above.
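
The timeout behaviour can be illustrated with a simplified sketch, assuming Python 3.7+. It does not reproduce gpssh's implementation, because pxssh drives an interactive ssh session through pexpect. Instead, the sketch runs a command with `subprocess.run` and treats the host as unreachable if the marker string does not appear within the timeout. `responds_within` and `ssh_probe` are hypothetical helpers.

```python
from __future__ import annotations

import subprocess
import sys

MARKER = "hello hello hello hello"
LOGIN_TIMEOUT = 10.0  # seconds, matching login_timeout=10 in pxssh.py


def responds_within(cmd: list[str], timeout: float = LOGIN_TIMEOUT) -> bool:
    """Run cmd and report whether MARKER appears on stdout before the timeout.

    Simplified analogue of the gpssh login check: no answer within the
    timeout means the host is reported as unreachable.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, similar to
        # gpssh terminating the connection.
        return False
    return MARKER in result.stdout


def ssh_probe(host: str) -> list[str]:
    """Build an ssh command with the options pxssh.py passes (BatchMode, no host key prompt)."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "StrictHostKeyChecking=no",
            host, "echo " + MARKER]


if __name__ == "__main__":
    # Local simulation (no ssh needed): a "host" that answers after 1 second.
    slow_host = [sys.executable, "-c", f"import time; time.sleep(1); print({MARKER!r})"]
    print(responds_within(slow_host, timeout=0.3))  # False
    print(responds_within(slow_host, timeout=10.0))  # True
```

You can run `responds_within(ssh_probe("sdw2"))` against a real host. If it returns False after close to the full 10 seconds, that matches the slow-response case gpssh reports. If it returns False immediately, ssh itself usually failed, for example during authentication.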

Here is a snippet of the Python code (pxssh.py). gpssh first establishes an ssh connection. It then sends "echo hello hello hello hello" to check whether it receives a response.

[gpadmin@mdw ~]$ grep -A10 login_timeout /usr/local/greenplum-db/bin/lib/pxssh.py
    def loginAsync (self,server,username=None,login_timeout=10, port=None):
        cmd = 'ssh -o "BatchMode yes" -o "StrictHostKeyChecking no"'
        if port:
            cmd = cmd + ' -p %d' % port
        if username:
            cmd = cmd + ' -l %s' % username
        cmd = cmd + ' ' + server
        spawn.__init__(self, cmd, timeout=login_timeout)
        # we don't need this since we are not sending
        # password over (see comments in pexpect.py re: delaybeforesend)
        self.delaybeforesend = 0

    ### cktan: wait for login
    def loginWait(self, login_timeout=10, set_term_dumb=False):
        #, "(?i)no route to host"])
        echo = 'hello hello hello hello'
        self.sendline('echo ' + echo)
        exp = [echo, "(?i)permission denied", "(?i)terminal type",
               TIMEOUT, "(?i)connection closed by remote host",
               EOF]
        try:
            i = self.expect(exp)
            if i == 0:
                i = self.expect(exp)
--
    def login (self,server,username,login_timeout=10, port=22):
        self.loginAsync(server, username, login_timeout, port)
        return self.loginWait(login_timeout)
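
loginWait sends `echo hello hello hello hello` and then waits for one of the patterns in `exp`. If none appears within `login_timeout` seconds, pexpect returns the TIMEOUT entry. The sketch below re-implements only the matching step in plain Python, for illustration only, and assumes Python 3.7+. Given the text received so far, it reports which pattern matched. It follows pexpect's documented rule: the earliest match in the stream wins, and ties go to the earlier pattern in the list. `first_match` and `PATTERNS` are hypothetical names, and this is not pexpect's code.

```python
from __future__ import annotations

import re

ECHO = "hello hello hello hello"

# Same order as the exp list in loginWait. TIMEOUT and EOF are pexpect
# sentinels rather than regexes, so they are left out of this sketch.
PATTERNS = [
    ("echo", re.compile(re.escape(ECHO))),
    ("permission denied", re.compile(r"(?i)permission denied")),
    ("terminal type", re.compile(r"(?i)terminal type")),
    ("connection closed", re.compile(r"(?i)connection closed by remote host")),
]


def first_match(buffer: str) -> str | None:
    """Return the label of the pattern that matches earliest in buffer, or None."""
    best = None  # (start position, list index, label)
    for index, (label, pattern) in enumerate(PATTERNS):
        m = pattern.search(buffer)
        # Compare (position, list index) so earlier text wins, then earlier pattern.
        if m and (best is None or (m.start(), index) < best[:2]):
            best = (m.start(), index, label)
    return best[2] if best else None


if __name__ == "__main__":
    print(first_match("Permission denied (publickey,password)."))  # permission denied
```

The command line sent by sendline itself contains the marker. The first `echo` match therefore probably comes from the terminal echoing the command back, which would explain why the code calls expect a second time when `i == 0`. The grep output above is truncated, so this is an inference.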

Solution

-- Retry the gpssh command.

-- Check that you can connect to the affected hosts using

ssh <hostname>

Confirm that there is no delay in the response or in returning to the prompt. If there is a delay, check your DNS server.

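-- Alternatively, disable DNS lookups in sshd. When UseDNS is enabled, sshd resolves the connecting client's IP address back to a hostname. Set UseDNS to no in /etc/ssh/sshd_config on each host:

[root@mdw /tmp]# grep DNS /etc/ssh/sshd_config
UseDNS no

Then restart sshd, and add the client's address to the /etc/hosts file on the servers.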

-- Or create a hostfile that lists the hosts' IP addresses rather than their hostnames. This way, each ssh connection does not have to go through name resolution.
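
The DNS checks above can be partly scripted. The following is a minimal sketch, assuming Python 3.7+ on the master host. It times the client-side lookup (`socket.gethostbyname`, IPv4 only) for each host in a hostfile. It also renders the resolved addresses as an IP-only hostfile. `read_hostfile`, `resolve_hosts`, `ip_hostfile`, and the 1-second `SLOW_SECONDS` threshold are all hypothetical. The sketch measures only the forward lookup on the client. A slow reverse lookup by sshd on the server (UseDNS) will not show up here, so the `ssh <hostname>` timing check is still needed.

```python
from __future__ import annotations

import socket
import time
from collections.abc import Callable

SLOW_SECONDS = 1.0  # hypothetical threshold; gpssh allows 10 s for the whole login


def read_hostfile(text: str) -> list[str]:
    """Return one hostname per non-blank line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def resolve_hosts(
    hostnames: list[str],
    resolver: Callable[[str], str | None] = socket.gethostbyname,
) -> list[tuple[str, str | None, float]]:
    """Resolve each hostname once, recording its address (or None) and the time taken."""
    results = []
    for host in hostnames:
        start = time.monotonic()
        try:
            ip = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            ip = None
        results.append((host, ip, time.monotonic() - start))
    return results


def ip_hostfile(results: list[tuple[str, str | None, float]]) -> str:
    """Render the resolved addresses one per line, skipping unresolved hosts."""
    return "".join(f"{ip}\n" for _, ip, _ in results if ip)


if __name__ == "__main__":
    # For a real hostfile, pass read_hostfile(open(path).read()) instead.
    for host, ip, seconds in resolve_hosts(["localhost"]):
        status = "SLOW" if seconds > SLOW_SECONDS else "ok"
        print(f"{host:<12} {ip or 'UNRESOLVED':<15} {seconds:.3f}s {status}")
```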

Internal Notes

MPP-26051
