Pivotal Knowledge Base


GPRECOVERSEG Fails due to X11 Forwarding

Environment

 Product             Version
 Pivotal Greenplum   4.3.x
 OS                  RHEL 6.x

Symptom

gprecoverseg runs gp_primarymirror over SSH to check the state of each primary and mirror segment and determine whether the segments are "Ready" for recovery.

Example: 

20161005:10:42:00:062639 gprecoverseg:xxx:gpadmin-[DEBUG]:-[worker4] finished cmd: Get segment status cmdStr='ssh -o 'StrictHostKeyChecking no' ecdlnjqgrpdb01 ". /bb/gpdata/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h ecdlnjqgrpdb01 -p 40001"' had result: cmd had rc=1 completed=True halted=False
stdout=''
stderr='mode: PrimarySegment segmentState: Ready dataState: InSync faultType: NotInitialized mode: PrimarySegment segmentState: Ready dataState: InSync faultType: NotInitialized

gprecoverseg expects this status to be returned in stderr in a specific format, and it parses the values it needs from that output.

If SSH returns additional text in stderr, gprecoverseg may assume that the segment is not "Ready", retry the check several times, and finally terminate the operation.

Cause 

In this scenario, X11 forwarding changes had been made to the gpadmin profile on several segment hosts, which caused SSH to return xauth errors and warnings in stderr:

- expected response -
20161005:10:42:00:062639 gprecoverseg:xxx:gpadmin-[DEBUG]:-[worker4] finished cmd: Get segment status cmdStr='ssh -o 'StrictHostKeyChecking no' ecdlnjqgrpdb01 ". /bb/gpdata/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h ecdlnjqgrpdb01 -p 40001"' had result: cmd had rc=1 completed=True halted=False
stdout=''
stderr='mode: PrimarySegment
...

- error message 1 -
20161005:10:42:01:062639 gprecoverseg:xxx:gpadmin-[DEBUG]:-[worker7] finished cmd: Get segment status cmdStr='ssh -o 'StrictHostKeyChecking no' ecdlnjqgrpdb02 ". /bb/gpdata/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h ecdlnjqgrpdb02 -p 50003"' had result: cmd had rc=1 completed=True halted=False
stdout=''
stderr='/usr/bin/xauth: error in locking authority file /home/gpadmin/.Xauthority

- error message 2 -
20161005:10:42:01:062639 gprecoverseg:xxx:gpadmin-[DEBUG]:-[worker6] finished cmd: Get segment status cmdStr='ssh -o 'StrictHostKeyChecking no' ecdlnjqgrpdb01 ". /bb/gpdata/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h ecdlnjqgrpdb01 -p 40002"' had result: cmd had rc=1 completed=True halted=False
stdout=''
stderr='Warning: No xauth data; using fake authentication data for X11 forwarding.

- exception -
20161005:10:42:06:062639 gprecoverseg:ecdlnjqgrpms01:gpadmin-[ERROR]:-gprecoverseg failed. exiting...
Traceback (most recent call last):
File "/bb/gpdata/greenplum-db/lib/python/gppylib/mainUtils.py", line 281, in simple_main_locked
exitCode = commandObject.run()
File "/bb/gpdata/greenplum-db/lib/python/gppylib/programs/clsRecoverSegment.py", line 1266, in run
raise Exception("Inconsistency in catalog and segment Role/Mode. Catalog Role = %s. Segment Mode = %s." % (db.getSegmentRole(), mode))
Exception: Inconsistency in catalog and segment Role/Mode. Catalog Role = p. Segment Mode = error in locking authority file /home/gpadmin/.Xauthority.
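
The exception text shows the xauth message ending up in the "Segment Mode" field. The sketch below illustrates how that can happen. It is not the gppylib implementation: it assumes the status arrives as one "key: value" pair per line (the truncated expected response above hints at this, but the article does not confirm it), and the names naive_mode and tolerant_status are hypothetical.

```python
# Illustrative only: NOT the gppylib parser. Assumes one "key: value" pair per line.
EXPECTED_KEYS = ("mode", "segmentState", "dataState", "faultType")

GOOD = (
    "mode: PrimarySegment\n"
    "segmentState: Ready\n"
    "dataState: InSync\n"
    "faultType: NotInitialized"
)

# Same status, preceded by the xauth error from "error message 1".
NOISY = (
    "/usr/bin/xauth: error in locking authority file /home/gpadmin/.Xauthority\n"
    + GOOD
)


def naive_mode(stderr_text):
    """Positional parse: treat the first line as the 'mode' line."""
    first_line = stderr_text.splitlines()[0]
    return first_line.split(": ", 1)[1]


def tolerant_status(stderr_text):
    """Key-based parse: keep only lines whose key is one of the expected fields."""
    fields = {}
    for line in stderr_text.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key in EXPECTED_KEYS:
            fields[key] = value.strip()
    return fields


print(naive_mode(GOOD))
print(naive_mode(NOISY))
print(tolerant_status(NOISY))
```

With the noisy input, the positional parse returns the xauth text as the mode, which matches the wording of the exception, while the key-based parse still finds the four status fields. The contrast is only meant to show why extra SSH output is harmful to a format-sensitive parser; the fix in this article is to stop SSH from producing that output.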

Resolution

To resolve this issue, disable X11 forwarding (and agent forwarding) for the gpadmin user in ~/.ssh/config:

Host *
ForwardAgent no
ForwardX11 no
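
After changing the configuration, it may help to confirm that SSH sessions from the master to each segment host no longer write X11 messages to stderr. The following is a minimal sketch, assuming a Python interpreter is available to gpadmin on the master and that running a no-op command over SSH reproduces the same messages seen in the logs. The function names and the marker strings are hypothetical choices for illustration, not part of any Greenplum utility.

```python
import subprocess

# Substrings taken from the two error messages shown in this article.
X11_MARKERS = ("xauth", "X11 forwarding")


def ssh_stderr(host):
    """Run a no-op command over SSH and return what the session wrote to stderr."""
    proc = subprocess.Popen(
        ["ssh", "-o", "StrictHostKeyChecking no", host, "true"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    _, err = proc.communicate()
    return err


def hosts_with_x11_noise(hosts, fetch_stderr=ssh_stderr):
    """Return {host: stderr text} for hosts whose SSH stderr mentions X11 or xauth."""
    noisy = {}
    for host in hosts:
        err = fetch_stderr(host)
        if any(marker in err for marker in X11_MARKERS):
            noisy[host] = err.strip()
    return noisy
```

For example, calling hosts_with_x11_noise(["ecdlnjqgrpdb01", "ecdlnjqgrpdb02"]) as gpadmin on the master should return an empty dictionary once forwarding is disabled. The fetch_stderr parameter exists only so the check can be exercised without a live SSH connection.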

Additional Information 

This issue can be seen with multiple facets:

 

 

 

 
