Pivotal Knowledge Base


icm_client command reports success but does not actually execute

Environment

  • PHD 1.x
  • PHD 2.x

Symptom

An icm_client command is run and reports Success for every host, but it does not actually perform any of its tasks.
In the example below, "icm_client stop" is run to stop the cluster and appears to complete successfully.

-bash-4.1$ icm_client stop -l phd11
Stopping services
Stopping cluster
[====================================================================================================] 100%
Results:
hdw1... [Success]
hdm1... [Success]
hdw3... [Success]
hdw2... [Success]
Details at /var/log/gphd/gphdmgr/

However, the Hadoop services are still running on all hosts.

[root@admin gpadmin]# massh hostfile-phd verbose "/usr/java/default/bin/jps | grep -v Jps |grep -v RunJar"
hdw3 : 3386 QuorumPeerMain
hdw3 : 3460 DataNode
hdw3 : 3806 NodeManager
hdw3 : 3654 HRegionServer
hdw2 : 3457 DataNode
hdw2 : 3803 NodeManager
hdw2 : 3651 HRegionServer
hdw2 : 3383 QuorumPeerMain
hdw1 : 3705 SecondaryNameNode
hdw1 : 3949 NodeManager
hdw1 : 3797 HRegionServer
hdw1 : 3434 QuorumPeerMain
hdw1 : 3514 DataNode
hdm1 : 21129 ResourceManager
hdm1 : 21400 JobHistoryServer
hdm1 : 289624 NameNode
hdm1 : 20962 HMaster

Cause
The root cause is that the gpadmin user does not have write permission on /tmp/.massh-gpadmin at the time icm_client is executed. This can happen for several reasons.

1. The owner of .massh-gpadmin is not gpadmin:gpadmin, as shown below.

[root@admin tmp]# ls -al |grep .massh-gpadmin
drwxrwxr-x 2 500 500 4096 May 15 23:03 .massh-gpadmin

The owner UID 500 looks as if it belongs to gpadmin, but gpadmin's actual UID is 501.

[root@admin tmp]# id gpadmin
uid=501(gpadmin) gid=501(gpadmin) groups=501(gpadmin)

This can happen when the system administrator removes the gpadmin user and re-creates it after adding another user first, so gpadmin receives a different UID than before. It can also happen when the administrator changes the UID manually, as shown below.

[root@admin tmp]# usermod -u 501 gpadmin
[root@admin tmp]# groupmod -g 501 gpadmin
[gpadmin@admin ~]$ id
uid=501(gpadmin) gid=501(gpadmin) groups=501(gpadmin)
[gpadmin@admin tmp]$ ls -alrt
-rw-rw-r--  1  500  500    0 Jul 17 17:10 .massh-gpadmin
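
The ownership comparison above can be sketched as a small shell helper. This is only an illustration: check_owner is a hypothetical function name, not part of icm_client or massh, and it assumes a POSIX shell where "ls -ldn" prints the numeric owner UID in the third field.

```shell
# check_owner PATH USER  (hypothetical helper, not shipped with PHD)
# Compares the numeric owner UID of PATH with the current UID of USER.
# Returns 0 on match, 1 on mismatch, 2 if PATH or USER cannot be resolved.
check_owner() {
    path=$1
    user=$2
    if [ ! -e "$path" ]; then
        echo "MISSING: $path does not exist"
        return 2
    fi
    # UID the account has right now (this is what changes after usermod -u).
    want_uid=$(id -u "$user") || return 2
    # Numeric owner of the path; -n keeps ls from mapping the UID to a name.
    have_uid=$(ls -ldn -- "$path" | awk '{ print $3 }')
    if [ "$have_uid" = "$want_uid" ]; then
        echo "OK: $path is owned by $user (uid $want_uid)"
    else
        echo "MISMATCH: $path is owned by uid $have_uid, but $user is uid $want_uid"
        return 1
    fi
}
```

On the admin node it could be called as: check_owner /tmp/.massh-gpadmin gpadmin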

2. If the user runs su to become root without "-", the shell runs as root but $USER is not set to root, and /tmp/.massh-gpadmin can end up owned by root.

[gpadmin@admin ~]$ su
Password:
[root@admin gpadmin]# env | egrep -i user
USER=gpadmin
[root@admin gpadmin]# ls -la /tmp | egrep -i massh
drwxr-xr-x 2 root root 4096 Jul 16 21:36 .massh-gpadmin
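
A quick way to spot this situation before running massh or icm_client is to compare $USER with the effective user of the shell. The sketch below is an assumption-laden illustration: user_env_matches is a hypothetical helper, and the idea that the directory name follows $USER is inferred from the listing above rather than from massh documentation.

```shell
# user_env_matches  (hypothetical helper, not shipped with PHD)
# Returns 0 when $USER agrees with the effective user of this shell,
# 1 when they differ (for example after "su" without "-").
user_env_matches() {
    effective=$(id -un)
    if [ "${USER:-}" = "$effective" ]; then
        echo "OK: USER=$USER matches the effective user"
    else
        echo "MISMATCH: USER=${USER:-unset} but the effective user is $effective"
        return 1
    fi
}
```

If the helper reports a mismatch, exiting the shell and switching with "su -" gives a login environment in which $USER is set for the target user.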

Fix

Change the owner of /tmp/.massh-gpadmin to gpadmin:gpadmin, or use "su -" when switching to root, depending on which cause applies.

[root@admin tmp]# chown gpadmin:gpadmin .massh-gpadmin
[root@admin tmp]# ls -al|grep massh-gpadmin
drwxrwxr-x 2 gpadmin gpadmin 4096 May 15 23:03 .massh-gpadmin

-bash-4.1$ icm_client stop -l phd11

Stopping services
Stopping cluster
[====================================================================================================] 100%
Results:
hdw1... [Success]
hdw3... [Success]
hdm1... [Success]
hdw2... [Success]
Details at /var/log/gphd/gphdmgr/

[root@admin tmp]# massh hostfile-phd verbose "/usr/java/default/bin/jps | grep -v Jps |grep -v RunJar"

[root@admin tmp]#
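
An empty result from the jps check confirms that the services have stopped. On larger clusters it can help to summarize that output per host. The following is a minimal sketch, assuming the "host : pid Name" line format shown in this article; count_running is a hypothetical helper name.

```shell
# count_running  (hypothetical helper, not shipped with PHD)
# Reads lines of the form "host : pid ProcessName" on stdin and prints
# one "host count" line per host, sorted by host name.
# Prints nothing when no processes are listed.
count_running() {
    awk -F' : ' 'NF == 2 { n[$1]++ } END { for (h in n) print h, n[h] }' | sort
}
```

It could be fed from the same command used above, for example: massh hostfile-phd verbose "/usr/java/default/bin/jps | grep -v Jps | grep -v RunJar" | count_running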
