Pivotal Knowledge Base

Workload Manager (WLM) unable to start after new install

Environment

Product                  Version
Workload Manager (WLM)   1.03
OS                       RHEL 6.x

Symptom

After installing WLM 1.03, WLM services fail to start with the following error.

Error Message:

"/usr/local/gp-wlm-data/.scratch/installer/install.sh: line 329: /usr/local/gp-wlm/bin/svc-mgr.sh: No such file or directory 
[Fri Apr 29 18:00:14 UTC 2016] Failure on Fri Apr 29 18:00:14 UTC 2016."

Cause

The installer did not create the symlink to the WLM installation directory. All WLM start/stop/status functions go through the symlink folder "gp-wlm", so they fail when it is missing.

RCA

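All WLM start/stop/status functions use gp-wlm, which is a symlink to the actual physical WLM install location. When the symlink is missing, running svc-mgr.sh directly from the physical install directory fails because the script sources its helper files through the gp-wlm path: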

[gpadmin@mdwdi ~/gp-wlm-data/gp-wlm-1.0.3-1455032766-663e15f/bin]# ./svc-mgr.sh --service=all --action=cluster-status
./svc-mgr.sh: line 4: /home/gpadmin/gp-wlm/etc/init.d/functions: No such file or directory
./svc-mgr.sh: line 5: /home/gpadmin/gp-wlm/gp-wlm_path.sh: No such file or directory
./svc-mgr.sh: line 6: /home/gpadmin/gp-wlm/lib/bash.functions: No such file or directory
./svc-mgr.sh: line 240: _util_check: command not found
./svc-mgr.sh: line 111: /home/gpadmin/gp-wlm/var/lock/kraken-svc.lock: No such file or directory
[gpadmin@mdwdi ~/gp-wlm-data/gp-wlm-1.0.3-1455032766-663e15f/bin]# cd /home/gpadmin/gp-wlm
-bash: cd: /home/gpadmin/gp-wlm: No such file or directory
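
The check shown in the transcript above can be sketched as a small shell function. This is only an illustration: the function name check_wlm_link is hypothetical and not part of WLM, and the path in the usage comment is assumed from the transcript.

```shell
# Report the state of the gp-wlm symlink.
# Prints one of: ok, dangling, not-a-symlink, missing.
# check_wlm_link is a hypothetical helper, not something WLM ships.
check_wlm_link() {
    link=$1
    if [ -L "$link" ]; then
        # -e follows the symlink, so it is false when the target is gone
        if [ -e "$link" ]; then
            echo "ok"
        else
            echo "dangling"
        fi
    elif [ -e "$link" ]; then
        echo "not-a-symlink"
    else
        echo "missing"
    fi
}

# Usage (path assumed from the transcript above):
#   check_wlm_link /home/gpadmin/gp-wlm
```

In the situation described in this article the function would be expected to print "missing", since the cd into /home/gpadmin/gp-wlm fails with "No such file or directory".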

Resolution

High Level Steps

  • Find the WLM installation folder (hint: it is usually under gp-wlm-data, named gp-wlm-1.0.3-<followed by build-specific digits and characters>)
  • Create a symlink to this installation directory. The symlink must be named "gp-wlm" for the following commands to work:
[gpadmin@mdwdi ~/]# cd gp-wlm-data
[gpadmin@mdwdi ~/gp-wlm-data]#
[gpadmin@mdwdi ~/gp-wlm-data]# cd gp-wlm-1.0.3-1455032760-663e15f
[gpadmin@mdwdi ~/gp-wlm-data/gp-wlm-1.0.3-1455032760-663e15f]# pwd
/home/gpadmin/gp-wlm-data/gp-wlm-1.0.3-1455032760-663e15f
[gpadmin@mdwdi ~/gp-wlm-data/gp-wlm-1.0.3-1455032760-663e15f]# ln -s /home/gpadmin/gp-wlm-data/gp-wlm-1.0.3-1455032760-663e15f /home/gpadmin/gp-wlm
[gpadmin@mdwdi ~/gp-wlm-data/gp-wlm-1.0.3-1455032760-663e15f]# ls -ltrh /home/gpadmin/gp-wlm
lrwxrwxrwx 1 gpadmin gpadmin 57 Feb 11 13:18 /home/gpadmin/gp-wlm -> /home/gpadmin/gp-wlm-data/gp-wlm-1.0.3-1455032760-663e15f
[gpadmin@mdwdi ~/gp-wlm-data/gp-wlm-1.0.3-1455032760-663e15f]# cd /home/gpadmin/gp-wlm
[gpadmin@mdwdi ~/gp-wlm]# ls -ltrh
total 36K
drwxr-xr-x 3 gpadmin gpadmin 4.0K Dec 9 15:43 usr
drwxr-xr-x 2 gpadmin gpadmin 4.0K Dec 9 15:53 sbin
drwxr-xr-x 3 gpadmin gpadmin 4.0K Dec 9 15:54 share
drwxr-xr-x 29 gpadmin gpadmin 4.0K Dec 9 15:54 lib
drwxr-xr-x 2 gpadmin gpadmin 4.0K Dec 9 15:54 schema
-rw-r--r-- 1 gpadmin gpadmin 965 Feb 9 10:46 gp-wlm_path.sh
drwxr-xr-x 14 gpadmin gpadmin 4.0K Feb 9 10:46 etc
drwxr-xr-x 2 gpadmin gpadmin 4.0K Feb 9 10:46 bin
drwxr-xr-x 7 gpadmin gpadmin 4.0K Feb 9 10:46 var
[gpadmin@mdwdi ~/gp-wlm]# cd bin
[gpadmin@mdwdi ~/gp-wlm/bin]# ./svc-mgr.sh --service=all --action=cluster-status
mdwdi:
RabbitMQ is stopped
agent is stopped
cfgmon is stopped
rulesengine is stopped
svcmon is stopped
[gpadmin@mdwdi ~/gp-wlm/bin]# ./svc-mgr.sh --service=all --action=cluster-start
mdwdi:
Starting RabbitMQ: [ OK ]
Starting agent: [ OK ]
Starting cfgmon: [ OK ]
Starting rulesengine: [ OK ]
Starting svcmon: [ OK ]
[gpadmin@mdwdi ~/gp-wlm/bin]# ./svc-mgr.sh --service=all --action=cluster-status
mdwdi:
RabbitMQ is running out of the current installation. (PID=12595)
agent (pid 12837) is running...
cfgmon (pid 12911) is running...
rulesengine (pid 12946) is running...
svcmon (pid 12979) is running...
[gpadmin@mdwdi ~/gp-wlm/bin]#
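
The manual steps above could be wrapped in a small function along the following lines. This is a sketch under assumptions, not a supported tool: link_wlm is a hypothetical name, the gp-wlm-* naming pattern is taken from the transcript, and the function refuses to guess when it finds zero or several candidate directories. It relies on ln -sfn as provided by GNU coreutils on RHEL.

```shell
# link_wlm DATA_DIR LINK_PATH
# Point LINK_PATH at the single gp-wlm-* directory found under DATA_DIR.
# link_wlm is a hypothetical helper; it is not shipped with WLM.
link_wlm() {
    data_dir=$1
    link=$2
    target=
    count=0
    # The trailing slash in the pattern restricts matches to directories
    for d in "$data_dir"/gp-wlm-*/; do
        [ -d "$d" ] || continue
        target=${d%/}
        count=$((count + 1))
    done
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one gp-wlm-* directory under $data_dir, found $count" >&2
        return 1
    fi
    # Never overwrite a real file or directory, only a missing or stale symlink
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "$link exists and is not a symlink; leaving it alone" >&2
        return 1
    fi
    ln -sfn "$target" "$link"
    echo "$link -> $target"
}

# Usage (paths assumed from the transcript above):
#   link_wlm /home/gpadmin/gp-wlm-data /home/gpadmin/gp-wlm
```

After the link is in place, services are started with svc-mgr.sh exactly as shown in the transcript; the sketch deliberately does not start anything itself.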
