Pivotal Knowledge Base


WLM Installation Failure Checklist

Environment

 Product            Version
 Pivotal Greenplum  4.3.x
 OS                 RHEL 5.x/6.x; SUSE 11
 WLM                1.7.0, 1.7.1, 1.7.2

Purpose

This article lists common causes of Greenplum Workload Manager (WLM) installation failures, with the error message and solution for each.

Checklist

Installation Issues

  • Leftover scratch directory
    • Error message:
getopt: unrecognized option `--func-file=<install-dir>/gp-wlm-data/.scratch/installer/installer.functions'
    • Solution:
      • Back up the scratch directory before removing it.
      • Delete the scratch directory before reinstalling:
<install-dir>/gp-wlm-data/.scratch
    • This issue is also documented in the GPCC 3.0.2 release notes:
      http://gpcc.docs.pivotal.io/300/gpcc/relnotes/GPCC-302-release-notes.html#topic_srn_jjj_b5
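    • The back-up-and-delete steps above can be scripted roughly as follows. This is a minimal sketch, not a WLM tool: the function name backup_and_remove_scratch and the backup file name are hypothetical, and it assumes you pass the same <install-dir> the WLM installer used and run it on each host where the scratch directory exists.
```shell
#!/bin/bash
# Hypothetical helper (not part of WLM): back up, then delete, the leftover
# WLM scratch directory under the given install directory.
# Usage: backup_and_remove_scratch <install-dir>
# Prints the path of the backup tarball on success.
backup_and_remove_scratch() {
  local install_dir="$1"
  local data_dir="$install_dir/gp-wlm-data"
  local backup

  if [ ! -d "$data_dir/.scratch" ]; then
    echo "no scratch directory under $data_dir" >&2
    return 1
  fi

  # Write the tarball next to gp-wlm-data, outside the directory being removed.
  backup="$install_dir/wlm-scratch-backup-$(date +%Y%m%d%H%M%S).tar.gz"
  tar -czf "$backup" -C "$data_dir" .scratch || return 1

  # Remove the scratch directory only after the backup succeeded.
  rm -rf "$data_dir/.scratch" || return 1
  echo "$backup"
}
```
    • For example, if WLM was installed under /home/gpadmin: backup_and_remove_scratch /home/gpadmin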
  • Segment hosts unable to SSH to the master's $HOSTNAME
    • Error message:
Services took too long to come online; cluster is not healthy

    • Solution:
      • Every host in gp_segment_configuration must be able to SSH to the master's $HOSTNAME.
      • On every host in gp_segment_configuration (including mdw and smdw), ssh `hostname -s` and ssh `hostname -f` to the host itself must succeed.
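    • The SSH paths above can be checked from one host with a sketch like the following. The function name wlm_ssh_check and the SSH_CMD override variable are hypothetical; it assumes passwordless SSH is already configured for gpadmin and uses BatchMode so that ssh fails instead of prompting.
```shell
#!/bin/bash
# Hypothetical helper (not part of WLM): from the current host, check SSH to
# the master's $HOSTNAME and to this host's own short and fully qualified names.
# Usage: wlm_ssh_check <master-hostname>
# SSH_CMD may be set to override the ssh invocation (e.g. for testing).
wlm_ssh_check() {
  local master_host="$1"
  local failures=0
  local target

  for target in "$master_host" "$(hostname -s 2>/dev/null)" "$(hostname -f 2>/dev/null)"; do
    # BatchMode makes ssh fail rather than prompt for a password;
    # </dev/null stops ssh from consuming the caller's stdin.
    if ${SSH_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5} "$target" true </dev/null >/dev/null 2>&1; then
      echo "OK   ssh $target"
    else
      echo "FAIL ssh $target"
      failures=$((failures + 1))
    fi
  done

  # Non-zero exit status if any SSH path failed.
  [ "$failures" -eq 0 ]
}
```
    • Run it on each host, for example wlm_ssh_check mdw (assuming mdw is the master's $HOSTNAME). Any FAIL line points at a path that must be fixed before reinstalling.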
  • Does any segment host have a different hostname than its entry in gp_segment_configuration?
    • Error message:
Some hosts could not connect; cluster is not healthy
    • Solution (2 options):
      • Option 1: Change the node's hostname to match gp_segment_configuration.
      • Option 2: Run the following steps on the failed node:
cd /home/gpadmin/gp-wlm/etc/rabbitmq/current
ln -sf smdw/rabbitmq.config rabbitmq.config
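    • To see whether a host's names appear in gp_segment_configuration, a check like the following may help. It is a sketch under assumptions: the function name match_hostnames is hypothetical, and it expects a file with one hostname per line, such as the output of psql -At -c "SELECT DISTINCT hostname FROM gp_segment_configuration;" run on the master and copied to each host (for example with gpscp).
```shell
#!/bin/bash
# Hypothetical helper (not part of WLM): report which of the given names appear
# verbatim in a file of gp_segment_configuration hostnames (one per line).
# Usage: match_hostnames <hosts-file> <name>...
# Returns non-zero if none of the names match.
match_hostnames() {
  local hosts_file="$1"
  shift
  local name found=1

  for name in "$@"; do
    [ -n "$name" ] || continue
    # -x: whole-line match, -F: fixed string (no regex), -q: quiet
    if grep -qxF -- "$name" "$hosts_file"; then
      echo "MATCH   $name"
      found=0
    else
      echo "MISSING $name"
    fi
  done
  return "$found"
}
```
    • For example: match_hostnames /tmp/gpseg_hosts.txt "$(hostname -s)" "$(hostname -f)" "$HOSTNAME". If every line is MISSING, the node's hostname does not match gp_segment_configuration and Option 1 or Option 2 applies.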
  • More than 40 nodes, and the master's hostname differs from its entry in gp_segment_configuration
    • Error message:
Unable to locate host information for host <hostname>
    • Solution:
      • Make the master's hostname match its entry in gp_segment_configuration.
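    • The master entry can be compared with the master's hostname using a sketch like the following. The function name check_master_entry is hypothetical; it assumes it runs on the master as gpadmin with psql able to connect without a password (database taken from PGDATABASE, defaulting to postgres). In gp_segment_configuration, content = -1 identifies the master and standby, and role = 'p' selects the acting master.
```shell
#!/bin/bash
# Hypothetical helper (not part of WLM): compare the master's hostname in
# gp_segment_configuration (content = -1, role = 'p') with a local host name.
# Usage: check_master_entry [local-name]   (defaults to `hostname` output)
check_master_entry() {
  local local_name="${1:-$(hostname)}"
  local config_name

  # Declared separately so that a psql failure is not masked by `local`.
  config_name=$(psql -d "${PGDATABASE:-postgres}" -At \
    -c "SELECT hostname FROM gp_segment_configuration WHERE content = -1 AND role = 'p';") || return 2

  if [ "$config_name" = "$local_name" ]; then
    echo "OK: master entry '$config_name' matches '$local_name'"
  else
    echo "MISMATCH: gp_segment_configuration has '$config_name', host reports '$local_name'"
    return 1
  fi
}
```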
  • Previous uninstall was not clean
    • Error message (this does not always mean the uninstall was unclean):
Unable to find hostname in existing bootstrap database
    • Solution:
      • Clean up the previous installation and confirm that no WLM services are still running:
gpssh -f hostfile   (hostfile must list every host, including mdw and smdw)
ps -ef | grep wlm | grep -v grep   (on each host, e.g. from the gpssh session)
killall <process-name>   (for each process returned above)
Back up <install-dir>/gp-wlm-data/<timestamp>/ on mdw, the failed node, and one successful node
cd <install-dir>
rm -rf gp-wlm-data
rm -rf gp-wlm
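The backup and removal steps above can be combined per host with a sketch like the following. The function name wlm_clean_install_dir and the archive name are hypothetical; it archives all of gp-wlm-data (which contains the <timestamp> directories) rather than selecting individual timestamps, and it assumes all WLM processes on the host have already been stopped as described above.
```shell
#!/bin/bash
# Hypothetical helper (not part of WLM): archive gp-wlm-data, then remove
# gp-wlm-data and gp-wlm from the install directory.
# Run only AFTER all WLM processes on this host have been stopped.
# Usage: wlm_clean_install_dir <install-dir> <backup-dir>
wlm_clean_install_dir() {
  local install_dir="$1" backup_dir="$2"

  # Refuse empty arguments so rm -rf can never target /gp-wlm-data.
  if [ -z "$install_dir" ] || [ -z "$backup_dir" ]; then
    echo "usage: wlm_clean_install_dir <install-dir> <backup-dir>" >&2
    return 1
  fi

  local archive="$backup_dir/gp-wlm-data-${HOSTNAME:-host}-$(date +%Y%m%d%H%M%S).tar.gz"
  mkdir -p "$backup_dir" || return 1

  if [ -d "$install_dir/gp-wlm-data" ]; then
    # Keep a copy of the installer logs for RCA before deleting anything.
    tar -czf "$archive" -C "$install_dir" gp-wlm-data || return 1
    echo "backup: $archive"
  fi

  rm -rf "$install_dir/gp-wlm-data" "$install_dir/gp-wlm"
}
```
Keep the backup directory outside <install-dir>/gp-wlm-data, otherwise the archive would be deleted along with it.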

RCA Log Collection (collect the following from the master, the failed nodes, and one node that did not fail)

<install-dir>/gp-wlm-data/<timestamp>/*
echo $HOSTNAME
hostname -s
hostname -f
/etc/hosts
Gather_cluster_logs (may not always work; see the WLM troubleshooting guide)
http://gpcc.docs.pivotal.io/300/gp-wlm/topics/troubleshooting.html
gp_segment_configuration (master only)
<install-dir>/gp-wlm/sbin/rabbitmqctl cluster_status
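
The per-host items above can be gathered into a single tarball with a sketch like the following. The function name collect_wlm_rca and the COLLECT_SEGMENT_CONFIG switch are hypothetical; set COLLECT_SEGMENT_CONFIG=yes only on the master, where psql can query gp_segment_configuration. Gather_cluster_logs is not reproduced here.
```shell
#!/bin/bash
# Hypothetical helper (not part of WLM): gather the per-host RCA items listed
# above into one tarball. Prints the path of the tarball it creates.
# Usage: collect_wlm_rca <install-dir> <output-dir>
collect_wlm_rca() {
  local install_dir="$1" out_dir="$2"
  local work="$out_dir/wlm-rca-${HOSTNAME:-unknown}-$(date +%Y%m%d%H%M%S)"
  local rmq="$install_dir/gp-wlm/sbin/rabbitmqctl"

  mkdir -p "$work" || return 1

  # Host name information: $HOSTNAME, hostname -s, hostname -f
  {
    echo "HOSTNAME=$HOSTNAME"
    echo "hostname -s: $(hostname -s 2>&1)"
    echo "hostname -f: $(hostname -f 2>&1)"
  } > "$work/hostnames.txt"

  cp /etc/hosts "$work/etc_hosts" 2>/dev/null

  # Installer logs: <install-dir>/gp-wlm-data/<timestamp>/*
  if [ -d "$install_dir/gp-wlm-data" ]; then
    cp -r "$install_dir/gp-wlm-data" "$work/"
  fi

  # RabbitMQ cluster status, if the WLM binaries are still present.
  if [ -x "$rmq" ]; then
    "$rmq" cluster_status > "$work/rabbitmq_cluster_status.txt" 2>&1
  fi

  # gp_segment_configuration is collected on the master only.
  if [ "${COLLECT_SEGMENT_CONFIG:-no}" = "yes" ]; then
    psql -d "${PGDATABASE:-postgres}" -c "SELECT * FROM gp_segment_configuration;" \
      > "$work/gp_segment_configuration.txt" 2>&1
  fi

  tar -czf "$work.tar.gz" -C "$out_dir" "$(basename "$work")" || return 1
  rm -rf "$work"
  echo "$work.tar.gz"
}
```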

Comments

  • Scott Gai

    For the error "Unable to locate host information for host <hostname>", the output of "hostname -f" on each host (master and segments) should match the short name in the hostname field of gp_segment_configuration.
    Besides updating gp_segment_configuration, an alternative solution is to update /etc/hosts so that they match: "hostname -f" returns the first name after the IP address in the host's /etc/hosts entry. For example:

    [gpadmin@f12agpdb01 ~]$ hostname -s
    f12agpdb01
    [gpadmin@f12agpdb01 ~]$ hostname -f
    mdw-1
    [gpadmin@f12agpdb01 ~]$ hostname --long
    mdw-1

    [gpadmin@f12agpdb01 ~]$ grep mdw-1 /etc/hosts
    10.11.251.250 mdw-1 mdw f12agpdb01.umc.com F12AGPDB01


    Refer to ticket #55284

  • Scott Gai

    I encountered another issue with WLM installation and summarized it in a new KB article:
    https://discuss.pivotal.io/hc/en-us/articles/115007803988

    Just FYI
