Pivotal Knowledge Base

icm_client deploy failed with "Could not start Service nmon"

Environment

  • PHD 2.0.1
  • PHD 2.1

Symptom

Deploying a PHD cluster with icm_client fails, and the file /tmp/GPHDNodeInstaller_xxxxxx.log contains the following entries:

notice: nmon_conf_dir = /etc/nmon ; nmon_conf_subdir = /etc/nmon/conf ; nmon-site.xml path = /etc/nmon/conf/nmon-site.xml
notice: /Stage[main]/Mgmt_apps::Nmon/Notify[nmon_conf_dir = /etc/nmon ; nmon_conf_subdir = /etc/nmon/conf ; nmon-site.xml path = /etc/nmon/conf/nmon-site.xml]/message: defined 'message' as 'nmon_conf_dir = /etc/nmon ; nmon_conf_subdir = /etc/nmon/conf ; nmon-site.xml path = /etc/nmon/conf/nmon-site.xml'
debug: /Stage[main]/Mgmt_apps::Nmon/Notify[nmon_conf_dir = /etc/nmon ; nmon_conf_subdir = /etc/nmon/conf ; nmon-site.xml path = /etc/nmon/conf/nmon-site.xml]: The container Class[Mgmt_apps::Nmon] will propagate my refresh event
debug: Service[nmon](provider=redhat): Executing '/sbin/service nmon status'
debug: Service[nmon](provider=redhat): Executing '/sbin/service nmon start'
err: /Stage[main]/Mgmt_apps::Nmon/Service[nmon]/ensure: change from stopped to running failed: Could not start Service[nmon]: Execution of '/sbin/service nmon start' returned 1: at /var/lib/gphd/gphdmgr/puppet/modules/mgmt_apps/manifests/nmon.pp:47
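
To check quickly whether a host hit this failure, the installer logs can be searched for the error string. The following is a minimal sketch, assuming the logs sit in /tmp as described above; the function name find_nmon_failures is hypothetical and not part of icm_client.

```shell
# Hypothetical helper: list installer logs in a directory that contain
# the nmon start failure shown above.
find_nmon_failures() {
    # $1: directory holding GPHDNodeInstaller_*.log files (defaults to /tmp)
    dir="${1:-/tmp}"
    # -l prints only the names of matching files
    # -F treats the pattern as a literal string, so [ and ] need no escaping
    grep -lF 'Could not start Service[nmon]' "$dir"/GPHDNodeInstaller_*.log 2>/dev/null
}

# Example: find_nmon_failures /tmp
```

The function prints one log file name per line and prints nothing when no log contains the error.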

Cause

The nmon service does not start, and its status reports "unrecognized service":

[gpadmin@namenode-01 tmp]$ service nmon status
nmon: unrecognized service

nmon is a service bundled with Pivotal Command Center. During deployment of the Hadoop services, an incompatible version of the nmon package was installed on the failed hosts:

[root@namenode-01 ~]# rpm -qa|grep nmon
nmon-14i-8.el6.x86_64

The Hadoop deployment expects version "nmon-1.0.0-69.x86_64". The incompatible version of nmon was provided by a yum repository called local-epel:

[root@namenode-01 ~]# yum list nmon
Installed Packages
nmon.x86_64 14i-8.el6 @local-epel

The local-epel yum repository offers a higher version of nmon, so yum installs the local-epel package instead of the one from the Pivotal Command Center gphd.repo:

[gpadmin@namenode-01 yum.repos.d]$ cat epel.repo 
[local-epel]
name = epel Local
enabled = 1
baseurl = http://10.47.121.215/repos/centos/6/x86_64/epel/
gpgcheck = 0

Fix

  1. Disable epel.repo on the failed hosts by setting enabled = 0
    [gpadmin@namenode-01 yum.repos.d]$ cat epel.repo 
    [local-epel]
    name = epel Local
    enabled = 0
    baseurl = http://10.47.121.215/repos/centos/6/x86_64/epel/
    gpgcheck = 0
  2. Verify that the nmon package is provided by the correct gphd repository
    [gpadmin@namenode-01 ~]$ yum list nmon
    Installed Packages
    nmon.x86_64 1.0.0-69 @gphd
  3. Uninstall the cluster and deploy it again
    [gpadmin@namenode-01 ~]$ icm_client uninstall -l <phd_cluster_name>
    [gpadmin@namenode-01 ~]$ icm_client deploy -c <config_dir>
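
Step 1 can be applied without editing the file by hand. The following is a minimal sketch, assuming the repository file has the simple "key = value" layout shown above; the function name disable_repo_section is hypothetical, and the script should be run as a user who can write to /etc/yum.repos.d.

```shell
# Hypothetical helper: set "enabled = 0" inside one section of a .repo file,
# leaving every other section untouched.
disable_repo_section() {
    # $1: path to the .repo file, $2: section name without brackets
    file="$1"
    section="$2"
    tmp=$(mktemp) || return 1
    awk -v sec="[$section]" '
        /^\[/ { in_sec = ($0 == sec) }    # a section header starts or ends the target section
        in_sec && /^enabled[ \t]*=/ { print "enabled = 0"; next }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the original file permissions
    status=$?
    rm -f "$tmp"
    return $status
}

# Example: disable_repo_section /etc/yum.repos.d/epel.repo local-epel
```

Where the yum-utils package is installed, `yum-config-manager --disable local-epel` should have the same effect.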