Pivotal Knowledge Base

Workload Manager installation fails when starting the RabbitMQ service

Environment

 Product                 Version
 Pivotal Greenplum       4.3.x
 OS                      RHEL 6.x
 Workload Manager (WLM)  1.7.0

Symptom

The WLM 1.7.0 installation fails when the installer tries to start the RabbitMQ service.

Error Message:

[gpadmin@mdw greenplum-cc-web-3.0.2]$ ./gp-wlm-1.7.0.bin --install=/home/gpadmin
[Thu May 18 22:10:47 UTC 2017] Installation started.
[Thu May 18 22:10:47 UTC 2017] No previous installation detected; performing first-time install of version 1.7.0.
[Thu May 18 22:10:47 UTC 2017] Decompressing archive.
[Thu May 18 22:10:48 UTC 2017] Configuring components (stage 1)
[Thu May 18 22:10:48 UTC 2017] Configuring cfgmon plugin gpdb_clustermon.so
[Thu May 18 22:10:48 UTC 2017] Configuring kraken-bootstrap
[Thu May 18 22:10:49 UTC 2017] Configuring rabbitmq
[Thu May 18 22:10:49 UTC 2017] Configuring rulesengine
[Thu May 18 22:10:49 UTC 2017] Configuring components (stage 2)
[Thu May 18 22:10:49 UTC 2017] Configuring agent plugin gpdb_record.so
[Thu May 18 22:10:49 UTC 2017] Configuring agent plugin gpdb_throttle.so
[Thu May 18 22:10:50 UTC 2017] Configuring cfgmon
[Thu May 18 22:10:50 UTC 2017] Configuring cfgmon plugin gpdb_clustermon.so
[Thu May 18 22:10:51 UTC 2017] Configuring gptop
[Thu May 18 22:10:51 UTC 2017] Configuring kraken-bootstrap
[Thu May 18 22:10:51 UTC 2017] Configuring rabbitmq
[Thu May 18 22:10:52 UTC 2017] Configuring rulesengine
[Thu May 18 22:10:52 UTC 2017] Ensuring services are stopped.
[Thu May 18 22:10:59 UTC 2017] Preparing cluster for software installation.
[Thu May 18 22:11:00 UTC 2017] Installing software on 9 target hosts.
[Thu May 18 22:11:00 UTC 2017] waiting running finished
[Thu May 18 22:11:00 UTC 2017] 00 09 00
[Thu May 18 22:11:10 UTC 2017] 00 04 05
[Thu May 18 22:11:12 UTC 2017] 00 00 09
[Thu May 18 22:11:12 UTC 2017] Updating configuration files.
Starting RabbitMQ: [FAILED] Reason: Unable to query service status. Try svc-mgr.sh --action=restart --service=rabbitmq
[Thu May 18 22:13:19 UTC 2017] Failure on Thu May 18 22:13:19 UTC 2017.

Manually running the suggested command to restart RabbitMQ fails again.

[gpadmin@mdw bin]$ ./svc-mgr.sh --action=restart --service=rabbitmq
Starting RabbitMQ: [FAILED]

Cause 

A port reserved for RabbitMQ (25672) is already in use by a Greenplum postgres process.

RCA

According to the RabbitMQ logs, startup fails when RabbitMQ tries to listen on one of its TCP ports:

Thu May 18 22:11:13 EDT 2017: --- Starting RabbitMQ ---
{error_logger,{{2017,5,18},{22,11,13}},"Protocol: ~tp: register/listen error: ~tp~n",["inet_tcp",econnrefused]}

When the RabbitMQ service is running, it listens on TCP ports 25672 and 7777, as shown below:

tcp        0      0 0.0.0.0:25672               0.0.0.0:*                   LISTEN      3759/beam.smp       
tcp        0      0 :::7777                     :::*                        LISTEN      3759/beam.smp

In this case, however, TCP port 25672 is already in use as the local port of a connection held by Greenplum backend postgres process 59445. That is why RabbitMQ fails to listen on port 25672 during startup.

[gpadmin@mdw bin]$ netstat -anp|grep 5672
tcp        0      0 10.10.20.2:25672            10.10.20.16:40001           ESTABLISHED 59445/postgres
[gpadmin@mdw bin]$ ps -ef|grep 59445
gpadmin  59445 33573  0 16:04 ?        00:00:10 postgres: port  5432, i52045 prod_propcas 172.18.202.116(55935) con556598 172.18.202.116(55935) cmd1130 idle
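
The grep for 5672 above also matches any other port containing those digits, as well as remote addresses. As a minimal sketch, assuming the netstat -anp column layout shown above (local address in the fourth column), a small shell helper can filter on the local port only. The function name local_port_users is hypothetical and is not part of Greenplum or WLM.

```shell
#!/bin/sh
# Hypothetical helper: print netstat lines whose LOCAL address (column 4)
# ends in the given port. Reads `netstat -anp` style output on stdin, so it
# can also be tried on saved output.
local_port_users() {
    port="$1"
    awk -v p="$port" '
        $1 ~ /^tcp/ {
            # The port is the last colon-separated field, which also
            # handles IPv6-style addresses such as ":::7777".
            n = split($4, parts, ":")
            if (parts[n] == p) print
        }'
}

# Example against the live system (PIDs are only shown for processes
# the current user is allowed to inspect):
# netstat -anp 2>/dev/null | local_port_users 25672
# netstat -anp 2>/dev/null | local_port_users 7777
```

Any line printed for 25672 or 7777 that does not belong to beam.smp points to a process that would block RabbitMQ from starting.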

Resolution

Follow these steps to resolve this issue:

  1. Terminate the idle session con556598 with the pg_terminate_backend function to clean up the postgres process that is holding port 25672. Note that pg_cancel_backend only cancels a running query, so it does not close an idle session. If some other application is using the reserved port, stop that application or kill its process with the kill command to release the port.
  2. Make sure ports 25672 and 7777 are not used by any running process on any host in the Greenplum cluster.
  3. Uninstall WLM and install it again.
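
Steps 1 and 2 could be scripted roughly as follows. This is a sketch under stated assumptions, not a verified procedure: it assumes that the conNNN tag in the ps output is the Greenplum session ID, and that pg_stat_activity in Greenplum 4.3 exposes the sess_id and procpid columns. The function name terminate_session_sql, the database name postgres, and the host file name hostfile_all are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the SQL that terminates the backend of one
# Greenplum session, given the "conNNN" tag seen in the ps output.
# Assumes pg_stat_activity has the columns sess_id and procpid.
terminate_session_sql() {
    sess_id="${1#con}"                      # "con556598" -> "556598"
    case "$sess_id" in
        ''|*[!0-9]*)
            echo "invalid session id: $1" >&2
            return 1
            ;;
    esac
    printf 'SELECT pg_terminate_backend(procpid) FROM pg_stat_activity WHERE sess_id = %s;\n' "$sess_id"
}

# Step 1 (assumed database name "postgres"; run as a superuser such as gpadmin):
# terminate_session_sql con556598 | psql -d postgres

# Step 2 (assumes hostfile_all lists every host in the cluster):
# gpssh -f hostfile_all -e 'netstat -an | grep -E "25672|7777"'
```

Printing the statement first, rather than executing it directly, leaves room to review which session will be terminated before it is piped to psql.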

 

 
