Pivotal Knowledge Base

GPPERFMON Reinstall: GPMMON Cannot Establish a Connection with the GPSMON Process

Environment

 Product            Version
 Pivotal Greenplum  4.3.8.x
 OS                 RHEL 6.x

Symptom

After gpperfmon is reinstalled by following Installing the Greenplum Command Center Software, no data is collected from the segment hosts or the master.

Step 6 of that document asks you to run the following command to verify that the data collection processes are writing to the Command Center database. If all of the segment data collection agents are running, you should see one row per segment host. In this case the query returns no rows, and most of the *_now.dat files under $MASTER_DATA_DIRECTORY/gpperfmon/data are empty:

[gpadmin@autcbdgpdbm01 gpseg-1]$ psql gpperfmon -c 'SELECT * FROM system_now;' 
 ctime | hostname | mem_total | mem_used | mem_actual_used | mem_actual_free | swap_total | swap_used | swap_page_in | swap_page_out | cpu_user | cpu_sys | cpu_idle | load0 | load1 | load2 | quantum | disk_ro_rate | disk_wo_rate | disk_rb_rate | disk_wb_rate | net_rp_rate | net_wp_rate | net_rb_rate | net_wb_rate
-------+----------+-----------+----------+-----------------+-----------------+------------+-----------+--------------+---------------+----------+---------+----------+-------+-------+-------+---------+--------------+--------------+--------------+--------------+-------------+-------------+-------------+-------------
(0 rows)
[gpadmin@autcbdgpdbm01 ~]$ cd $MASTER_DATA_DIRECTORY/gpperfmon
[gpadmin@autcbdgpdbm01 gpperfmon]$ ll
total 8
drwx------ 2 gpadmin gpadmin 4096 Apr 11 15:33 data
drwx------ 2 gpadmin gpadmin 4096 Apr 11 15:34 logs
[gpadmin@autcbdgpdbm01 gpperfmon]$ cd data
[gpadmin@autcbdgpdbm01 data]$ ll | grep now
-rw------- 1 gpadmin gpadmin  26 Apr 11 15:33 database_now.dat
-rw------- 1 gpadmin gpadmin   0 Apr 11 15:33 diskspace_now.dat
-rw------- 1 gpadmin gpadmin   0 Apr 11 15:33 filerep_now.dat
-rw------- 1 gpadmin gpadmin   0 Apr 11 15:33 iterators_now.dat
-rw------- 1 gpadmin gpadmin   0 Apr 11 15:33 queries_now.dat
-rw------- 1 gpadmin gpadmin   0 Apr 11 15:33 segment_now.dat
-rw------- 1 gpadmin gpadmin   0 Apr 11 15:33 system_now.dat
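
The empty-file check in the listing above could be scripted roughly as follows. This is only a sketch: the function name list_empty_now_files is hypothetical, and the data directory is assumed to be $MASTER_DATA_DIRECTORY/gpperfmon/data as in the session above.

```shell
# Print the name of every empty *_now.dat file in the given directory.
# The function name is hypothetical; it is not part of Greenplum.
list_empty_now_files() {
    dir=$1
    for f in "$dir"/*_now.dat; do
        # Skip the unexpanded glob when no file matches.
        [ -e "$f" ] || continue
        # -s is true for non-empty files; report only the empty ones.
        [ -s "$f" ] || basename "$f"
    done
}

# Example (path taken from the session above):
# list_empty_now_files "$MASTER_DATA_DIRECTORY/gpperfmon/data"
```

If system_now.dat appears in the output, the collection agents are probably not delivering data, which matches the empty result of the system_now query.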

Troubleshooting

Check if the gpsmon processes are up and running on all hosts:

=> ps -ef |grep gpsmon
[sdw7] gpadmin 307911 1 0 12:11 ? 00:00:00 /usr/local/greenplum-db/./bin/gpsmon -m 0 -i -t 0 -l /data2/primary/gpseg23/gpperfmon -v 0 8888
[sdw6] gpadmin 307895 1 0 12:11 ? 00:00:00 /usr/local/greenplum-db/./bin/gpsmon -m 0 -i -t 0 -l /data2/primary/gpseg15/gpperfmon -v 0 8888
[sdw8] gpadmin 307825 1 0 12:11 ? 00:00:00 /usr/local/greenplum-db/./bin/gpsmon -m 0 -i -t 0 -l /data2/primary/gpseg31/gpperfmon -v 0 8888
[sdw5] gpadmin 355623 1 0 12:11 ? 00:00:00 /usr/local/greenplum-db/./bin/gpsmon -m 0 -i -t 0 -l /data2/primary/gpseg7/gpperfmon -v 0 8888
[ mdw] gpadmin 397052 1 0 12:11 ? 00:00:00 /usr/local/greenplum-db/./bin/gpsmon -m 0 -i -t 0 -l gpperfmon/logs -v 0 8888
[sdw2] gpadmin 310241 1 0 12:11 ? 00:00:00 /usr/local/greenplum-db/./bin/gpsmon -m 0 -i -t 0 -l /data2/primary/gpseg47/gpperfmon -v 0 8888
[sdw1] gpadmin 310109 1 0 12:11 ? 00:00:00 /usr/local/greenplum-db/./bin/gpsmon -m 0 -i -t 0 -l /data2/primary/gpseg39/gpperfmon -v 0 8888
[sdw3] gpadmin 310005 1 0 12:11 ? 00:00:00 /usr/local/greenplum-db/./bin/gpsmon -m 0 -i -t 0 -l /data2/primary/gpseg55/gpperfmon -v 0 8888
[sdw4] gpadmin 309956 1 0 12:11 ? 00:00:00 /usr/local/greenplum-db/./bin/gpsmon -m 0 -i -t 0 -l /data2/primary/gpseg63/gpperfmon -v 0 8888
[smdw] gpadmin 228959 1 0 12:11 ? 00:00:00 /usr/local/greenplum-db/./bin/gpsmon -m 0 -i -t 0 -l gpperfmon/logs -v 0 8888

Check gpsmon.2017.04.11_073319.log on the master:

2017-04-11 15:33:19|:-LOG: HOSTNAME = 'autcbdgpdbm01'
2017-04-11 15:33:19|:-FATAL: [INTERNAL ERROR gpsmon.c:2125] unable to bind udp socket
    error 98 (Address already in use)
    ... exiting
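
To find which gpsmon logs contain this bind failure without opening each file, a small grep wrapper may help. The function name find_bind_errors is hypothetical, and the log directory in the example is an assumption based on the gpperfmon/logs directory shown earlier.

```shell
# List the log files that contain the gpsmon UDP bind failure.
# Arguments: one or more log file paths.
find_bind_errors() {
    # -l prints only the names of files with at least one match.
    grep -l 'unable to bind udp socket' "$@"
}

# Example (log directory is an assumption, see above):
# find_bind_errors "$MASTER_DATA_DIRECTORY"/gpperfmon/logs/gpsmon.*.log
```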

Check gpmmon.2017.04.11_072751.log on the master:

2017-04-11 15:27:54|:-LOG: Making initial connection to sdw1
2017-04-11 15:27:54|:-LOG: Making initial connection to sdw2
2017-04-11 15:27:54|:-LOG: Making initial connection to sdw3
2017-04-11 15:27:54|:-LOG: Making initial connection to sdw4
...
2017-04-11 15:28:10|:-LOG: Connection to sdw1 lost.  Restarting gpsmon.
2017-04-11 15:28:10|:-LOG: Connection to sdw2 lost.  Restarting gpsmon.
2017-04-11 15:28:10|:-LOG: Connection to sdw3 lost.  Restarting gpsmon.
2017-04-11 15:28:10|:-LOG: Connection to sdw4 lost.  Restarting gpsmon.
...
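
When the gpmmon log is long, it can be easier to summarize how often each host lost its connection. The following is a minimal sketch that relies only on the "Connection to <host> lost." message format shown above; the function name count_lost_connections and the log path in the example are assumptions.

```shell
# Print "<host> <count>" for every host whose connection gpmmon reported lost.
# Arguments: one or more gpmmon log files (reads stdin when none are given).
count_lost_connections() {
    # Extract the host name from each "Connection to <host> lost." line,
    # then count the occurrences per host.
    sed -n 's/.*Connection to \([^ ]*\) lost\..*/\1/p' "$@" |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Example (log directory is an assumption):
# count_lost_connections "$MASTER_DATA_DIRECTORY"/gpperfmon/logs/gpmmon.*.log
```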

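Investigate the connections between gpsmon and gpmmon. If no connections appear to be established between them, try a telnet connection from mdw to sdw1 (command: telnet sdw1 8888). In this case the telnet connection is established successfully, so port 8888 on sdw1 is reachable from the master. In the netstat output below, the gpmmon connection to sdw1 (source port 44093) is ESTABLISHED, but on sdw1 it shows 360 bytes in Recv-Q and no owning process, whereas the telnet connection (source port 43910) is owned by gpsmon: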

=> netstat -anp |grep 8888
sdw7
[sdw7] tcp 0 0 :::8888 :::* LISTEN 307911/gpsmon
[sdw7] udp 0 0 :::8888 :::* 307911/gpsmon

sdw6
[sdw6] tcp 0 0 :::8888 :::* LISTEN 307895/gpsmon
[sdw6] udp 0 0 :::8888 :::* 307895/gpsmon

sdw8
[sdw8] tcp 0 0 :::8888 :::* LISTEN 307825/gpsmon
[sdw8] udp 0 0 :::8888 :::* 307825/gpsmon

sdw5
[sdw5] tcp 0 0 :::8888 :::* LISTEN 355623/gpsmon
[sdw5] udp 0 0 :::8888 :::* 355623/gpsmon

mdw
[ mdw] tcp 0 0 172.28.8.250:44093 172.28.8.1:8888 ESTABLISHED 396674/gpmmon
[ mdw] tcp 0 0 172.28.8.250:43910 172.28.8.1:8888 ESTABLISHED 400573/telnet
[ mdw] tcp 0 0 :::8888 :::* LISTEN 397052/gpsmon
[ mdw] udp 0 0 :::8888 :::* 397052/gpsmon

sdw2
[sdw2] tcp 0 0 :::8888 :::* LISTEN 310241/gpsmon
[sdw2] udp 0 0 :::8888 :::* 310241/gpsmon

sdw1
[sdw1] tcp 0 0 :::8888 :::* LISTEN 310109/gpsmon
[sdw1] tcp 360 0 ::ffff:172.28.8.1:8888 ::ffff:172.28.8.250:44093 ESTABLISHED -
[sdw1] tcp 0 0 ::ffff:172.28.8.1:8888 ::ffff:172.28.8.250:43910 ESTABLISHED 310109/gpsmon
[sdw1] udp 124992 0 :::8888 :::* 310109/gpsmon

sdw3
[sdw3] tcp 0 0 :::8888 :::* LISTEN 310005/gpsmon
[sdw3] udp 0 0 :::8888 :::* 310005/gpsmon

sdw4
[sdw4] tcp 0 0 :::8888 :::* LISTEN 309956/gpsmon
[sdw4] udp 0 0 :::8888 :::* 309956/gpsmon

smdw
[smdw] tcp 0 0 :::8888 :::* LISTEN 228959/gpsmon
[smdw] udp 0 0 :::8888 :::* 228959/gpsmon
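
The netstat output above can be filtered for established TCP connections on port 8888 that still have unread data queued (a nonzero Recv-Q). This is a sketch only: it assumes the "[host] proto Recv-Q Send-Q local foreign state PID/program" layout shown above, and the function name flag_stuck_connections is hypothetical.

```shell
# Read host-prefixed netstat lines and report established TCP connections
# to local port 8888 whose Recv-Q is greater than zero.
flag_stuck_connections() {
    awk '
        /^\[/ {
            # Split the "[host]" prefix from the netstat fields.
            i = index($0, "]")
            host = substr($0, 2, i - 2)
            gsub(/ /, "", host)          # "[ mdw]" becomes "mdw"
            n = split(substr($0, i + 1), f, " ")
            # f[1]=proto f[2]=Recv-Q f[4]=local f[5]=foreign f[6]=state f[7]=owner
            if (f[1] == "tcp" && f[6] == "ESTABLISHED" &&
                f[4] ~ /:8888$/ && (f[2] + 0) > 0)
                print host, f[5], "recv_q=" f[2], "owner=" f[7]
        }
    ' "$@"
}

# Example: pipe the saved output of the netstat command above into the function.
# flag_stuck_connections < netstat_8888.txt
```

Run against the output above, this would report only the sdw1 connection from 172.28.8.250:44093, the one opened by gpmmon.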

Resolution

On the master, check whether $MASTER_DATA_DIRECTORY/gpperfmon contains a "conf" folder. If the folder does not exist, create it and upload gpperfmon.conf (attached below) into it.

[gpadmin@gpdb-sandbox gpperfmon]$ cd conf
[gpadmin@gpdb-sandbox conf]$ ll
total 4
-rw-rw-r-- 1 gpadmin gpadmin 2350 Aug 26 2016 gpperfmon.conf
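
The check-and-create step could be scripted along these lines. It is a sketch under stated assumptions: the function name restore_gpperfmon_conf is hypothetical, and the second argument is wherever you saved the attached gpperfmon.conf.

```shell
# Create the conf folder if it is missing and copy gpperfmon.conf into it.
# $1: gpperfmon directory, e.g. "$MASTER_DATA_DIRECTORY/gpperfmon"
# $2: path to a saved copy of gpperfmon.conf (hypothetical location)
restore_gpperfmon_conf() {
    perfmon_dir=$1
    conf_src=$2
    conf_dir="$perfmon_dir/conf"
    # Leave an existing configuration file untouched.
    if [ -f "$conf_dir/gpperfmon.conf" ]; then
        echo "gpperfmon.conf already present in $conf_dir"
        return 0
    fi
    mkdir -p "$conf_dir" &&
        cp "$conf_src" "$conf_dir/gpperfmon.conf" &&
        echo "restored gpperfmon.conf to $conf_dir"
}

# Example (source path is an assumption):
# restore_gpperfmon_conf "$MASTER_DATA_DIRECTORY/gpperfmon" /tmp/gpperfmon.conf
```

The function does not overwrite an existing gpperfmon.conf, so it is safe to run on a master where the folder is already in place.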

Additional Information

This issue occurs when gpperfmon is uninstalled and the conf directory is deleted without following the steps in Uninstalling Greenplum Command Center.

gpperfmon.conf attachment:
