Pivotal Knowledge Base


No health information is shown on the GPCC web console “Dashboard” and “Health” tabs

Environment

Product: Pivotal Greenplum (GPDB) 4.3.x
OS: RHEL 6.x
Others: DCA

Symptom

After logging in to the GPCC web console, no health information is shown on the "Dashboard" and "Health" tabs, as illustrated by the following pictures:

The issue was found on a DCA V2 platform.


The following error message appears in the GPCC instance log file, gpmonws.log:

2016-07-06 14:43:00,728 - could not open health file: /data/master/gpseg-1/gpperfmon/data/snmp/hostlistreport.txt

No data files exist under the specified directory:

[gpadmin@mdw snmp]$ pwd
/data/master/gpseg-1/gpperfmon/data/snmp
[gpadmin@mdw snmp]$ ls
[gpadmin@mdw snmp]$

Cause

The data files containing the system health data do not exist under $MASTER_DATA_DIRECTORY/gpperfmon/data/snmp, the directory from which GPCC reads them.

RCA

On the DCA platform, the system health data files are generated by the health monitoring daemon, healthmond.

healthmond runs the query below and uses the value in row 0, column 0 of the result as the master data directory of the GPDB cluster:

select fselocation from pg_filespace_entry, gp_segment_configuration where fsedbid = dbid and content = -1 and dbid = 1

However, in this case the customer had configured two filespaces in the GPDB cluster, so the query returns one row per filespace:

gpadmin=# select oid, * from pg_filespace;
   oid    |  fsname   | fsowner
----------+-----------+---------
     3052 | pg_system |      10
 31689307 | fs_test   |      10

gpadmin=# select fselocation from pg_filespace_entry, gp_segment_configuration where fsedbid = dbid and content = -1 and dbid = 1;
         fselocation
------------------------------
 /data/fs_test/master/gpseg-1
 /data/master/gpseg-1

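Because the query has no ORDER BY clause, the order of the returned rows is not guaranteed. In this case, row 0 held the fselocation of the non-default filespace (fs_test), which healthmond therefore wrongly used as the master data directory. As a result, the system health data files were written to a location other than the master data directory, which is where the GPCC web application reads the health data from.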
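The difference between taking the first row and selecting the default filespace can be illustrated without a database. The sketch below is illustrative only: `first_row_dir`, `pick_master_dir`, and the `fsname|fselocation` input format are assumptions, not healthmond code. The commented query, which joins `pg_filespace_entry.fsefsoid` to `pg_filespace.oid`, is one possible way to produce such rows with `psql -At`; it is an assumed alternative, not the query used by the DCA software.

```shell
#!/usr/bin/env bash
# Illustrative sketch (hypothetical helpers, not healthmond code).
# Input: one "fsname|fselocation" row per line, for example from
#   psql -At -c "SELECT fs.fsname, fe.fselocation
#                FROM pg_filespace_entry fe
#                JOIN gp_segment_configuration gc ON fe.fsedbid = gc.dbid
#                JOIN pg_filespace fs ON fs.oid = fe.fsefsoid
#                WHERE gc.content = -1 AND gc.dbid = 1"

# What healthmond effectively does: take the first row's location.
# Without ORDER BY, which row comes first is not guaranteed.
first_row_dir() {
  head -n 1 | cut -d'|' -f2
}

# Order-independent alternative: pick the row of the default filespace.
# Returns non-zero if no pg_system row is present.
pick_master_dir() {
  awk -F'|' '$1 == "pg_system" { print $2; found = 1; exit }
             END { exit !found }'
}

# The two locations returned in this case, in the order observed.
rows='fs_test|/data/fs_test/master/gpseg-1
pg_system|/data/master/gpseg-1'

printf '%s\n' "$rows" | first_row_dir    # /data/fs_test/master/gpseg-1
printf '%s\n' "$rows" | pick_master_dir  # /data/master/gpseg-1
```

Filtering on the filespace name rather than on row position keeps the result correct no matter how many filespaces exist or in what order the rows come back.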

The following command shows that healthmond is writing data to the fs_test filespace location:

[root@mdw snmp]# lsof|grep hostlist
healthmon 731289 root 8w REG 8,64 138 939785346 /data/fs_test/master/gpseg-1/gpperfmon/data/snmp/_hostlistreport.txt

The health data files were indeed generated under the fs_test filespace location:

[root@mdw snmp]# pwd
/data/fs_test/master/gpseg-1/gpperfmon/data/snmp
[root@mdw snmp]# ls -lrt
total 184
-rw-r--r-- 1 root root 625 Oct 4 13:38 snmp.host.db.txt
-rw-r--r-- 1 root root 17217 Oct 4 13:38 snmp.host.a-sw-1.txt
-rw-r--r-- 1 root root 19985 Oct 4 13:38 snmp.host.i-sw-1.txt
-rw-r--r-- 1 root root 19985 Oct 4 13:38 snmp.host.i-sw-2.txt
-rw-r--r-- 1 root root 14743 Oct 4 13:38 snmp.host.mdw.txt
-rw-r--r-- 1 root root 14875 Oct 4 13:38 snmp.host.smdw.txt
-rw-r--r-- 1 root root 19488 Oct 4 13:38 snmp.host.sdw1.txt
-rw-r--r-- 1 root root 19491 Oct 4 13:38 snmp.host.sdw2.txt
-rw-r--r-- 1 root root 19492 Oct 4 13:38 snmp.host.sdw3.txt
-rw-r--r-- 1 root root 19485 Oct 4 13:38 snmp.host.sdw4.txt
-rw-r--r-- 1 root root 267 Oct 4 13:38 hostlistreport.txt
-rw-r--r-- 1 root root 22 Oct 4 13:38 lastreport.txt
[root@mdw snmp]# date
Tue Oct 4 13:38:44 CST 2016
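On a system with several filespaces, one way to find where the health files actually end up is to check each master filespace location for the report file that gpmonws.log complains about. The following is a minimal sketch: `find_health_dirs` is a hypothetical helper, not a DCA or GPCC tool, and it assumes the candidate locations are passed as arguments (for example, the fselocation values returned by the query in the RCA section) and that the files live under gpperfmon/data/snmp relative to each location, as observed in this case.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: print every candidate location whose
# gpperfmon/data/snmp directory holds a non-empty hostlistreport.txt.
# Arguments: candidate master filespace locations (e.g. fselocation values).
# Returns 0 if at least one location matched, 1 otherwise.
find_health_dirs() {
  local loc found=1
  for loc in "$@"; do
    if [ -s "$loc/gpperfmon/data/snmp/hostlistreport.txt" ]; then
      printf '%s\n' "$loc/gpperfmon/data/snmp"
      found=0
    fi
  done
  return "$found"
}

# Example (run on the master host):
#   find_health_dirs /data/master/gpseg-1 /data/fs_test/master/gpseg-1
```

In this case the helper would be expected to report only the fs_test location, matching the lsof output above.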

Resolution

The issue only occurs when GPCC runs on the DCA platform, where the health data is generated by healthmond. healthmond should determine the correct master data directory even when multiple filespaces are configured, so the permanent fix must be made in the DCA software.

Until healthmond is fixed, a workaround is to replace the empty snmp directory under the master data directory with a symbolic link pointing to the location where the health data files are generated:

[gpadmin@mdw data]$ pwd
/data/master/gpseg-1/gpperfmon/data
[gpadmin@mdw data]$ ls -l snmp
lrwxrwxrwx 1 gpadmin gpadmin 48 Oct 4 14:10 snmp -> /data/fs_test/master/gpseg-1/gpperfmon/data/snmp
[gpadmin@mdw data]$ ls -l snmp/
total 184
-rw-r--r-- 1 root root 267 Oct 4 14:10 hostlistreport.txt
-rw-r--r-- 1 root root 22 Oct 4 14:10 lastreport.txt
-rw-r--r-- 1 root root 17217 Oct 4 14:10 snmp.host.a-sw-1.txt
-rw-r--r-- 1 root root 625 Oct 4 14:10 snmp.host.db.txt
-rw-r--r-- 1 root root 19985 Oct 4 14:10 snmp.host.i-sw-1.txt
-rw-r--r-- 1 root root 19985 Oct 4 14:10 snmp.host.i-sw-2.txt
-rw-r--r-- 1 root root 14741 Oct 4 14:10 snmp.host.mdw.txt
-rw-r--r-- 1 root root 19490 Oct 4 14:10 snmp.host.sdw1.txt
-rw-r--r-- 1 root root 19490 Oct 4 14:10 snmp.host.sdw2.txt
-rw-r--r-- 1 root root 19489 Oct 4 14:10 snmp.host.sdw3.txt
-rw-r--r-- 1 root root 19492 Oct 4 14:10 snmp.host.sdw4.txt
-rw-r--r-- 1 root root 14878 Oct 4 14:10 snmp.host.smdw.txt
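The workaround can also be scripted. The following is a minimal sketch, assuming bash and GNU coreutils `ln`; `link_snmp_dir` is a hypothetical helper, not part of GPDB, GPCC, or the DCA software. It refuses to touch a non-empty snmp directory (`rmdir` fails in that case) and uses `ln -sfn` so that re-running it replaces the existing link instead of creating a nested link inside the target.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the symlink workaround.
# $1: master data directory (e.g. /data/master/gpseg-1)
# $2: directory where healthmond actually writes the health files
#     (e.g. /data/fs_test/master/gpseg-1/gpperfmon/data/snmp)
link_snmp_dir() {
  local link="$1/gpperfmon/data/snmp" target="$2"

  # The target must exist, otherwise the link would dangle.
  if [ ! -d "$target" ]; then
    echo "target $target does not exist" >&2
    return 1
  fi
  # Replace a real (non-symlink) snmp directory; rmdir only removes an
  # empty directory, so existing files are never deleted.
  if [ -d "$link" ] && [ ! -L "$link" ]; then
    rmdir "$link" || return 1
  fi
  # -s: symbolic link; -f: overwrite an existing link;
  # -n: do not follow an existing link to a directory (avoids snmp/snmp).
  ln -sfn "$target" "$link"
}

# Example (as gpadmin on the master host):
#   link_snmp_dir "$MASTER_DATA_DIRECTORY" /data/fs_test/master/gpseg-1/gpperfmon/data/snmp
```

Running it as gpadmin yields a link owned by gpadmin, as in the listing above; the health files themselves remain owned by root and stay world-readable.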
