Pivotal Knowledge Base

Flood of "System Board FAN MOD XX RPM Status:nonCritical" in /var/log/messages

Scott Gai

Environment

DCA 1.2.2.1

Problem

On several DCA V1 systems that were either installed with or upgraded to 1.2.2.1, a flood of the following messages appears in the system logs.

Jun 23 03:09:14 mdw-ext1 healthmon_worker: System Board FAN MOD 1A RPM Status:nonCritical
Jun 23 03:09:14 mdw-ext1 healthmon_worker: System Board FAN MOD 1B RPM Status:nonCritical
Jun 23 03:09:14 mdw-ext1 healthmon_worker: System Board FAN MOD 2A RPM Status:nonCritical
Jun 23 03:09:14 mdw-ext1 healthmon_worker: System Board FAN MOD 2B RPM Status:nonCritical
Jun 23 03:09:14 mdw-ext1 healthmon_worker: System Board FAN MOD 3A RPM Status:nonCritical
Jun 23 03:09:14 mdw-ext1 healthmon_worker: System Board FAN MOD 3B RPM Status:nonCritical
Jun 23 03:09:14 mdw-ext1 healthmon_worker: System Board FAN MOD 4A RPM Status:nonCritical
Jun 23 03:09:14 mdw-ext1 healthmon_worker: System Board FAN MOD 4B RPM Status:nonCritical
Jun 23 03:09:14 mdw-ext1 healthmon_worker: System Board FAN MOD 5B RPM Status:nonCritical
Jun 23 03:09:14 mdw-ext1 healthmon_worker: System Board FAN MOD 5A RPM Status:nonCritical

... ...

However, omreport shows that all fans on the server are in good status.

[root@emltgm01 snmp]# omreport chassis fans |grep Status
Redundancy Status : Full
Status : Ok
Status : Ok
Status : Ok
Status : Ok
Status : Ok
Status : Ok
Status : Ok
Status : Ok
Status : Ok
Status : Ok
Status : Ok
Status : Ok

The snmpwalk command also returns status 3 (Normal) for all fan devices.

[root@emltgm01 snmp]# snmpwalk -c public mdw:161 -v2c 1.3.6.1.4.1.674.10892.1.700.12.1.5.1
SNMPv2-SMI::enterprises.674.10892.1.700.12.1.5.1.1 = INTEGER: 3
SNMPv2-SMI::enterprises.674.10892.1.700.12.1.5.1.2 = INTEGER: 3
SNMPv2-SMI::enterprises.674.10892.1.700.12.1.5.1.3 = INTEGER: 3
SNMPv2-SMI::enterprises.674.10892.1.700.12.1.5.1.4 = INTEGER: 3
SNMPv2-SMI::enterprises.674.10892.1.700.12.1.5.1.5 = INTEGER: 3
SNMPv2-SMI::enterprises.674.10892.1.700.12.1.5.1.6 = INTEGER: 3
SNMPv2-SMI::enterprises.674.10892.1.700.12.1.5.1.7 = INTEGER: 3
SNMPv2-SMI::enterprises.674.10892.1.700.12.1.5.1.8 = INTEGER: 3
SNMPv2-SMI::enterprises.674.10892.1.700.12.1.5.1.9 = INTEGER: 3
SNMPv2-SMI::enterprises.674.10892.1.700.12.1.5.1.10 = INTEGER: 3
SNMPv2-SMI::enterprises.674.10892.1.700.12.1.5.1.11 = INTEGER: 3
SNMPv2-SMI::enterprises.674.10892.1.700.12.1.5.1.12 = INTEGER: 3
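The check above can also be scripted instead of read by eye. The following is a minimal sketch, assuming the snmpwalk output has been captured as text in the same format shown above. The function name abnormal_fans is hypothetical and not part of any DCA tool; the only status value it relies on is 3, which this article reports as Normal.

```python
import re

# Matches lines such as:
# SNMPv2-SMI::enterprises.674.10892.1.700.12.1.5.1.1 = INTEGER: 3
# Group 1 is the last OID component (fan index), group 2 is the status value.
LINE_RE = re.compile(r"\.(\d+) = INTEGER: (\d+)\s*$")

STATUS_NORMAL = 3  # the value this article reports as Normal


def abnormal_fans(snmpwalk_output):
    """Return {fan_index: status} for every fan whose status is not 3.

    Hypothetical helper for illustration only; lines that do not look
    like snmpwalk INTEGER results are skipped.
    """
    abnormal = {}
    for line in snmpwalk_output.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        index, status = int(match.group(1)), int(match.group(2))
        if status != STATUS_NORMAL:
            abnormal[index] = status
    return abnormal
```

An empty result means every fan reported status 3, which matches the output shown above.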

The file snmp.host.<hostname>.txt contains the same messages, but with a normal status.

2014-06-23 03:35:27|mdw|5|4001|Cooling Device 1 Status|.1.3.6.1.4.1.674.10892.1.700.12.1.5.1.1|normal|nonCritical: System Board FAN MOD 1A RPM
2014-06-23 03:35:27|mdw|5|4001|Cooling Device 2 Status|.1.3.6.1.4.1.674.10892.1.700.12.1.5.1.2|normal|nonCritical: System Board FAN MOD 2A RPM
2014-06-23 03:35:27|mdw|5|4001|Cooling Device 3 Status|.1.3.6.1.4.1.674.10892.1.700.12.1.5.1.3|normal|nonCritical: System Board FAN MOD 3A RPM
2014-06-23 03:35:27|mdw|5|4001|Cooling Device 4 Status|.1.3.6.1.4.1.674.10892.1.700.12.1.5.1.4|normal|nonCritical: System Board FAN MOD 4A RPM
2014-06-23 03:35:27|mdw|5|4001|Cooling Device 5 Status|.1.3.6.1.4.1.674.10892.1.700.12.1.5.1.5|normal|nonCritical: System Board FAN MOD 5A RPM
2014-06-23 03:35:27|mdw|5|4001|Cooling Device 6 Status|.1.3.6.1.4.1.674.10892.1.700.12.1.5.1.6|normal|nonCritical: System Board FAN MOD 6A RPM
2014-06-23 03:35:27|mdw|5|4001|Cooling Device 7 Status|.1.3.6.1.4.1.674.10892.1.700.12.1.5.1.7|normal|nonCritical: System Board FAN MOD 1B RPM
2014-06-23 03:35:27|mdw|5|4001|Cooling Device 8 Status|.1.3.6.1.4.1.674.10892.1.700.12.1.5.1.8|normal|nonCritical: System Board FAN MOD 2B RPM
2014-06-23 03:35:27|mdw|5|4001|Cooling Device 9 Status|.1.3.6.1.4.1.674.10892.1.700.12.1.5.1.9|normal|nonCritical: System Board FAN MOD 3B RPM
2014-06-23 03:35:27|mdw|5|4001|Cooling Device 10 Status|.1.3.6.1.4.1.674.10892.1.700.12.1.5.1.10|normal|nonCritical: System Board FAN MOD 4B RPM
2014-06-23 03:35:27|mdw|5|4001|Cooling Device 11 Status|.1.3.6.1.4.1.674.10892.1.700.12.1.5.1.11|normal|nonCritical: System Board FAN MOD 5B RPM
2014-06-23 03:35:27|mdw|5|4001|Cooling Device 12 Status|.1.3.6.1.4.1.674.10892.1.700.12.1.5.1.12|normal|nonCritical: System Board FAN MOD 6B RPM

NOTE: snmp.host.<hostname>.txt is a file that stores SNMP events for a specific host in the cluster. Its location can be found in /opt/dca/var/healthmond/active_healthmon_details on the master host.
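To spot this symptom across a whole snmp.host.<hostname>.txt file, the entries can be parsed and compared field by field. The sketch below assumes each entry has eight pipe-separated fields, as in the sample above, and that the seventh field is the recorded status while the eighth starts with a status label followed by ": ". These field meanings are inferred from the sample and are not documented here; find_mismatches is a hypothetical name.

```python
def find_mismatches(lines):
    """Return entries whose status field disagrees with the message label.

    Assumed layout (inferred from the sample lines, not from DCA docs):
    timestamp|host|?|?|device|oid|status|label: sensor
    """
    mismatches = []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) != 8:
            continue  # skip anything that is not an 8-field entry
        _timestamp, host, _f3, _f4, device, _oid, status, message = fields
        label, sep, sensor = message.partition(": ")
        # Report the entry only when the label is present and differs
        if sep and label != status:
            mismatches.append((host, device, status, label, sensor))
    return mismatches
```

Run against the sample entries above, every fan would be reported, because each one carries a status of normal next to a nonCritical label.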

Cause

This is caused by a minor bug in DCA software version 1.2.2.1. It does not indicate a real issue with the cooling devices.

Solution

The flood of messages in the system logs can be ignored. The issue will be fixed in the next DCA V1 release.
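Until the fix is available, the noise can be hidden when reviewing logs. The following is a minimal sketch, assuming the message format shown in the Problem section; drop_fan_noise is a hypothetical helper, not a DCA utility. It removes only the nonCritical fan lines from healthmon_worker, so any other status for the same sensors stays visible.

```python
import re

# Matches only the spurious lines described in this article, for example:
# ... healthmon_worker: System Board FAN MOD 1A RPM Status:nonCritical
NOISE_RE = re.compile(
    r"healthmon_worker: System Board FAN MOD \d+[AB] RPM Status:nonCritical\s*$"
)


def drop_fan_noise(lines):
    """Return the log lines that are not spurious fan messages."""
    return [line for line in lines if not NOISE_RE.search(line)]
```

Because the filter hides the lines rather than diagnosing them, it is still worth confirming fan health with omreport before relying on it.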

Miscellaneous

Pivotal engineers can refer to internal JIRA DCA-8797.
