Pivotal Knowledge Base


Mirror instance fails with error "could not match message header checksum between primary"

Environment

Product Version
Pivotal Greenplum (GPDB) 4.2 and 4.3

Symptom

The mirror instance failed, and its pg_log shows:
WARNING","01000","mirror failure, could not match message header checksum between primary 'xxxx' and mirror 'xxxxx', failover requested","identifier 'base/xxxx/xxxx' operation 'open' relation type 'buffer pool' message count 'xxxxx'","run gprecoverseg to re-establish mirror connectivity",,,"mirroring role 'mirror role' mirroring state 'sync' segment state 'up and running' process name(pid) 'mirror consumer writer process(xxxx)' filerep state 'up and running' 

Cause
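A possible cause is a network issue between the primary and mirror instances that alters data in transit, so the data received on the mirror no longer matches the data the primary sent.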

Resolution
1. Check whether the network between the primary and mirror hosts is stable, for example by looking for dropped or errored packets in the output of netstat -i. If there is a network issue, engage the relevant hardware vendor to fix it.

2. Once the network issue is fixed, run a normal (incremental) gprecoverseg for this mirror instance.
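Step 1 can be scripted. The Python below is a minimal sketch, assuming the Linux net-tools layout of netstat -i, whose header row starts with "Iface" and includes RX-ERR, RX-DRP, TX-ERR and TX-DRP columns (other platforms print different columns). These counters are cumulative, so the sketch compares two snapshots and reports only counters that grew. parse_netstat_i, take_snapshot and growing_counters are hypothetical helper names, not part of Greenplum.

```python
import subprocess

# ASSUMPTION: Linux net-tools `netstat -i` layout, where the header row starts
# with "Iface" and includes RX-ERR, RX-DRP, TX-ERR and TX-DRP columns.
COUNTERS = ("RX-ERR", "RX-DRP", "TX-ERR", "TX-DRP")


def parse_netstat_i(text):
    """Return {iface: {counter: int}} for the error/drop counters in `netstat -i` text."""
    result = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Iface":
            header = fields  # column names for the rows that follow
            continue
        if header is None or len(fields) != len(header):
            continue  # skip the "Kernel Interface table" banner and unexpected rows
        row = dict(zip(header, fields))
        result[row["Iface"]] = {name: int(row[name]) for name in COUNTERS if name in row}
    return result


def take_snapshot():
    """Hypothetical helper: run `netstat -i` on this host and parse its output."""
    out = subprocess.run(["netstat", "-i"], capture_output=True, text=True, check=True)
    return parse_netstat_i(out.stdout)


def growing_counters(before, after):
    """Return {iface: {counter: delta}} for counters that increased between snapshots."""
    growth = {}
    for iface, counters in after.items():
        for name, value in counters.items():
            delta = value - before.get(iface, {}).get(name, 0)
            if delta > 0:
                growth.setdefault(iface, {})[name] = delta
    return growth
```

To use it, call take_snapshot() on both the primary and the mirror host, wait a few minutes under normal load, take a second snapshot, and pass both to growing_counters(). Growth on the interfaces that carry mirroring traffic is the kind of evidence worth sharing with the hardware vendor.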

Notes

Here is a more detailed explanation from the Dev team (MPP-15613):

Filerep proactively checks that the data received on the mirror is the same as the data sent from the primary. If a checksum does not match, failover is requested. The reason for this proactive checking is that we cannot rely on switches or TCP to deliver to the mirror exactly the same data that the primary sent.
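How the filerep checksums work:
1. On the primary, a message is generated that contains a Header, a Header Checksum and Data.
2. The 'Data Checksum' is calculated over the Data.
3. The 'Data Checksum' is stored in the Header.
4. The 'Header Checksum' is calculated over the Header, so it also covers the stored Data Checksum.
5. The message is passed from filerep to the kernel and sent over TCP to the mirror.
6. The mirror calculates a checksum over the received Header and verifies that it matches the Header Checksum received from the primary.
7. The mirror calculates a checksum over the received Data and verifies that it matches the Data Checksum received from the primary.
A mismatch in step 6 produces the "could not match message header checksum" warning shown in the Symptom section.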
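The steps above can be modelled in a few lines of Python. This is an illustrative sketch, not the filerep implementation: the header fields (message count, data length), their sizes, and the use of CRC-32 via zlib.crc32 are assumptions chosen for the example, and build_message, verify_message and ChecksumMismatch are hypothetical names.

```python
import struct
import zlib

# ASSUMPTION: hypothetical header layout of three unsigned 32-bit fields in
# network byte order: message count, data length, data checksum.
HEADER = struct.Struct("!III")
HEADER_CHECKSUM = struct.Struct("!I")


class ChecksumMismatch(Exception):
    """Raised by the mirror-side check; this is where filerep would request failover."""


def build_message(message_count: int, data: bytes) -> bytes:
    """Primary side: produce Header | Header Checksum | Data."""
    data_checksum = zlib.crc32(data)                               # Data Checksum over Data
    header = HEADER.pack(message_count, len(data), data_checksum)  # stored in the Header
    header_checksum = zlib.crc32(header)                           # Header Checksum over Header
    return header + HEADER_CHECKSUM.pack(header_checksum) + data


def verify_message(wire: bytes):
    """Mirror side: check the Header Checksum first, then the Data Checksum."""
    prefix = HEADER.size + HEADER_CHECKSUM.size
    if len(wire) < prefix:
        raise ChecksumMismatch("message shorter than header")
    header = wire[:HEADER.size]
    (received_header_checksum,) = HEADER_CHECKSUM.unpack_from(wire, HEADER.size)
    if zlib.crc32(header) != received_header_checksum:
        raise ChecksumMismatch("header checksum mismatch")
    message_count, data_length, data_checksum = HEADER.unpack(header)
    data = wire[prefix:]
    if len(data) != data_length or zlib.crc32(data) != data_checksum:
        raise ChecksumMismatch("data checksum mismatch")
    return message_count, data
```

For example, verify_message(build_message(7, b"page contents")) returns (7, b"page contents"), while flipping any single bit of the Header makes it raise on the header check. Because the Data Checksum is stored in the Header before the Header Checksum is computed, corruption of the Data Checksum field itself is caught by the header check rather than being mistaken for bad Data.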
