Pivotal Knowledge Base

Segment Fails to Start due to File Replication Subprocess Generating SIGSEGV

Environment

Product: Pivotal Greenplum (GPDB)
Version: All versions

Symptoms

  • The segment does not start because a file replication subprocess receives SIGSEGV.
  • The segment log file contains entries such as the following:

      2015-03-16 07:15:03.855579 EDT,,,p6932,th95964576,,,,0,,,seg-1,,,,,"LOG","00000","Unexpected internal error: a file replication subprocess 6932 received signal SIGSEGV (interrupt holdoff count 2, critical section count 0)",,,,,,,0,,,,
      2015-03-16 07:15:03.905033 EDT,,,p6932,th95964576,,,,0,,,seg-1,,,,,"FATAL","XX000","Unexpected internal error: a file replication subprocess received signal SIGSEGV (elog.c:4492)","(interrupt holdoff count 2, critical section count 0)",,,,,,0,,"elog.c",4492,"Stack trace:
      1 0xa6ef19 postgres <symbol not found> (elog.c:468)
      2 0xa72b6c postgres StandardHandlerForSigillSigsegvSigbus_OnMainThread (elog.c:4486)
      3 0x329b6302d0 libc.so.6 <symbol not found> (??:0)
      4 0xc1a3fd postgres ChangeTracking_GetLastChangeTrackingLogEndLoc (cdbresynchronizechangetracking.c:3375)
      5 0x51b759 postgres XLogAddRecordsToChangeTracking (xlog.c:11109)
      6 0xc19665 postgres ChangeTracking_CreateInitialFromPreviousCheckpoint (cdbresynchronizechangetracking.c:206)
      7 0x51bcf6 postgres XLogInChangeTrackingTransition (xlog.c:5343)
      8 0xb6634a postgres FileRepPrimary_StartRecovery (cdbfilerepprimaryrecovery.c:288)
      9 0xb65ec5 postgres FileRepSubProcess_Main (cdbfilerepservice.c:813)
      10 0xb5c94a postgres <symbol not found> (cdbfilerep.c:2608)
      11 0xb645ea postgres FileRep_Main (cdbfilerep.c:3449)
      12 0x5970e9 postgres AuxiliaryProcessMain (bootstrap.c:487)
      13 0x873304 postgres <symbol not found> (postmaster.c:7606)
      14 0x8734ec postgres StartFilerepProcesses (postmaster.c:1648)
      15 0x88541c postgres doRequestedPrimaryMirrorModeTransitions (primary_mirror_mode.c:1743)
      16 0x87ee51 postgres PostmasterMain (postmaster.c:2510)
      17 0x78289a postgres main (main.c:212)
      18 0x329b61d994 libc.so.6 __libc_start_main (??:0)
      19 0x47d069 postgres <symbol not found> (??:0)
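
To check a segment log for this symptom, the following is a minimal sketch in Python. It assumes the log text has already been read into a string, and it treats two things from the excerpt above as the signature: the "file replication subprocess ... SIGSEGV" message and at least one stack frame whose function name starts with ChangeTracking_. The function name matches_symptom is hypothetical and is not part of any Greenplum tooling.

```python
import re

# Strings taken from the log excerpt in this article.
FILEREP_MARKER = "file replication subprocess"
SIGNAL_MARKER = "SIGSEGV"

# A stack frame whose function name begins with "ChangeTracking_",
# for example ChangeTracking_GetLastChangeTrackingLogEndLoc.
CHANGETRACKING_FRAME = re.compile(r"\bChangeTracking_\w+")


def matches_symptom(log_text: str) -> bool:
    """Return True if log_text shows a file replication SIGSEGV
    together with a change tracking frame in the stack trace."""
    has_sigsegv = FILEREP_MARKER in log_text and SIGNAL_MARKER in log_text
    has_ct_frame = CHANGETRACKING_FRAME.search(log_text) is not None
    return has_sigsegv and has_ct_frame
```

A match only suggests that the crash happened while reading change tracking data; it does not prove that the files are corrupt.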
    

Cause

When the system starts, the file replication service starts and parses the existing change tracking files. If these files are corrupt, the file replication subprocess crashes (the stack trace above shows the SIGSEGV inside ChangeTracking_GetLastChangeTrackingLogEndLoc), and this failure causes the segment to fail to start.

Resolution

In this case, the solution is to:

  1. Stop the database cluster.
  2. Remove all the files under <segment_dir>/pg_changetracking.
  3. Restart the cluster.
  4. Run gprecoverseg -F to recover the mirror.

Note that after removing the files under pg_changetracking, it is not possible to run an incremental recovery for the mirror, which is why the last step uses a full recovery (gprecoverseg -F).
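
Step 2 can be sketched in Python as shown below. This is a cautious variant rather than the exact procedure above: instead of deleting the files, it moves them into a backup directory so they remain available for later analysis. The function name clear_changetracking and the backup_dir parameter are hypothetical. The sketch assumes the cluster is already stopped (step 1) and that it runs on the host of the affected segment; restarting the cluster and running gprecoverseg -F (steps 3 and 4) are still done separately.

```python
import shutil
from pathlib import Path
from typing import List


def clear_changetracking(segment_dir: str, backup_dir: str) -> List[str]:
    """Move every entry out of <segment_dir>/pg_changetracking into backup_dir.

    Returns the names of the moved entries. The pg_changetracking
    directory itself is left in place, empty.
    """
    ct_dir = Path(segment_dir) / "pg_changetracking"
    if not ct_dir.is_dir():
        raise FileNotFoundError(f"{ct_dir} does not exist")

    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)

    moved = []
    for entry in sorted(ct_dir.iterdir()):
        # Moving instead of deleting is a choice made in this sketch only.
        shutil.move(str(entry), str(dest / entry.name))
        moved.append(entry.name)
    return moved
```

Once the segment is confirmed to start and the mirror has been recovered, the backup directory can be removed.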
