Pivotal Knowledge Base

Postgres core created by writer process after executing gpstop -M fast

Environment

Known to affect GPDB 4.3.9.1

Symptom

After 'gpstop -M fast' is executed, the writer process generates a postgres core file. The database shutdown still completes.

Error Message:

Postgres core is generated with the following backtrace:

 Core was generated by `postgres: port 16729, writer process '.

Program terminated with signal SIGABRT, Aborted.
#0  0x00002b5307d52495 in raise () from /data/logs/69603/packcore-postgres.86993.181669.core/lib64/libc.so.6
(gdb) thread apply all bt

Thread 1 (LWP 86993):
#0  0x00002b5307d52495 in raise () from /data/logs/69603/packcore-postgres.86993.181669.core/lib64/libc.so.6
#1  0x00002b5307d53c75 in abort () from /data/logs/69603/packcore-postgres.86993.181669.core/lib64/libc.so.6
#2  0x0000000000b05381 in errfinish (dummy=<optimized out>) at elog.c:689
#3  0x0000000000b06e39 in elog_finish (elevel=<optimized out>, fmt=<optimized out>) at elog.c:1466
#4  0x000000000095f919 in proc_exit_prepare (code=<optimized out>) at ipc.c:155
#5  proc_exit (code=0) at ipc.c:95
#6  0x0000000000c06560 in FileRepPrimary_IsMirroringRequired (fileRepRelationType=FileRepRelationTypeFlatFile, fileRepOperation=FileRepOperationWrite)
    at cdbfilerepprimary.c:253
#7  0x0000000000c06c4f in FileRepPrimary_MirrorWrite (fileRepIdentifier=..., fileRepRelationType=86993, offset=42958848,
    data=0x6 <error: Cannot access memory at address 0x6>, dataLength=4294967295, lsn=...) at cdbfilerepprimary.c:863
#8  0x0000000000c575aa in MirroredFlatFile_Write (open=0x1223b60 <mirroredLogFileOpen>, position=42958848, buffer=<optimized out>, bufferLen=32768,
    suppressError=<optimized out>) at cdbmirroredflatfile.c:648
#9  0x000000000055559a in XLogWrite (WriteRqst=..., flexible=<optimized out>, xlog_switch=<optimized out>) at xlog.c:2184
#10 0x0000000000556b59 in XLogFlush (record=...) at xlog.c:2406
#11 0x0000000000943ef3 in FlushBuffer (buf=0x2b531020bc20, reln=0x31ecff8) at bufmgr.c:2397
#12 0x0000000000946e5f in SyncOneBuffer (skip_pinned=<optimized out>, buf_id=<optimized out>) at bufmgr.c:2111
#13 BgBufferSync () at bufmgr.c:2044
#14 0x00000000008e0ed5 in BackgroundWriterMain () at bgwriter.c:344
#15 0x00000000005f5ff5 in AuxiliaryProcessMain (argc=-4, argv=0x7fff92e2d6a0) at bootstrap.c:483
#16 0x00000000008ee4e4 in StartChildProcess (type=<optimized out>) at postmaster.c:7992
#17 0x00000000008f459a in CommenceNormalOperations () at postmaster.c:4543
#18 0x00000000008f66ca in do_reaper () at postmaster.c:4980
#19 0x00000000008f9598 in ServerLoop () at postmaster.c:2437
#20 0x00000000008faf10 in PostmasterMain (argc=15, argv=0x31bd4a0) at postmaster.c:1540
#21 0x00000000007fcf7f in main (argc=15, argv=0x31bd430) at main.c:206
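
To compare a core from another system against this backtrace, the check can be scripted. The following is a minimal sketch, not a supported tool: it assumes the output of gdb's "thread apply all bt" has been saved as text, and the names SIGNATURE_FRAMES, frame_functions and matches_signature are hypothetical helpers introduced here for illustration. The frame names themselves are taken from the backtrace above.

```python
import re

# Function names copied from the backtrace in this article,
# ordered from the innermost frame to the outermost frame.
SIGNATURE_FRAMES = [
    "abort",
    "proc_exit",
    "FileRepPrimary_IsMirroringRequired",
    "FileRepPrimary_MirrorWrite",
    "XLogFlush",
    "BackgroundWriterMain",
]

# Matches gdb frame lines with or without an address, for example
# "#1  0x00002b5307d53c75 in abort () from ..." and "#5  proc_exit (code=0) at ipc.c:95".
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)\s*\(")


def frame_functions(backtrace_text):
    """Return the function name of each frame line, in the order gdb printed them."""
    names = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def matches_signature(backtrace_text, signature=SIGNATURE_FRAMES):
    """True if every signature frame appears in the backtrace, in the same order."""
    remaining = iter(frame_functions(backtrace_text))
    # Each any() call consumes the iterator up to the frame it finds,
    # so later signature frames must appear further out in the stack.
    return all(any(name == wanted for name in remaining) for wanted in signature)
```

A match only indicates that the stack has the same shape as the one shown here. It does not confirm the root cause, so the core should still be reviewed by Pivotal Support.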

The gpstop log may contain messages like the following. Note the failed pg_ctl shutdown of the master, followed by the SIGQUIT signal:

20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-There are 157 connections to the database
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='fast'
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-Master host=mdw.randolph.ms.com
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-Detected 157 connections to database
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-Switching to WAIT mode
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-Will wait for shutdown to complete, this may take some time if
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-there are a large number of active complex transactions, please wait...
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=fast
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-Master segment instance directory=/var/gpdb/nypgp014/datamaster/gpseg-1
20171030:13:51:33:721637 gpstop:mdw:gpadmin-[INFO]:-Failed to shutdown master with pg_ctl.
20171030:13:51:33:721637 gpstop:mdw:gpadmin-[INFO]:-Sending SIGQUIT signal... <<<<
20171030:13:51:38:721637 gpstop:mdw:gpadmin-[INFO]:-Attempting forceful termination of any leftover master process
20171030:13:51:38:721637 gpstop:mdw:gpadmin-[INFO]:-Terminating processes for segment /var/gpdb/nypgp014/datamaster/gpseg-1
20171030:13:51:38:721637 gpstop:mdw:gpadmin-[INFO]:-Stopping master standby host idb102.randolph.ms.com mode=fast
20171030:13:51:45:721637 gpstop:mdw:gpadmin-[INFO]:-Successfully shutdown standby process on idb102.randolph.ms.com
20171030:13:51:45:721637 gpstop:mdw:gpadmin-[INFO]:-Commencing parallel primary segment instance shutdown, please wait...
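
To check a gpstop log for the same pattern, the excerpt above can be scanned for the pg_ctl failure message and the time the master shutdown waited before it. The following is a minimal sketch under stated assumptions: it assumes each log line begins with a timestamp in the YYYYMMDD:HH:MM:SS form seen above, and parse_timestamp and master_shutdown_wait are hypothetical helper names, not part of gpstop.

```python
from datetime import datetime

# Message fragments copied from the gpstop log excerpt in this article.
START_MSG = "Commencing Master instance shutdown"
FAILED_MSG = "Failed to shutdown master with pg_ctl"


def parse_timestamp(line):
    """Parse the leading 'YYYYMMDD:HH:MM:SS' timestamp of a gpstop log line."""
    return datetime.strptime(line.strip()[:17], "%Y%m%d:%H:%M:%S")


def master_shutdown_wait(log_lines):
    """Return the seconds between the first master shutdown message and the
    pg_ctl failure message, or None if the failure message is not present."""
    started = None
    for line in log_lines:
        if START_MSG in line and started is None:
            started = parse_timestamp(line)
        elif FAILED_MSG in line and started is not None:
            return (parse_timestamp(line) - started).total_seconds()
    return None
```

Applied to the excerpt above, the two relevant lines are stamped 13:49:32 and 13:51:33, so the master shutdown waited about two minutes before gpstop reported the pg_ctl failure and sent SIGQUIT.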

Cause

When gpstop -M fast is executed, any queries still running on the master must finish before the WAIT timer expires. Queries that have not finished by then are forcefully terminated.

As a result, a query may be cancelled before its changes finish replicating to the mirror, or before the replication is logged.

If the writer process has no confirmation that mirror replication is complete, it aborts with SIGABRT, which generates a core file.

Resolution

If you believe you have encountered this issue, please open a ticket with Pivotal Support. 

A defect has been opened with Pivotal Engineering to address this issue.
