Pivotal Knowledge Base

Greenplum postmaster resets with error "PID terminated by signal 6: Aborted"

Environment

Product Version
Pivotal Greenplum (GPDB) 4.3.x

Symptom

  • Greenplum master server will reset the postmaster several times in succession
  • Core files will be generated on the master server when the error occurs
  • No PANIC will be generated on the error condition

Error Message:

Note: These log entries will be found in the master server log:

2016-07-28 06:31:15.932941 EDT,,,p516025,th734934816,,,,0,,,seg-1,,,,,"LOG","00000","server process (PID 520911) was terminated by signal 6: Aborted",,,,,,,0,,"postmaster.c",5876,
2016-07-28 06:31:15.932981 EDT,,,p516025,th734934816,,,,0,,,seg-1,,,,,"LOG","00000","terminating any other active server processes",,,,,,,0,,"postmaster.c",5563,
2016-07-28 06:31:15.980142 EDT,,,p516025,th734934816,,,,0,,,seg-1,,,,,"LOG","00000","sweeper process (PID 516100) exited with exit code 2",,,,,,,0,,"postmaster.c",5854,
2016-07-28 06:31:15.980203 EDT,,,p516101,th734934816,,,,0,con4,,seg-1,,,,,"FATAL","57P01","terminating connection due to administrator command",,,,,,,0,,"postgres.c",3562,
2016-07-28 06:31:15.986059 EDT,"502437963","get_data",p526112,th734934816,"3.48.33.20","27767",2016-07-28 06:31:15 EDT,0,,,seg-1,,,,,"FATAL","57P03","the database system is in recovery mode",,,,,,,0,,"postmaster.c",2967,
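
To check whether a master log contains this pattern, the entries above can be searched for the "terminated by signal" message. The following is a minimal sketch, not an official Greenplum tool: the function name find_signal_terminations is hypothetical, and it assumes only that the timestamp is the first comma-separated field, as in the entries shown above.

```python
import re

# Matches the postmaster message shown in the log entries above.
SIGNAL_RE = re.compile(
    r"server process \(PID (\d+)\) was terminated by signal (\d+)"
)


def find_signal_terminations(lines):
    """Return (timestamp, pid, signal) for each matching log line.

    `lines` is any iterable of strings, for example an open log file.
    The timestamp is taken as the text before the first comma.
    """
    hits = []
    for line in lines:
        match = SIGNAL_RE.search(line)
        if match:
            timestamp = line.split(",", 1)[0]
            hits.append((timestamp, int(match.group(1)), int(match.group(2))))
    return hits
```

Passing an open master log file to this function yields one tuple per crashed server process; several tuples with signal 6 within a short time window would match the symptom described in this article.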

Note: Additional error messages related to glibc will also be present in the master server log:

2016-07-28 06:55:40.186393 EDT,,,p516034,th734934816,,,,0,,,seg-1,,,,,"LOG","00000","3rd party error log:
*** glibc detected *** postgres: port  5432, 502433194 get_data 3.48.33.20(28018) con293124 3.48.33.20(28018) cmd10 con293124 3.48.33.20(28018) cmd10 : corrupted double-linked list: 0x0000000002224ea0 ***",,,,,,,,"SysLoggerMain","syslogger.c",550,

Cause 

This issue is caused by software defect MPP-26507, which was first fixed in GPDB 4.3.9.1.

A memory overflow can corrupt a double-linked list maintained by glibc (reported in the log as "corrupted double-linked list"), which ultimately causes a server process on the master to abort with a signal 6: Aborted error.

RCA 

This issue presents itself as a memory overflow that glibc detects and reports in the log. Reviewing the core file shows that the resultSets variable in the function cdbdisp_returnResults has the value 0x2132a20 (held in register r12):

(gdb) f 9
#9  0x0000000000bf31a8 in cdbdisp_returnResults (dtxProtocolCommand=<value optimized out>, flags=<value optimized out>, dtxProtocolCommandLoggingStr=<value optimized out>, gid=<value optimized out>,
    gxid=<value optimized out>, errmsgbuf=0x40, numresults=0x7fff09ed9f4c, badGangs=0x7fff09ed9fcf "", direct=0x7fff09ed9fc0, argument=0x0, argumentLength=0) at cdbdisp.c:988
988	in cdbdisp.c
(gdb) info reg
rax            0x0	0
rbx            0x20428d0	33827024
rcx            0xffffffffffffffff	-1
rdx            0x6	6
rsi            0x5e7a4	386980
rdi            0x5e7a4	386980
rbp            0x7fff09ed9ed0	0x7fff09ed9ed0
rsp            0x7fff09ed9e20	0x7fff09ed9e20
r8             0x0	0
r9             0x7f902bd04188	140257187086728
r10            0x8	8
r11            0x202	514
r12            0x2132a20	34810400
r13            0x20	32
r14            0x40	64
r15            0x300	768
rip            0xbf31a8	0xbf31a8 <cdbdisp_dispatchDtxProtocolCommand+696>
eflags         0x202	[ IF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0


However, the malloc size for this variable is 512 bytes (see the chunk size field at 0x2132a18), which is 512 bytes smaller than what the function above is returning. This is where the memory overflow occurs, causing the symptoms noted in this article:

0x2132a10:	0x00000110	0x00000000	0x00000211	0x00000000
0x2132a20:	0x100011e0	0x00007f90	0x10158b20	0x00007f90
0x2132a30:	0x101211f0	0x00007f90	0x100010d0	0x00007f90
0x2132a40:	0x1014f120	0x00007f90	0x101369f0	0x00007f90
0x2132a50:	0x10136db0	0x00007f90	0x101564a0	0x00007f90
0x2132a60:	0x10014470	0x00007f90	0x101366c0	0x00007f90
0x2132a70:	0x10163860	0x00007f90	0x1006b0a0	0x00007f90
0x2132a80:	0x10136b90	0x00007f90	0x1015b1a0	0x00007f90
0x2132a90:	0x100a4dc0	0x00007f90	0x10136ca0	0x00007f90
0x2132aa0:	0x10170c20	0x00007f90	0x10057eb0	0x00007f90
0x2132ab0:	0x100016c0	0x00007f90	0x10104aa0	0x00007f90
0x2132ac0:	0x1016c940	0x00007f90	0x10001520	0x00007f90
0x2132ad0:	0x101517a0	0x00007f90	0x100fb180	0x00007f90
0x2132ae0:	0x100012f0	0x00007f90	0x1015c4e0	0x00007f90
0x2132af0:	0x100277b0	0x00007f90	0x10134d40	0x00007f90
0x2132b00:	0x101611e0	0x00007f90	0x1010e400	0x00007f90
0x2132b10:	0x10134e50	0x00007f90	0x1016f8e0	0x00007f90
0x2132b20:	0x100ae710	0x00007f90	0x10134f60	0x00007f90
0x2132b30:	0x10164ba0	0x00007f90	0x1001de10	0x00007f90
0x2132b40:	0x1012aec0	0x00007f90	0x10162520	0x00007f90
0x2132b50:	0x1004e590	0x00007f90	0x1014cca0	0x00007f90
0x2132b60:	0x10152ae0	0x00007f90	0x1003b350	0x00007f90
0x2132b70:	0x101367d0	0x00007f90	0x10155160	0x00007f90
0x2132b80:	0x10117850	0x00007f90	0x10135070	0x00007f90
0x2132b90:	0x1016abe0	0x00007f90	0x10031a40	0x00007f90
(gdb)
0x2132ba0:	0x1014dee0	0x00007f90	0x1016d260	0x00007f90
0x2132bb0:	0x100916e0	0x00007f90	0x101368e0	0x00007f90
0x2132bc0:	0x10167220	0x00007f90	0x100de5a0	0x00007f90
0x2132bd0:	0x1014a820	0x00007f90	0x10159e60	0x00007f90
0x2132be0:	0x10009e40	0x00007f90	0x1012aca0	0x00007f90
0x2132bf0:	0x10087d40	0x00007f90	0x1016efc0	0x00007f90
0x2132c00:	0x1012b1f0	0x00007f90	0x10171f60	0x00007f90
0x2132c10:	0x10031150	0x00007f90	0x10148cc0	0x00007f90
0x2132c20:	0x10168560	0x00007f90	0x1009b080	0x00007f90
0x2132c30:	0x101495e0	0x00007f90	0x101745e0	0x00007f90
0x2132c40:	0x100d4c80	0x00007f90	0x10149f00	0x00007f90
0x2132c50:	0x101732a0	0x00007f90	0x100b8070	0x00007f90
0x2132c60:	0x1014e800	0x00007f90	0x10165ee0	0x00007f90
0x2132c70:	0x100617c0	0x00007f90	0x1014b140	0x00007f90
0x2132c80:	0x1015d820	0x00007f90	0x1015f580	0x00007f90
0x2132c90:	0x1014ba60	0x00007f90	0x10044c70	0x00007f90
0x2132ca0:	0x100e7f00	0x00007f90	0x1014d5c0	0x00007f90
0x2132cb0:	0x10150460	0x00007f90	0x1000a9d0	0x00007f90
0x2132cc0:	0x1012afd0	0x00007f90	0x101698a0	0x00007f90
0x2132cd0:	0x100c19d0	0x00007f90	0x1012b0e0	0x00007f90
0x2132ce0:	0x1015fea0	0x00007f90	0x100f1820	0x00007f90
0x2132cf0:	0x1014c380	0x00007f90	0x10153e20	0x00007f90
0x2132d00:	0x10074a00	0x00007f90	0x1012adb0	0x00007f90
0x2132d10:	0x1007e3a0	0x00007f90	0x10158200	0x00007f90
0x2132d20:	0x6f633a20	0x6e616d6d	0x02132df0	0x00000000
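
The 512-byte figure can be read from the dump itself. The sketch below decodes the size field stored just before the buffer, assuming the usual 64-bit glibc malloc layout (a 16-byte chunk header, with the low three bits of the size field used as flags). The function name decode_chunk is hypothetical, and the payload value is an approximation: glibc may let the caller use a few more bytes than this.

```python
HEADER_BYTES = 16      # prev_size + size fields on 64-bit (assumed layout)
SIZE_FLAG_MASK = 0x7   # low three bits of the size field are flags
PREV_INUSE = 0x1       # flag bit: the previous chunk is in use


def decode_chunk(user_ptr, size_field):
    """Decode a glibc malloc size field for the buffer at user_ptr."""
    chunk_size = size_field & ~SIZE_FLAG_MASK
    payload = chunk_size - HEADER_BYTES
    return {
        "chunk_size": chunk_size,
        "payload": payload,
        "payload_end": user_ptr + payload,
        "prev_inuse": bool(size_field & PREV_INUSE),
    }


# Values taken from the dump: the buffer starts at 0x2132a20 and the
# 8-byte size field at 0x2132a18 holds 0x211.
info = decode_chunk(0x2132a20, 0x211)
print(info["payload"], hex(info["payload_end"]))
```

Under these assumptions the payload ends at 0x2132c20, yet the dump shows pointer-like values continuing past that address, which is consistent with data being written beyond the end of the allocation.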

Resolution

This issue is caused by software defect MPP-26507, which was first fixed in GPDB 4.3.9.1. Upgrading to this version or later resolves the issue.
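
As a quick way to compare an installed version against the first fixed release, the following sketch may help. It assumes the version is already available as a purely numeric, dot-separated string such as "4.3.8.2"; the function names parse_version and has_fix are hypothetical, and extracting that string from the server is left out.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '4.3.9.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_fix(version, first_fixed="4.3.9.1"):
    """Return True if `version` is at or above the first fixed release."""
    return parse_version(version) >= parse_version(first_fixed)
```

For example, has_fix("4.3.8.2") is False and has_fix("4.3.10.0") is True, because tuples of integers compare component by component rather than as text.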

 
