Pivotal Knowledge Base

Greenplum Error: Segments Go into Crash Recovery and the Dump Indicates "agg_hash_initial_pass"

Environment

 Product  Version
 Pivotal Greenplum (GPDB)  4.3.82
 OS  RHEL 6.x

Symptom

Segments go into crash recovery when running a query.

Error Message

Master logs show:

[gpadmin@scripts[#42921] master_logs]$ less /data/logs/42921/master_logs/gpdb-2016-11-17_050834.csv

<...>

2016-11-17 05:16:23.324147 EST,"502435415","pgsdwh",p73290,th-1980680448,"3.69.21.88","65076",2016-11-17 05:04:04 EST,321565863,con494674,cmd3,seg-1,,dx8761489,x321565863,sx1,"LOG","00000","Dispatcher encountered connection error on seg51 slice5 sdw13.gphd.local:1028 pid=274184: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
",,,,,,,0,,,,
2016-11-17 05:16:23.326227 EST,"502440373","pgsdwh",p230642,th-1754728672,"3.47.69.233","61132",2016-11-17 02:00:40 EST,321394817,con483592,cmd398280,seg-1,,dx8621616,,sx199115,"LOG","00000","statement: RELEASE _EXEC_SVP_000000972DB21E40",,,,,,"RELEASE _EXEC_SVP_000000972DB21E40",0,,"postgres.c",1618,
2016-11-17 05:16:23.330415 EST,"502435415","pgsdwh",p73290,th-1980680448,"3.69.21.88","65076",2016-11-17 05:04:04 EST,321565863,con494674,cmd3,seg-1,,dx8761489,x321565863,sx1,"LOG","00000","Dispatcher encountered connection error on seg50 slice5 sdw13.gphd.local:1027 pid=274182: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
",,,,,,,0,,,,
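To see which segments dropped their connections, the dispatcher messages can be pulled out of the master log with a regular expression. The following is a minimal sketch, not a supported tool: the pattern is derived only from the message text in the excerpt above, and the function name `failed_segments` is hypothetical.

```python
import re

# Pattern derived from the dispatcher message shown in the master log excerpt.
# Assumption: other GPDB versions may word this message differently.
DISPATCH_ERR = re.compile(
    r"Dispatcher encountered connection error on "
    r"(?P<seg>seg\d+) (?P<slice>slice\d+) "
    r"(?P<host>[^\s:]+):(?P<port>\d+) pid=(?P<pid>\d+)"
)


def failed_segments(log_text):
    """Return sorted, de-duplicated (segment, host, port, pid) tuples."""
    found = set()
    for match in DISPATCH_ERR.finditer(log_text):
        found.add((
            match.group("seg"),
            match.group("host"),
            int(match.group("port")),
            int(match.group("pid")),
        ))
    return sorted(found)
```

Passing the contents of the master log file to `failed_segments` gives the segment ID, host, port, and process ID for each affected segment, which identifies the segment logs and core files to inspect next.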

Segment logs will show:

[gpadmin@scripts[#42921] segment_logs]$ less /data/logs/42921/segment_logs/gpdb-2016-11-17_040000.csv

<...>

2016-11-17 05:16:29.062524 EST,,,p274184,th0,,,2016-11-17 05:16:04 EST,0,con494674,cmd3,seg51,slice5,,,,"PANIC","XX000","Unexpected internal error: Segment process received signal SIGSEGV",,,,,,,0,,,,"1    0x7fdb75980710 libpthread.so.0 <symbol not found> (??:0)
2    0xb0fdb0 postgres FunctionCall1 (fmgr.c:1263)
3    0x7bed02 postgres agg_hash_initial_pass (execHHashagg.c:165)
4    0x77deec postgres ExecAgg (nodeAgg.c:1022)
5    0x75d4a7 postgres ExecProcNode (execProcnode.c:979)
6    0x79e0ea postgres ExecMotion (nodeMotion.c:383)
7    0x75d276 postgres ExecProcNode (execProcnode.c:999)
8    0x7528cc postgres <symbol not found> (execMain.c:2553)
9    0x753086 postgres ExecutorRun (execMain.c:929)
10   0x9a46f1 postgres PortalRun (pquery.c:1259)
"  
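A segment log can be checked for this signature by looking for PANIC records that mention SIGSEGV and carry `agg_hash_initial_pass` in the stack trace. The sketch below assumes the log is standard CSV with the multi-line stack trace held in a quoted field, as in the excerpt above. The function name `crash_records` is hypothetical, and it matches on field contents rather than on fixed column positions, because the column layout is not described in this article.

```python
import csv
import io


def crash_records(log_text, frame="agg_hash_initial_pass"):
    """Return the CSV records that are PANIC entries for SIGSEGV and
    whose fields (including the stack trace) mention the given frame."""
    matches = []
    # csv.reader keeps a quoted multi-line stack trace together as one field.
    for row in csv.reader(io.StringIO(log_text)):
        if "PANIC" not in row:
            continue
        has_sigsegv = any("SIGSEGV" in field for field in row)
        has_frame = any(frame in field for field in row)
        if has_sigsegv and has_frame:
            matches.append(row)
    return matches
```

An empty result means that the log does not contain this particular signature. It does not rule out other causes of a segment crash.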

The gdb backtrace from the segment core file looks like this:

(gdb) bt
#0  0x00007fdb759805db in raise () from /data/logs/42921/segment_logs/packcore-core.postgres.274184.1479377780.11.500.500/lib64/libpthread.so.0
#1  0x0000000000b03ce6 in StandardHandlerForSigillSigsegvSigbus_OnMainThread (processName=<value optimized out>, postgres_signal_arg=11) at elog.c:4479
#2  <signal handler called>
#3  pg_detoast_datum (datum=0x0) at fmgr.c:2026
#4  0x0000000000a55fc2 in hash_numeric (fcinfo=<value optimized out>) at numeric.c:1113
#5  0x0000000000b0fdb0 in FunctionCall1 (flinfo=0x345cc98, arg1=0) at fmgr.c:1260
#6  0x00000000007bed02 in calc_hash_value (aggstate=<value optimized out>) at execHHashagg.c:165
#7  agg_hash_initial_pass (aggstate=<value optimized out>) at execHHashagg.c:900
#8  0x000000000077deec in ExecAgg (node=0x33c7140) at nodeAgg.c:1020
#9  0x000000000075d4a7 in ExecProcNode (node=0x33c7140) at execProcnode.c:979
#10 0x000000000079e0ea in execMotionSender (node=0x33c6bd0) at nodeMotion.c:362
#11 ExecMotion (node=0x33c6bd0) at nodeMotion.c:329
#12 0x000000000075d276 in ExecProcNode (node=0x33c6bd0) at execProcnode.c:999
#13 0x00000000007528cc in ExecutePlan (estate=0x3247920, planstate=<value optimized out>, operation=<value optimized out>, numberTuples=<value optimized out>, direction=<value optimized out>, dest=<value optimized out>) at execMain.c:2553
#14 0x0000000000753086 in ExecutorRun (queryDesc=<value optimized out>, direction=<value optimized out>, count=<value optimized out>) at execMain.c:894
#15 0x00000000009a46f1 in PortalRunSelect (portal=<value optimized out>, count=0, isTopLevel=<value optimized out>, dest=<value optimized out>, altdest=<value optimized out>, completionTag=<value optimized out>) at pquery.c:1258
#16 PortalRun (portal=<value optimized out>, count=0, isTopLevel=<value optimized out>, dest=<value optimized out>, altdest=<value optimized out>, completionTag=<value optimized out>) at pquery.c:1109
#17 0x000000000099d4f3 in exec_mpp_query (argc=<value optimized out>, argv=<value optimized out>, dbname=0x304e030 "pgsdwh", username=<value optimized out>) at postgres.c:1358
#18 PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x304e030 "pgsdwh", username=<value optimized out>) at postgres.c:4905
#19 0x00000000008fbcee in BackendRun () at postmaster.c:6963
#20 BackendStartup () at postmaster.c:6658
#21 ServerLoop () at postmaster.c:2464
#22 0x00000000008fea70 in PostmasterMain (argc=15, argv=0x2fc68d0) at postmaster.c:1540
#23 0x0000000000800aff in main (argc=15, argv=0x2fc6860) at main.c:206
(gdb)

Cause

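This is a software defect, tracked under an internal tracking number. The backtrace shows hash_numeric calling pg_detoast_datum with a NULL datum (datum=0x0) while calc_hash_value computes the hash value in agg_hash_initial_pass, so the segment process dereferences a null pointer and receives SIGSEGV.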

Resolution

The defect has been fixed, but as of December 2016 the release containing the fix was not yet available.

Review the release notes to determine if the defect is fixed in the current release of Greenplum.  
