Pivotal Knowledge Base

"gpstate -f" shows "error received sending data to standby master: server closed the connection unexpectedly"

Environment

Product: Pivotal HDB (Pivotal Hadoop Database)
Version: 1.x, 2.0.0
OS: RHEL 6.x

Symptom

Running "gpstate -f" to check the standby master status reports the error "error received sending data to standby master: server closed the connection unexpectedly".

Error Message:

gpadmin@hma01:/home/gpadmin>gpstate -f
20160816:10:40:54:621505 gpstate:hma01:gpadmin-[INFO]:-Starting gpstate with args: -f
20160816:10:40:54:621505 gpstate:hma01:gpadmin-[INFO]:-local HAWQ Version: 'postgres (HAWQ) 4.2.0 build 1'
20160816:10:40:55:621505 gpstate:hma01:gpadmin-[INFO]:-master HAWQ Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.2.0 build 1) (HAWQ 1.3.1.0 build 15874) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Jul 30 2015 07:31:30'
20160816:10:40:56:621505 gpstate:hma01:gpadmin-[INFO]:-Obtaining Segment details from master...
20160816:10:41:02:621505 gpstate:hma01:gpadmin-[INFO]:-Standby master details
20160816:10:41:03:621505 gpstate:hma01:gpadmin-[INFO]:-----------------------
20160816:10:41:03:621505 gpstate:hma01:gpadmin-[INFO]:- Standby address = hma02
20160816:10:41:03:621505 gpstate:hma01:gpadmin-[INFO]:- Standby data directory = /data/hawqnew/master/gpsegnew-1
20160816:10:41:03:621505 gpstate:hma01:gpadmin-[INFO]:- Standby port = 5432
20160816:10:41:03:621505 gpstate:hma01:gpadmin-[INFO]:- Standby PID = 129206
20160816:10:41:03:621505 gpstate:hma01:gpadmin-[INFO]:- Standby status = Standby host passive
20160816:10:41:03:621505 gpstate:hma01:gpadmin-[INFO]:--------------------------------------------------------------
20160816:10:41:03:621505 gpstate:hma01:gpadmin-[INFO]:--gp_master_mirroring table
20160816:10:41:03:621505 gpstate:hma01:gpadmin-[INFO]:--------------------------------------------------------------
20160816:10:41:03:621505 gpstate:hma01:gpadmin-[INFO]:--Summary state: Not Synchronized
20160816:10:41:03:621505 gpstate:hma01:gpadmin-[INFO]:--Detail state: Connection error
20160816:10:41:03:621505 gpstate:hma01:gpadmin-[INFO]:--Log time: 2016-08-16 12:30:51+08
20160816:10:41:03:621505 gpstate:hma01:gpadmin-[INFO]:--Error message: error received sending data to standby master: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request. 
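When checking this repeatedly, it can help to pull only the summary state out of the "gpstate -f" output. The following is a minimal sketch, assuming the output keeps the "Summary state: <value>" line format shown above; the function name standby_summary_state is hypothetical and not part of HDB.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with HDB): read `gpstate -f` output on
# stdin and print the value that follows "Summary state:".
# Assumes the line format shown in the sample output above.
standby_summary_state() {
  # Strip everything up to and including "Summary state: ", print only
  # lines where that substitution happened, and keep the first match.
  sed -n 's/.*Summary state: *//p' | head -n 1
}
```

It could be used as: gpstate -f | standby_summary_state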

Cause

The error message indicates that the master instance failed to send data to the standby master.

Checking the standby master host reveals more detail about the problem:

1. No gpsyncagent process is running:

gpadmin@hma02:/home/gpadmin>ps -ef|grep postgres
gpadmin 129209 129206 0 Aug15 ? 00:00:07 postgres: port 5432, logger process
gpadmin 242002 129206 0 Aug15 ? 00:00:01 postgres: port 5432, WAL Redo Server process
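The same check can be scripted. This is a rough sketch, assuming the process title contains the string "gpsyncagent" as the article describes; the function name has_gpsyncagent is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a process listing (for example `ps -ef`) on stdin
# and exit 0 if a gpsyncagent process is listed, non-zero otherwise.
has_gpsyncagent() {
  # The bracket expression keeps this grep from matching its own
  # command line when it is fed live `ps` output.
  grep -q '[g]psyncagent'
}
```

On the standby master host it could be used as: ps -ef | has_gpsyncagent || echo "gpsyncagent is not running"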

2. The walredoserver process crashed with SIGSEGV and was then reset:

2016-08-15 22:30:48.226838 CST,,,p129380,th0,,,,0,con1,,seg-1,,,,,"PANIC","XX000","Unexpected internal error: Segment process received signal SIGSEGV",,,,,,,0,,,,"1 0x36d320f710 libpthread.so.0 <symbol not found> + 0xd320f710
2 0x4fcd75 postgres _bt_insert_parent + 0x745
3 0x50cbc2 postgres btree_xlog_cleanup + 0x1a2
4 0x539c94 postgres XLogStandbyRecoverRange + 0x7d4
5 0xc40645 postgres cdb_perform_redo + 0x35
6 0x8e784b postgres <symbol not found> + 0x8e784b
7 0x8e163c postgres <symbol not found> + 0x8e163c
8 0x8e1e22 postgres ServiceMain + 0x352
9 0x8e79a9 postgres walredoserver_start + 0x39
"
2016-08-15 22:30:50.291304 CST,,,p129206,th778668064,,,,0,,,seg-1,,,,,"LOG","00000","walredoserver process (PID 129380) was terminated by signal 11: Segmentation fault",,,,,,,0,,"postmaster.c",5659,
2016-08-15 22:30:50.291469 CST,,,p129206,th778668064,,,,0,,,seg-1,,,,,"LOG","00000","server process (PID 129380) was terminated by signal 11: Segmentation fault",,,,,,,0,,"postmaster.c",5659,
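To see whether the crash is recurring, the standby master log can be searched for the termination message above. A minimal sketch, assuming the message text matches the excerpt; the function name count_walredo_crashes is hypothetical, and the log file path must be supplied by the caller because it depends on the installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many lines in the given log file report a
# walredoserver process being terminated by a signal.
# $1 = path to a standby master log file (installation specific).
count_walredo_crashes() {
  # grep -c prints the number of matching lines; note that it exits
  # non-zero when the count is 0.
  grep -c 'walredoserver process (PID [0-9]*) was terminated by signal' "$1"
}
```

A result greater than 0 means the log recorded at least one walredoserver crash.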

At almost the same time, the master log reports that master mirroring synchronization was lost:

2016-08-15 22:30:51.073894 CST,,,p159329,th-1487857632,,,,0,con2,,seg-1,,,,,"WARNING","58M01","error received sending data to standby master: server closed the connection unexpectedly","
This probably means the server terminated abnormally
before or while processing the request.
The Greenplum Database is no longer highly available",,,,,,0,,"cdblink.c",482,
2016-08-15 22:30:51.074017 CST,,,p159329,th-1487857632,,,,0,con2,,seg-1,,,,,"WARNING","58M01","Master mirroring synchronization lost","Connection to the standby master was lost attempting to send new transaction log
The Greenplum Database is no longer highly available.",,,,,,0,,"cdbfts.c",761,

Resolution

1. The stack trace shown above points to a known issue, which will be fixed in HDB 2.0.1 and later.

2. Upgrade to an HDB release that contains the fix once it is available. Until then, if the issue keeps recurring after master mirroring is resynchronized with "gpinitstandby -n", there is likely a problem with the xlogs on the standby master. In that case, remove the standby master and initialize it again as shown below.

Note: The database system is restarted during standby master reinitialization, so it is best to perform this operation while the system is idle.

gpadmin@hma01:/home/gpadmin> gpinitstandby -r -a
gpadmin@hma01:/home/gpadmin> gpinitstandby -s <standby_master_hostname> -M fast -a
