Pivotal Knowledge Base

GPRECOVERSEG Error: "ValueError: Invalid Literal for int() with Base 10"

Environment

 Product              Version
 Pivotal Greenplum    4.3.x
 OS                   RHEL 6.x

Symptom

gprecoverseg, run in verbose mode, fails with the error "ValueError: invalid literal for int() with base 10", and no specific or evident error appears in the master, primary, or mirror logs.

Also, when the database is restarted, gpstop throws a WARNING:

[WARNING]:-Unable to clean shared memory ('NoneType' object has no attribute 'rc')

Cause

Because a postmaster.pid file still exists on the problematic segment, gprecoverseg is unable to complete the recovery operation. Before a segment goes into recovery, all postgres processes for that segment should be stopped and its postmaster.pid file should no longer exist.

Resolution

1. Find the unclean shared memory segment. See the article "How To Check for Unclean SharedMemory Segments".

2. Locate the filespace directories of the failed segments in pg_filespace_entry. In this example, gpstate cannot load the status of the following segments:

gpstate:bdtcstr21n1:gpadmin-[INFO]:- Segment Port Config status Status
gpstate:bdtcstr21n1:gpadmin-[INFO]:- bdtcstr21n14 50003 Up Unknown -- unable to load segment status
gpstate:bdtcstr21n1:gpadmin-[INFO]:- bdtcstr21n14 50005 Up Unknown -- unable to load segment status
gpstate:bdtcstr21n1:gpadmin-[INFO]:- bdtcstr21n14 40001 Up Unknown -- unable to load segment status
gpstate:bdtcstr21n1:gpadmin-[INFO]:- bdtcstr21n14 40003 Up Unknown -- unable to load segment status
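
The lookup in step 2 can be sketched as a catalog query run from the master. This is a minimal sketch, not part of the original article: it assumes psql is on the PATH, that template1 is an acceptable database to connect to, and that the failed segments are marked down (status 'd') in gp_segment_configuration. The function name list_failed_segment_dirs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the filespace directories of segments marked down ('d').
# Assumptions: run as gpadmin on the master, psql on PATH, template1 reachable.

failed_segment_dirs_sql="
SELECT c.dbid, c.content, c.hostname, c.port, f.fselocation
FROM gp_segment_configuration c
JOIN pg_filespace_entry f ON f.fsedbid = c.dbid
WHERE c.status = 'd'
ORDER BY c.hostname, c.port;"

# Hypothetical helper: prints one pipe-separated row per filespace entry
# of each down segment. The optional first argument is the database name.
list_failed_segment_dirs() {
    psql -d "${1:-template1}" -At -c "$failed_segment_dirs_sql"
}
```

Calling list_failed_segment_dirs prints the host, port, and directory (fselocation) for each down segment, which gives the paths to inspect in the next step.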

3. SSH to the corresponding segment server. Check if the postmaster.pid file still exists on the failed segments that are marked down in gp_segment_configuration.

[gpadmin@bdtcstr20n1 base]$ ssh bdtcstr21n14
[gpadmin@bdtcstr21n14 data4]$ ls -ltrh gpseg1*/postmas*
 -rw------- 1 gpadmin u_bigdat 22 Nov 10 09:38 gpseg135/postmaster.pid
 -rw------- 1 gpadmin u_bigdat 157 Nov 10 09:38 gpseg135/postmaster.opts
 -rw------- 1 gpadmin u_bigdat 22 Nov 10 09:38 gpseg143/postmaster.pid
 -rw------- 1 gpadmin u_bigdat 22 Nov 10 09:38 gpseg141/postmaster.pid
 -rw------- 1 gpadmin u_bigdat 157 Nov 10 09:38 gpseg143/postmaster.opts
 -rw------- 1 gpadmin u_bigdat 157 Nov 10 09:38 gpseg141/postmaster.opts
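
Listing the files shows that they exist, but not whether a postmaster is still running behind them. The sketch below, which is an illustration rather than part of the original procedure, reads the PID from the first line of each postmaster.pid and tests it with kill -0. The function name report_pidfiles and the gpseg* directory pattern are assumptions based on the listing above.

```shell
#!/usr/bin/env bash
# Sketch: report whether each postmaster.pid under a data directory
# points at a live process. Run on the segment host as gpadmin.

# Hypothetical helper: prints "<pidfile> <pid> <running|stale>" per file,
# or "<pidfile> - unreadable" when the first line is not a plain number.
report_pidfiles() {
    local base="$1" pidfile pid
    for pidfile in "$base"/gpseg*/postmaster.pid; do
        [ -e "$pidfile" ] || continue   # the glob matched nothing
        # The first line of postmaster.pid holds the postmaster's PID.
        pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
        case "$pid" in
            ''|*[!0-9]*) echo "$pidfile - unreadable"; continue ;;
        esac
        # kill -0 sends no signal; it only tests that the process exists
        # and that the current user is allowed to signal it.
        if kill -0 "$pid" 2>/dev/null; then
            echo "$pidfile $pid running"
        else
            echo "$pidfile $pid stale"
        fi
    done
}
```

A "running" result only means that some process owns that PID. Since PIDs can be reused, it is worth confirming with ps that the process is actually a postgres process for that segment.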

4. Remove the postmaster.pid files that still exist on the failed segments.

Important Note: Do not remove the postmaster.pid file from valid segments that are up and running without issues.
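
One way to respect the note above is to guard the removal with a liveness check. The following is a minimal sketch under stated assumptions, not the article's own procedure: remove_stale_pidfile is a hypothetical helper, it takes a segment data directory as its argument, and it refuses to act when the PID is still alive or cannot be read.

```shell
#!/usr/bin/env bash
# Sketch: remove a segment's postmaster.pid only when no live process owns it.
# Run on the segment host as gpadmin.

# Hypothetical helper. Returns 0 if the file was removed, 1 if it was left alone.
remove_stale_pidfile() {
    local pidfile="$1/postmaster.pid" pid
    [ -f "$pidfile" ] || { echo "no postmaster.pid in $1"; return 1; }
    # The first line of postmaster.pid holds the postmaster's PID.
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    case "$pid" in
        ''|*[!0-9]*)
            # Be conservative: leave unparseable files for manual inspection.
            echo "cannot read a PID from $pidfile; leaving it in place"
            return 1 ;;
    esac
    # kill -0 tests for existence without sending a signal.
    if kill -0 "$pid" 2>/dev/null; then
        echo "PID $pid is still running; leaving $pidfile in place"
        return 1
    fi
    rm -f -- "$pidfile" && echo "removed $pidfile"
}
```

For example, remove_stale_pidfile gpseg135 would be run from the data directory shown in step 3. Because the check refuses whenever any process holds the recorded PID, it errs on the side of leaving the file in place.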

 

 
