Pivotal Knowledge Base

Follow

gpdbrestore fails with "Type 2 could not be be found" error message

Environment

Product Version
Pivotal Greenplum (GPDB) 4.3.x
OS RHEL 6.x

Symptom

After running for a couple of hours, gpdbrestore failed on the segments. The error message in the gpdbrestore logs looked somewhat like the one shown below:

20160402:11:30:54|gp_restore-[ERROR]:-Error executing query SELECT * FROM gp_read_backup_file('db_dumps/20160402', '20160402112535', 2) : ERROR:  Backup File /data1/primary/gpseg0/db_dumps/20160402/gp_restore_status_0_2_20160402112535 Type 2 could not be be found
20160402:11:30:54|gp_restore-[ERROR]:-restore failed for source dbid 2, target dbid 2 on host sdw1

Upon tracing, it was found that the segment that reported the error encountered an Out of Memory (OOM) situation. Refer to the first occurrence of "Type 2" error in the gpdbrestore log under ~/gpAdminLogs directory and check the status file for that segment. In this case, it was seg0 on sdw1. The restore status file would be in the segment directory.

[gpadmin@sdw1 ~]$ cat /data1/primary/gpseg0/gp_restore_status_0_2_20160402112535
20160402:11:29:14|gp_restore_agent-[INFO]:-Before fork of gp_restore_agent
20160402:11:29:14|gp_restore_agent-[INFO]:-Command Line: /bin/cat /data1/primary/gpseg0/db_dumps/20160402/gp_dump_0_2_20160402112535.gz | /bin/gunzip -c  | /usr/local/GP-4.3.5.2/bin/psql -h sdw1 -p 37000 -U gpadmin -d spring -a 
20160402:11:29:14|gp_restore_agent-[INFO]:-Starting monitor thread
ERROR:  column "test" of relation "testbackup1" does not exist
out of memory

gunzip: stdout: Broken pipe
/bin/cat: write error: Broken pipe
20160402:11:30:54|gp_restore_agent-[ERROR]:-psql finished abnormally with return code 1.
20160402:11:30:54|gp_restore_agent-[ERROR]:-Finished with errors

Cause 

In the Greenplum Database, backups are taken using COPY commands with the data to be loaded again using standard input (STDIN). With STDIN, the way COPY behaves is, if the table does not exist OR the columns are not exactly the same as the metadata, then the whole data will be sent to the memory and OOM situation can happen if the data is huge. These issues were possibly results of the following scenarios:

  • The schema for the partitioned tables has been changed at the source. This will allow the child tables to exist in the old schema and the backup will point towards that. This has been fixed in version 4.3.6.2 and above. Now the table will log alter statements with the child table to change the schema if backup has been taken using version 4.3.6.2 and above.
  • During the data transfer between the PROD and DR cluster, sometimes the table structure does not match. This could happen, for example, if the table has been altered anywhere and a new column has been added or dropped. In that case, the structure is different and data will be sent to the memory.  

Resolution

Upgrade to version 4.3.6.2 and above and take the backup again if the schemas are different for parent and child tables. You can crosscheck at the source database using the following query if the schemas of the parent and child are different:

SELECT schemaname , tablename , partitionschemaname , partitiontablename 
FROM pg_partitions 
WHERE schemaname <> partitionschemaname ;

Secondly, make sure the structure is same on the destination if metadata is already there. If not, a full restore is needed at the destination.

Refer number 25549 in the resolved issues section in the version 4.3.6.2 Release Notes

Additional Comments

For further information, please refer to the following: gp_restore Error "Error executing query SELECT * FROM gp_read_backup_file('db_dumps/xx', 'xx', 2) : ERROR: Backup File ... Type 2 could not be be found"

Comments

Powered by Zendesk