Pivotal Knowledge Base

Follow

gpcrondump appears to be stuck on backing up one schema

Environment

Product Version
Pivotal Greenplum (GPDB) 4.3.x

Symptom

A gpcrondump backup of a Greenplum database may appear to be making no progress as the same schema is processed multiple times. In a normal backup, each schema should only be added once, but as can been seen below, the schema "admin" is "Added" 566 times over three hours: 

[gpadmin@MASTER]$ grep -c "Adding schema name admin" /home/admin/gpAdminLog/gpcrondump_20160518.log
566
[gpadmin@MASTER]$
[gpadmin@MASTER$ grep schema /home/admin/gpAdminLog/gpcrondump_20160518.log | head -3
20160518:20:00:19:018174 gpcrondump:MASTER:gpadmin-[INFO]:-Adding schema name admin
20160518:20:00:51:023448 gpcrondump:MASTER:gpadmin-[INFO]:-Adding schema name admin
20160518:20:01:25:025137 gpcrondump:MASTER:gpadmin-[INFO]:-Adding schema name admin

[gpadmin@MASTER]$ grep schema /home/admin/gpAdminLog/gpcrondump_20160518.log | tail -3 20160518:23:59:05:019963 gpcrondump:MASTER:gpadmin-[INFO]:-Adding schema name admin
20160518:23:59:28:020960 gpcrondump:MASTER:gpadmin-[INFO]:-Adding schema name admin
20160518:23:59:52:021801 gpcrondump:MASTER:gpadmin-[INFO]:-Adding schema name admin
[gpadmin@gss-scripts crondumpMon_Wend]$

Error Message:

The gpcrondump log in /home/gpadmin/gpAdminLogs shows the following:

20160518:20:00:19:018174 gpcrondump:MASTER:gpadmin-[INFO]:-Adding --no-expand-children
20160518:20:00:19:018174 gpcrondump:MASTER:gpadmin-[INFO]:-Adding schema name admin
20160518:20:00:19:018174 gpcrondump:MASTER:gpadmin-[INFO]:-Dump process command line gp_dump -p 5432 -U gpadmin --gp-d=ddb_gpcrondump/20160518 --gp-r=/data/master/gp
seg-1/ddb_gpcrondump/20160518 --gp-s=p --gp-k=20160518200006 --no-lock --ddboost --no-expand-children -n "\"admin\"" csmgp
20160518:20:00:19:018174 gpcrondump:MASTER:gpadmin-[INFO]:-Starting Dump process
20160518:20:00:32:018174 gpcrondump:MASTER:gpadmin-[WARNING]:-Dump process returned exit code 1
20160518:20:00:32:018174 gpcrondump:MASTER:gpadmin-[DEBUG]:-Ending DumpDatabase

For each of the above warnings, a log file is created in /data/master/gpseg-1/ddb_gpcrondump/<DATE>, for example:/data/master/gpseg-1/ddb_gpcrondump/20150518/gp_dump_status_1_1_20160518200006. This file will contain further information about the dump failure. 

[ gpadmin@csmmas01 ] /data/master/gpseg-1/ddb_gpcrondump/20160518 > less gp_dump_status_1_1_20160518200006
20160518:20:00:19|ddboost-[DEBUG]:-Libraries were loaded successfully
20160518:20:00:19|ddboost-[INFO]:-opening LB on /home/gpadmin/DDBOOST_CONFIG
20160518:20:00:19|gp_dump_agent-[INFO]:-Starting monitor thread
20160518:20:00:19|gp_dump_agent-[INFO]:-Dumping database "csmgp"...
20160518:20:00:19|gp_dump_agent-[INFO]:-reading schemas
20160518:20:00:19|gp_dump_agent-[INFO]:-reading user-defined functions
20160518:20:00:19|gp_dump_agent-[INFO]:-reading user-defined types
20160518:20:00:19|gp_dump_agent-[ERROR]:-schema with OID 335086059 does not exist
20160518:20:00:19|gp_dump_agent-[ERROR]:-*** aborted because of error:
20160518:20:00:20|gp_dump_agent-[INFO]:-TASK_SET_SERIALIZABLE
20160518:20:00:20|gp_dump_agent-[ERROR]:-*** aborted because of error:
20160518:20:00:20|ddboost-[DEBUG]:-Libraries were loaded successfully
20160518:20:00:20|ddboost-[INFO]:-opening LB on /home/gpadmin/DDBOOST_CONFIG
20160518:20:00:20|gp_dump_agent-[INFO]:-Starting monitor thread
20160518:20:00:21|gp_dump_agent-[INFO]:-Dumping database "csmgp"...
20160518:20:00:21|gp_dump_agent-[INFO]:-reading schemas
20160518:20:00:21|gp_dump_agent-[INFO]:-reading user-defined functions
20160518:20:00:21|gp_dump_agent-[INFO]:-reading user-defined types
20160518:20:00:21|gp_dump_agent-[ERROR]:-schema with OID 335086059 does not exist
20160518:20:00:21|gp_dump_agent-[ERROR]:-*** aborted because of error:
20160518:20:00:21|gp_dump_agent-[INFO]:-TASK_SET_SERIALIZABLE
20160518:20:00:21|gp_dump_agent-[ERROR]:-*** aborted because of error:

Cause 

Catalog issues in the Greenplum database were causing this issue. In the case of the above example, namespace with OID "335086059" did not exist in the catalog:

csmgp=# select oid,* from pg_namespace where oid=335086059;
oid | nspname | nspowner | nspacl
-----+---------+----------+--------
(0 rows)

Resolution

Fix the catalogue issues using the catalog troubleshooting steps.

In the case of the above example, the following steps were taken:

1. Confirm that the OID does not exist in the catalogue:

csmgp=# select oid,* from pg_namespace where oid=335086059;
oid | nspname | nspowner | nspacl
-----+---------+----------+--------
(0 rows)

2. Determine which tables (if any) are in the namespace:

csmgp=# select * from pg_class where relnamespace=335086059;
relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relao
segrelid | relaosegidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspk
ey | relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
------------------------------+--------------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+------
---------+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+---------
---+-------------+----------------+--------------+--------+------------
tmp_roaming_alert_profile | 335086059 | 335086322 | 17146 | 0 | 335086321 | 0 | 0 | 0 | 335086323 | 0 |
0 | 0 | f | f | r | h | 15 | 0 | 0 | 0 | 0 | 0 | f | f
| f | f | 166927740 | |
tmp_roaming_alert_clean_data | 335086059 | 335086350 | 17146 | 0 | 335086349 | 0 | 0 | 0 | 335086351 | 0 |
0 | 0 | f | f | r | h | 16 | 0 | 0 | 0 | 0 | 0 | f | f
| f | f | 166927776 | |
(2 rows) 

3. Confirm that the OID and tables are not present in any of the segments:

csmgp=# select gp_segment_id,* from gp_dist_random('pg_class') where relnamespace=335086059;
gp_segment_id | relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relaosegreli
d | relaosegidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | re
lhasrules | relhassubclass | relfrozenxid | relacl | reloptions
---------------+---------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------
--+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+------------+---
----------+----------------+--------------+--------+------------
(0 rows)

csmgp=#

csmgp=# select gp_segment_id,* from gp_dist_random('pg_class') where relname in ('tmp_roaming_alert_profile','tmp_roaming_alert_clean_data');
gp_segment_id | relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relaosegreli
d | relaosegidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | re
lhasrules | relhassubclass | relfrozenxid | relacl | reloptions
---------------+---------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------
--+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+------------+---
----------+----------------+--------------+--------+------------
(0 rows)

csmgp=#

4. Confirm that the tables are empty or do not exist:

csmgp=# select count(*) from tmp_roaming_alert_profile;
ERROR: relation "tmp_roaming_alert_profile" does not exist
LINE 1: select count(*) from tmp_roaming_alert_profile;
^
csmgp=# select count (*) from tmp_roaming_alert_clean_data;
ERROR: relation "tmp_roaming_alert_clean_data" does not exist
LINE 1: select count (*) from tmp_roaming_alert_clean_data;

5. Find the OID of the public namespace:

csmgp=# select oid,* from pg_namespace where nspname='public';
oid | nspname | nspowner | nspacl
-----------+---------------------+----------+--------------------
2200 | public | 10 | {gpadmin=UC/gpadmin,=UC/gpadmin}

6. Move the two tables to the public namespace (OID 2200, in the case of this particular system):

csmgp=# set allow_system_table_mods=dml
csmgp-# ;
SET
csmgp=# begin
csmgp-# ;
BEGIN
csmgp=# update pg_class set relnamespace=2200 where relnamespace=335086059;
UPDATE 2
csmgp=# select * from pg_class where relnamespace=335086059;
relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relaosegrelid | relaosegidxi
d | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relh
assubclass | relfrozenxid | relacl | reloptions
---------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+---------------+-------------
--+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+-----
-----------+--------------+--------+------------
(0 rows)

csmgp=# commit;
COMMIT 

7. Drop the two tables:

DROP TABLE IF EXISTS tmp_roaming_alert_clean_data;
DROP TABLE IF EXISTS tmp_roaming_alert_profile;

8. Restart the backup to confirm the issue is resolved. 

Comments

Powered by Zendesk