Pivotal Knowledge Base

How to cancel a backup gpcrondump job

Environment

Product                    Version
Pivotal Greenplum (GPDB)   4.3.x
OS                         RHEL 6.x

Purpose

This article explains a step-by-step approach to cancelling a gpcrondump backup job.

Cause

Killing the gpcrondump process on the master server (mdw) does not terminate the backup completely. It leaves orphaned backup agents (gp_dump_agent processes) still running on the segment servers.

Procedure

Scenario 1.

You want to kill the backup because it appears to be hanging, and you do not know whether it is progressing or which stage it has reached. In that case, first refer to the KB article "Troubleshooting long running gpcrondump jobs".

Scenario 2.

You have already checked the backup's progress using the above article and still want to cancel the backup. In that case, follow the steps below:

1. Get the PID of the gpcrondump process running on the master

ps -ef | grep gpcrondump | grep -v grep

[gpadmin@mdw gpadmin]$ ps -ef | grep gpcrondump | grep -v grep
gpadmin 25656 11385 7 09:12 pts/3 00:00:00 python /usr/local/GP-4.3.8.1/bin/gpcrondump -x gpadmin

ps -ef | grep gpcrondump | grep -v grep | awk '{print $2}'

[gpadmin@mdw gpadmin]$ ps -ef | grep gpcrondump | grep -v grep | awk '{print $2}'
25656
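
Note that a plain grep for gpcrondump also matches unrelated commands whose arguments contain that string, for example a tail on a gpcrondump log file. The following is a minimal sketch, not part of the original procedure, of a stricter filter. The function name filter_gpcrondump_pids is hypothetical, and the sketch assumes the ps -ef column layout shown above, where the command starts in column 8 and gpcrondump is run either directly or through python.

```shell
# Read `ps -ef` output on stdin and print the PID (column 2) of each
# process that is gpcrondump itself, not merely a command that mentions it.
# Assumption: the command starts in column 8, as in the output above.
filter_gpcrondump_pids() {
    awk '
        # gpcrondump executed directly
        $8 ~ /(^|\/)gpcrondump$/ { print $2; next }
        # gpcrondump executed through the python interpreter
        $8 ~ /(^|\/)python[0-9.]*$/ && $9 ~ /(^|\/)gpcrondump$/ { print $2 }
    '
}

# Usage on the master:
#   ps -ef | filter_gpcrondump_pids
```

With this filter, a line such as "tail -f gpcrondump_20160923.log" is ignored because its command column is tail, not gpcrondump.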

2. Once you have verified that the above PID is correct, proceed with the kill. (You will see "Terminated" if you are tailing gpcrondump.log or viewing the output interactively.)

ps -ef | grep gpcrondump | grep -v grep | awk '{print $2}' | xargs kill

[gpadmin@mdw gpadmin]$ ps -ef | grep gpcrondump | grep -v grep | awk '{print $2}' | xargs kill
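
If no matching process exists, GNU xargs still runs kill once with no arguments, which only prints an error. As a sketch for that case, the hypothetical helper term_pids below reads PIDs on standard input, skips anything that is not a number, and sends each one the default SIGTERM, the same signal the plain kill above sends.

```shell
# Read PIDs on stdin (one per line) and send SIGTERM to each.
# Blank and non-numeric lines are skipped; empty input is a no-op.
# The body runs in a subshell so its variables do not leak to the caller.
term_pids() (
    count=0
    while read -r pid; do
        case "$pid" in
            ''|*[!0-9]*) continue ;;   # not a plain number, ignore it
        esac
        echo "sending SIGTERM to $pid"
        if kill "$pid"; then
            count=$((count + 1))
        fi
    done
    echo "signalled $count process(es)"
)

# Usage on the master, after verifying the PID list:
#   ps -ef | grep gpcrondump | grep -v grep | awk '{print $2}' | term_pids
```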

3. This still leaves the agent processes running on both the master and the segments; they need to be cleaned up

Get the current primary hostnames from gp_segment_configuration:

ssh to master server
su - gpadmin
cd /home/gpadmin
psql
\o gp-primary
\t
select distinct hostname from gp_segment_configuration where role = 'p';
\q

[gpadmin@mdw gpadmin]$ cat /home/gpadmin/gp-primary
sdw2
sdw1
mdw
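
The interactive psql session above can also be run as a single command. This is a sketch only: write_primary_hosts is a hypothetical helper, and the fallback database template1 is an assumption (the session above connects to the default database). The psql options used are -A (unaligned output), -t (tuples only), -c (run one command) and -o (write the output to a file).

```shell
# Same query as in the interactive session above.
PRIMARY_HOSTS_SQL="select distinct hostname from gp_segment_configuration where role = 'p';"

# Write the distinct primary hostnames to a file, one per line.
#   $1  output file (defaults to the path used in this article)
#   $2  database to connect to (assumption: PGDATABASE, else template1)
write_primary_hosts() {
    out_file=${1:-/home/gpadmin/gp-primary}
    db=${2:-${PGDATABASE:-template1}}
    psql -A -t -d "$db" -c "$PRIMARY_HOSTS_SQL" -o "$out_file"
}

# Usage on the master, as gpadmin:
#   write_primary_hosts
#   cat /home/gpadmin/gp-primary
```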

Now, using gpssh, run the steps below to clean up the dump agent processes running on the master and segment hosts:

gpssh -f /home/gpadmin/gp-primary

[gpadmin@mdw gpadmin]$ gpssh -f /home/gpadmin/gp-primary
=>

4. Get the PIDs of the gp_dump_agent processes running on the master and segment hosts

ps -ef | grep gp_dump_agent | grep -v grep

=> ps -ef | grep gp_dump_agent | grep -v grep
[ mdw] gpadmin 11529 11528 0 09:34 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_1_1_Y2hhbmdlbWU= --gp-d /data/gpdb/master/gpseg-1/db_dumps/20160923 -p 54320 -U gpadmin --pre-and-post-data-schema-only "gpadmin" 2> /data/gpdb/master/gpseg-1/db_dumps/20160923/gp_dump_status_1_1_20160923093425 | /bin/gzip -1 > /data/gpdb/master/gpseg-1/db_dumps/20160923/gp_dump_1_1_20160923093425.gz
[ mdw] gpadmin 11530 11529 0 09:34 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_1_1_Y2hhbmdlbWU= --gp-d /data/gpdb/master/gpseg-1/db_dumps/20160923 -p 54320 -U gpadmin --pre-and-post-data-schema-only gpadmin
[sdw1] gpadmin 28884 28880 0 09:34 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_3_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg1/db_dumps/20160923 -p 46001 -U gpadmin -a "gpadmin" 2> /data1/gpdb/gpseg1/db_dumps/20160923/gp_dump_status_0_3_20160923093425 | /bin/gzip -1 > /data1/gpdb/gpseg1/db_dumps/20160923/gp_dump_0_3_20160923093425.gz
[sdw1] gpadmin 28885 28884 0 09:34 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_3_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg1/db_dumps/20160923 -p 46001 -U gpadmin -a gpadmin
[sdw1] gpadmin 28887 28882 0 09:34 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_5_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg3/db_dumps/20160923 -p 46003 -U gpadmin -a "gpadmin" 2> /data2/gpdb/gpseg3/db_dumps/20160923/gp_dump_status_0_5_20160923093425 | /bin/gzip -1 > /data2/gpdb/gpseg3/db_dumps/20160923/gp_dump_0_5_20160923093425.gz
[sdw1] gpadmin 28889 28881 0 09:34 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_2_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg0/db_dumps/20160923 -p 46000 -U gpadmin -a "gpadmin" 2> /data1/gpdb/gpseg0/db_dumps/20160923/gp_dump_status_0_2_20160923093425 | /bin/gzip -1 > /data1/gpdb/gpseg0/db_dumps/20160923/gp_dump_0_2_20160923093425.gz
[sdw1] gpadmin 28890 28883 0 09:34 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_4_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg2/db_dumps/20160923 -p 46002 -U gpadmin -a "gpadmin" 2> /data2/gpdb/gpseg2/db_dumps/20160923/gp_dump_status_0_4_20160923093425 | /bin/gzip -1 > /data2/gpdb/gpseg2/db_dumps/20160923/gp_dump_0_4_20160923093425.gz
[sdw1] gpadmin 28891 28887 0 09:34 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_5_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg3/db_dumps/20160923 -p 46003 -U gpadmin -a gpadmin
[sdw1] gpadmin 28894 28889 0 09:34 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_2_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg0/db_dumps/20160923 -p 46000 -U gpadmin -a gpadmin
[sdw1] gpadmin 28896 28890 0 09:34 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_4_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg2/db_dumps/20160923 -p 46002 -U gpadmin -a gpadmin
[sdw2] gpadmin 2773 2769 0 09:34 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_9_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg7/db_dumps/20160923 -p 46003 -U gpadmin -a "gpadmin" 2> /data2/gpdb/gpseg7/db_dumps/20160923/gp_dump_status_0_9_20160923093425 | /bin/gzip -1 > /data2/gpdb/gpseg7/db_dumps/20160923/gp_dump_0_9_20160923093425.gz
[sdw2] gpadmin 2774 2773 1 09:34 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_9_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg7/db_dumps/20160923 -p 46003 -U gpadmin -a gpadmin
[sdw2] gpadmin 2776 2770 0 09:34 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_6_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg4/db_dumps/20160923 -p 46000 -U gpadmin -a "gpadmin" 2> /data1/gpdb/gpseg4/db_dumps/20160923/gp_dump_status_0_6_20160923093425 | /bin/gzip -1 > /data1/gpdb/gpseg4/db_dumps/20160923/gp_dump_0_6_20160923093425.gz
[sdw2] gpadmin 2778 2771 0 09:34 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_7_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg5/db_dumps/20160923 -p 46001 -U gpadmin -a "gpadmin" 2> /data1/gpdb/gpseg5/db_dumps/20160923/gp_dump_status_0_7_20160923093425 | /bin/gzip -1 > /data1/gpdb/gpseg5/db_dumps/20160923/gp_dump_0_7_20160923093425.gz
[sdw2] gpadmin 2780 2776 0 09:34 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_6_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg4/db_dumps/20160923 -p 46000 -U gpadmin -a gpadmin
[sdw2] gpadmin 2784 2778 0 09:34 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_7_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg5/db_dumps/20160923 -p 46001 -U gpadmin -a gpadmin
[sdw2] gpadmin 2787 2772 0 09:34 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_8_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg6/db_dumps/20160923 -p 46002 -U gpadmin -a "gpadmin" 2> /data2/gpdb/gpseg6/db_dumps/20160923/gp_dump_status_0_8_20160923093425 | /bin/gzip -1 > /data2/gpdb/gpseg6/db_dumps/20160923/gp_dump_0_8_20160923093425.gz
[sdw2] gpadmin 2792 2787 0 09:34 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_dump_agent --gp-k 20160923093425_0_8_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg6/db_dumps/20160923 -p 46002 -U gpadmin -a gpadmin
=>

ps -ef | grep gp_dump_agent | grep -v grep | awk '{print $2}'
[ mdw] 11529
[ mdw] 11530
[sdw1] 28884
[sdw1] 28885
[sdw1] 28887
[sdw1] 28889
[sdw1] 28890
[sdw1] 28891
[sdw1] 28894
[sdw1] 28896
[sdw2] 2773
[sdw2] 2774
[sdw2] 2776
[sdw2] 2778
[sdw2] 2780
[sdw2] 2784
[sdw2] 2787
[sdw2] 2792
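
If more than one dump could be running, the PID list can be narrowed to a single backup. In the ps output above, every agent carries a --gp-k argument that starts with the dump timestamp (20160923093425 here). The sketch below, with the hypothetical name agent_pids_for_dump, assumes that key format and the same ps -ef column layout, and prints only the PIDs whose key starts with the given timestamp.

```shell
# Read `ps -ef` output on stdin and print the PIDs of gp_dump_agent
# processes (including their "sh -c" wrappers) whose --gp-k key begins
# with the given dump timestamp.
#   $1  dump timestamp, e.g. 20160923093425
agent_pids_for_dump() {
    awk -v key="$1" '
        /gp_dump_agent/ {
            # The command starts in column 8; look for "--gp-k <key>".
            for (i = 8; i < NF; i++) {
                if ($i == "--gp-k" && index($(i + 1), key "_") == 1) {
                    print $2
                    next
                }
            }
        }'
}

# Usage on each host:
#   ps -ef | agent_pids_for_dump 20160923093425
```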

5. Once you have verified that the above PIDs are correct, proceed with the kill

ps -ef | grep gp_dump_agent | grep -v grep | awk '{print $2}' | xargs kill

=> ps -ef | grep gp_dump_agent | grep -v grep | awk '{print $2}' | xargs kill
[ mdw]
[sdw1]
[sdw2]

6. Wait a minute, then run the ps command again to make sure it returns nothing

=> ps -ef | grep gp_dump_agent | grep -v grep | awk '{print $2}'
[ mdw]
[sdw1]
[sdw2]
=>
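
The same check can be scripted outside the interactive gpssh session. The sketch below assumes passwordless ssh as gpadmin to every host in the host file (which gpssh itself relies on); remaining_agents is a hypothetical helper. It uses the pattern '[g]p_dump_agent' so that grep does not match its own command line, which removes the need for grep -v grep, and it prints nothing when every host is clean.

```shell
# Print any gp_dump_agent processes still running on the hosts listed
# in a host file, prefixed with the host name. No output means clean.
#   $1  host file, one hostname per line (e.g. /home/gpadmin/gp-primary)
remaining_agents() {
    host_file=$1
    while read -r host; do
        # Skip blank lines; read already trims surrounding whitespace.
        [ -n "$host" ] || continue
        # -n stops ssh from consuming the rest of the host list on stdin.
        ssh -n "$host" "ps -ef | grep '[g]p_dump_agent'" | sed "s/^/[$host] /"
    done < "$host_file"
}

# Usage on the master:
#   remaining_agents /home/gpadmin/gp-primary
```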

7. Make sure the gpcrondump status files on both the master and the segments show that the backup was terminated. Messages like the following will be seen:

20160919:13:18:16|gp_dump_agent-[ERROR]:-Error message from server: ERROR:  canceling statement due to user request
20160919:13:18:16|gp_dump_agent-[ERROR]:-The command was: select distinct(oid), typstorage from pg_type where oid in (select distinct atttypid from pg_attribute)
20160919:13:18:16|gp_dump_agent-[ERROR]:-*** aborted because of error: ERROR:  canceling statement due to user request
20160919:13:18:18|gp_dump_agent-[ERROR]:-*** aborted because of error: ERROR:  canceling statement due to user request
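
To check many status files at once, grep -L can list the files that do not contain the cancel message. The sketch below is a hypothetical helper (unterminated_status_files is not a Greenplum utility). It takes a dump directory and a dump timestamp, and it assumes the gp_dump_status_*_<timestamp> file naming seen in the ps output earlier in this article.

```shell
# List status files of one dump that do NOT contain the cancel message.
#   $1  dump directory, e.g. /data1/gpdb/gpseg1/db_dumps/20160923
#   $2  dump timestamp, e.g. 20160923093425
unterminated_status_files() {
    dump_dir=$1
    ts=$2
    # Expand the file name pattern into the function's own arguments.
    set -- "$dump_dir"/gp_dump_status_*_"$ts"
    if [ ! -e "$1" ]; then
        echo "no status files found in $dump_dir for $ts" >&2
        return 1
    fi
    # -L prints the names of files with no matching line.
    grep -L 'canceling statement due to user request' "$@"
}

# Usage on each host, once per segment dump directory:
#   unterminated_status_files /data1/gpdb/gpseg1/db_dumps/20160923 20160923093425
```

An empty result means every status file found for that dump recorded the cancellation; any file name printed should be inspected by hand.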


Related Articles

How to cancel a restore job

Known Issues

If the GPDB instance crashes or is restarted during a gpcrondump, the gp_dump_agent processes will still be running. Remember to clean them up using the steps above.
