Pivotal Knowledge Base

How to cancel a gpdbrestore job

Environment

Product                     Version
Pivotal Greenplum (GPDB)    4.3.x
OS                          RHEL 6.x

Purpose

This article explains, step by step, how to cancel a running gpdbrestore job.

Cause

Killing the gpdbrestore process on the master server (mdw) does not terminate the restore completely. It leaves orphaned restore agents (gp_restore_agent) still running on the segment servers.

Procedure

If you are cancelling the restore because it appears to be hanging and you cannot tell whether it is progressing or which stage it has reached, refer to the KB article below first:

Troubleshooting long running gpcrondump jobs

If you have already checked the restore progress using the above KB article and still want to cancel the restore, follow the steps below.

1. Get the PID of the gpdbrestore process running on the Master

ps -ef | grep gpdbrestore | grep -v grep

[gpadmin@mdw gpadmin]$ ps -ef | grep gpdbrestore | grep -v grep
gpadmin 26523 11385 1 10:35 pts/3 00:00:00 python /usr/local/GP-4.3.8.1/bin/gpdbrestore -t 20160923103359 --redirect demo

ps -ef | grep gpdbrestore | grep -v grep | awk '{print $2}'

[gpadmin@mdw gpadmin]$ ps -ef | grep gpdbrestore | grep -v grep | awk '{print $2}'
26523
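
Note that `grep gpdbrestore` matches any command line containing that string, for example a `tail -f` on a log file whose name includes "gpdbrestore". The following is a minimal sketch of a stricter filter, assuming the `ps -ef` column layout shown above (command starting at column 8). The function name `gpdbrestore_pids` is hypothetical and not part of Greenplum.

```shell
# Read `ps -ef`-style lines on stdin and print the PID (column 2) of every
# process whose command (column 8) or first argument (column 9) is a path
# ending in "/gpdbrestore". A line such as
#   tail -f /some/dir/gpdbrestore_20160923.log
# is skipped, because neither column ends in "/gpdbrestore".
gpdbrestore_pids() {
    awk '$8 ~ /\/gpdbrestore$/ || $9 ~ /\/gpdbrestore$/ { print $2 }'
}

# Usage on the master:
#   ps -ef | gpdbrestore_pids
```

Review the printed PIDs before passing them to kill, as in the next step.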

2. Once you have verified that the above PID is correct, kill the process. (You will see "Terminated" if you are tailing the gpdbrestore log or running gpdbrestore interactively on the screen.)

ps -ef | grep gpdbrestore | grep -v grep | awk '{print $2}' | xargs kill

[gpadmin@mdw gpadmin]$ ps -ef | grep gpdbrestore | grep -v grep | awk '{print $2}' | xargs kill

3. Killing gpdbrestore still leaves the agent processes running on the master and the segments, and these need to be cleaned up.

Get the current primary hostnames from gp_segment_configuration

ssh to master server
su - gpadmin
cd /home/gpadmin
psql
\o gp-primary
\t
select distinct hostname from gp_segment_configuration where role = 'p';
\q

[gpadmin@mdw kushal]$ cat /home/gpadmin/gp-primary
sdw2
sdw1
mdw
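
The same host file can probably be produced without an interactive psql session. The sketch below assumes that `psql` connects to the same default database as the plain `psql` call above. The function name `write_primary_hosts` is hypothetical.

```shell
# Write the distinct hostnames of all primary instances (including the
# master) to a host file for gpssh.
#   -A  unaligned output (no leading padding)
#   -t  tuples only (no header or row-count footer)
#   -c  run a single command and exit
write_primary_hosts() {
    outfile=$1
    psql -At -c "select distinct hostname from gp_segment_configuration where role = 'p';" > "$outfile"
}

# Usage (as gpadmin on the master):
#   write_primary_hosts /home/gpadmin/gp-primary
```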

Now, using gpssh, run the steps below to clean up the restore agent processes running on the master and segment hosts

gpssh -f /home/gpadmin/gp-primary

[gpadmin@mdw gpadmin]$ gpssh -f /home/gpadmin/gp-primary
=>

4. Get the PIDs of the gp_restore_agent processes running on the master and segment hosts

ps -ef | grep gp_restore_agent | grep -v grep

=> ps -ef | grep gp_restore_agent | grep -v grep
[ mdw]
[sdw1] gpadmin 5131 5128 0 10:36 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_2_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg0/db_dumps/20160923 -p 46000 -U gpadmin --target-dbid 2 --target-host sdw1 --target-port 46000 -d "demo" /data1/gpdb/gpseg0/db_dumps/20160923/gp_dump_0_2_20160923103359.gz 2> /data1/gpdb/gpseg0/gp_restore_status_0_2_20160923103359 2>&2
[sdw1] gpadmin 5132 5129 0 10:36 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_5_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg3/db_dumps/20160923 -p 46003 -U gpadmin --target-dbid 5 --target-host sdw1 --target-port 46003 -d "demo" /data2/gpdb/gpseg3/db_dumps/20160923/gp_dump_0_5_20160923103359.gz 2> /data2/gpdb/gpseg3/gp_restore_status_0_5_20160923103359 2>&2
[sdw1] gpadmin 5133 5130 0 10:36 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_4_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg2/db_dumps/20160923 -p 46002 -U gpadmin --target-dbid 4 --target-host sdw1 --target-port 46002 -d "demo" /data2/gpdb/gpseg2/db_dumps/20160923/gp_dump_0_4_20160923103359.gz 2> /data2/gpdb/gpseg2/gp_restore_status_0_4_20160923103359 2>&2
[sdw1] gpadmin 5134 5127 0 10:36 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_3_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg1/db_dumps/20160923 -p 46001 -U gpadmin --target-dbid 3 --target-host sdw1 --target-port 46001 -d "demo" /data1/gpdb/gpseg1/db_dumps/20160923/gp_dump_0_3_20160923103359.gz 2> /data1/gpdb/gpseg1/gp_restore_status_0_3_20160923103359 2>&2
[sdw1] gpadmin 5135 5132 0 10:36 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_5_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg3/db_dumps/20160923 -p 46003 -U gpadmin --target-dbid 5 --target-host sdw1 --target-port 46003 -d demo /data2/gpdb/gpseg3/db_dumps/20160923/gp_dump_0_5_20160923103359.gz
[sdw1] gpadmin 5136 5134 0 10:36 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_3_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg1/db_dumps/20160923 -p 46001 -U gpadmin --target-dbid 3 --target-host sdw1 --target-port 46001 -d demo /data1/gpdb/gpseg1/db_dumps/20160923/gp_dump_0_3_20160923103359.gz
[sdw1] gpadmin 5137 5133 0 10:36 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_4_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg2/db_dumps/20160923 -p 46002 -U gpadmin --target-dbid 4 --target-host sdw1 --target-port 46002 -d demo /data2/gpdb/gpseg2/db_dumps/20160923/gp_dump_0_4_20160923103359.gz
[sdw1] gpadmin 5138 5131 0 10:36 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_2_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg0/db_dumps/20160923 -p 46000 -U gpadmin --target-dbid 2 --target-host sdw1 --target-port 46000 -d demo /data1/gpdb/gpseg0/db_dumps/20160923/gp_dump_0_2_20160923103359.gz
[sdw2] gpadmin 23752 23751 0 10:36 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_7_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg5/db_dumps/20160923 -p 46001 -U gpadmin --target-dbid 7 --target-host sdw2 --target-port 46001 -d "demo" /data1/gpdb/gpseg5/db_dumps/20160923/gp_dump_0_7_20160923103359.gz 2> /data1/gpdb/gpseg5/gp_restore_status_0_7_20160923103359 2>&2
[sdw2] gpadmin 23753 23748 0 10:36 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_9_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg7/db_dumps/20160923 -p 46003 -U gpadmin --target-dbid 9 --target-host sdw2 --target-port 46003 -d "demo" /data2/gpdb/gpseg7/db_dumps/20160923/gp_dump_0_9_20160923103359.gz 2> /data2/gpdb/gpseg7/gp_restore_status_0_9_20160923103359 2>&2
[sdw2] gpadmin 23754 23749 0 10:36 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_6_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg4/db_dumps/20160923 -p 46000 -U gpadmin --target-dbid 6 --target-host sdw2 --target-port 46000 -d "demo" /data1/gpdb/gpseg4/db_dumps/20160923/gp_dump_0_6_20160923103359.gz 2> /data1/gpdb/gpseg4/gp_restore_status_0_6_20160923103359 2>&2
[sdw2] gpadmin 23755 23750 0 10:36 ? 00:00:00 sh -c /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_8_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg6/db_dumps/20160923 -p 46002 -U gpadmin --target-dbid 8 --target-host sdw2 --target-port 46002 -d "demo" /data2/gpdb/gpseg6/db_dumps/20160923/gp_dump_0_8_20160923103359.gz 2> /data2/gpdb/gpseg6/gp_restore_status_0_8_20160923103359 2>&2
[sdw2] gpadmin 23756 23753 0 10:36 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_9_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg7/db_dumps/20160923 -p 46003 -U gpadmin --target-dbid 9 --target-host sdw2 --target-port 46003 -d demo /data2/gpdb/gpseg7/db_dumps/20160923/gp_dump_0_9_20160923103359.gz
[sdw2] gpadmin 23757 23752 0 10:36 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_7_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg5/db_dumps/20160923 -p 46001 -U gpadmin --target-dbid 7 --target-host sdw2 --target-port 46001 -d demo /data1/gpdb/gpseg5/db_dumps/20160923/gp_dump_0_7_20160923103359.gz
[sdw2] gpadmin 23758 23754 0 10:36 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_6_Y2hhbmdlbWU= --gp-d /data1/gpdb/gpseg4/db_dumps/20160923 -p 46000 -U gpadmin --target-dbid 6 --target-host sdw2 --target-port 46000 -d demo /data1/gpdb/gpseg4/db_dumps/20160923/gp_dump_0_6_20160923103359.gz
[sdw2] gpadmin 23759 23755 0 10:36 ? 00:00:00 /usr/local/GP-4.3.8.1/bin/gp_restore_agent --gp-c /bin/gunzip --gp-k 20160923103359_0_8_Y2hhbmdlbWU= --gp-d /data2/gpdb/gpseg6/db_dumps/20160923 -p 46002 -U gpadmin --target-dbid 8 --target-host sdw2 --target-port 46002 -d demo /data2/gpdb/gpseg6/db_dumps/20160923/gp_dump_0_8_20160923103359.gz
=>

ps -ef | grep gp_restore_agent | grep -v grep | awk '{print $2}'

=> ps -ef | grep gp_restore_agent | grep -v grep | awk '{print $2}'
[ mdw]
[sdw1] 5131
[sdw1] 5132
[sdw1] 5133
[sdw1] 5134
[sdw1] 5135
[sdw1] 5136
[sdw1] 5137
[sdw1] 5138
[sdw2] 23752
[sdw2] 23753
[sdw2] 23754
[sdw2] 23755
[sdw2] 23756
[sdw2] 23757
[sdw2] 23758
[sdw2] 23759
=>
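
In the listing above, each restoring segment shows two processes: an `sh -c` wrapper and the gp_restore_agent it started. As a quick sanity check before killing anything, the PID list can be summarised per host. This is a minimal sketch that assumes the `[host] PID` line format shown in the gpssh output above. The function name `count_agents_per_host` is hypothetical.

```shell
# Read gpssh output lines such as "[sdw1] 5131" or "[ mdw]" on stdin and
# print "<host> <number of PIDs>" for each host, sorted by host name.
# Lines that do not start with "[" (for example the "=>" prompt) are ignored.
count_agents_per_host() {
    awk '/^\[/ {
        host = $0
        sub(/^\[ */, "", host)          # drop "[" and any padding
        sub(/\].*$/, "", host)          # drop "]" and everything after it
        rest = $0
        sub(/^\[[^]]*\] */, "", rest)   # keep only the text after "[host]"
        if (!(host in n)) n[host] = 0
        if (rest ~ /^[0-9]+$/) n[host]++
    }
    END { for (h in n) print h, n[h] }' | sort
}

# Usage: paste or pipe the saved gpssh PID listing into the function:
#   count_agents_per_host < saved_gpssh_output.txt
```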

5. Once you have verified that the above PIDs are correct, move ahead with the kill

ps -ef | grep gp_restore_agent | grep -v grep | awk '{print $2}' | xargs kill

=> ps -ef | grep gp_restore_agent | grep -v grep | awk '{print $2}' | xargs kill
[ mdw]
[sdw1]
[sdw2]
=>
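
On a host with no agent running (mdw in this example), the pipeline feeds xargs an empty list. GNU xargs normally runs the command once even with no input, so `kill` may be invoked without any PID and complain. The sketch below assumes GNU xargs (standard on RHEL 6) and uses its `-r` option to skip the command when the input is empty. The function name `kill_listed_pids` is hypothetical.

```shell
# Read PIDs on stdin and send them the default signal (SIGTERM).
# -r (GNU xargs: --no-run-if-empty) means kill is not run at all when
# no PIDs are supplied.
kill_listed_pids() {
    xargs -r kill
}

# Usage inside the gpssh session:
#   ps -ef | grep gp_restore_agent | grep -v grep | awk '{print $2}' | xargs -r kill
```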

6. Wait about a minute, then run the ps command again and confirm that it returns no processes

ps -ef | grep gp_restore_agent | grep -v grep

=> ps -ef | grep gp_restore_agent | grep -v grep
[ mdw]
[sdw1]
[sdw2]
=>
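
Instead of waiting a fixed minute, the check can be repeated until it comes back empty. This is a minimal sketch of such a polling loop. The function name `wait_until_empty` is hypothetical, and the `pgrep -f` call in the usage line assumes that pgrep (from procps) is available on the host.

```shell
# Re-run a check command every INTERVAL seconds until it prints nothing,
# or until roughly TIMEOUT seconds have passed.
# Returns 0 once the command's output is empty, 1 on timeout.
wait_until_empty() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while [ -n "$("$@" 2>/dev/null)" ]; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Usage on one host: wait up to 60 seconds, checking every 5 seconds.
#   wait_until_empty 60 5 pgrep -f gp_restore_agent
```

If the function returns 1, agents are still running and the PIDs should be inspected again before any further action.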

7. Make sure the restore status files (gp_restore_status_*) on both the master and the segments say that the restore was terminated. Messages like the following will be seen:

20160919:13:18:16|gp_restore_agent-[ERROR]:-Error message from server: ERROR:  canceling statement due to user request
20160919:13:18:16|gp_restore_agent-[ERROR]:-The command was: select distinct(oid), typstorage from pg_type where oid in (select distinct atttypid from pg_attribute)
20160919:13:18:16|gp_restore_agent-[ERROR]:-*** aborted because of error: ERROR:  canceling statement due to user request
20160919:13:18:18|gp_restore_agent-[ERROR]:-*** aborted because of error: ERROR:  canceling statement due to user request
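
The status files can also be checked with grep rather than read one by one. This is a minimal sketch that looks for the cancellation message shown above. The function name `check_restore_status` is hypothetical, and the file path in the usage line is taken from the ps output in step 4.

```shell
# For each status file given as an argument, report whether it contains
# the "canceling statement due to user request" message.
check_restore_status() {
    for f in "$@"; do
        if grep -q 'canceling statement due to user request' "$f"; then
            echo "cancelled: $f"
        else
            echo "NOT cancelled: $f"
        fi
    done
}

# Usage on a segment host:
#   check_restore_status /data1/gpdb/gpseg0/gp_restore_status_0_2_20160923103359
```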


Related Articles

Cancel a backup gpcrondump job

 
