Troubleshooting long-running gpdbrestore jobs

Environment

Product                      Version
Pivotal Greenplum (GPDB)     4.3.x
OS                           RHEL 6.x

Overview

This article explains how gpdbrestore works and helps troubleshoot a long-running restore job.

High-level steps during restore

  1. The database is created using the gp_cdatabase_1_1_<timestamp> file
  2. Global objects are loaded with gp_dump_global__<timestamp>
  3. DDLs are created using gp_dump_1_1_<timestamp>.gz
  4. Segments are then notified to start the parallel restore using their local gp_dump_0_<dbid>_<timestamp>.gz -- where dbid is the segment dbid
  5. Once all segments have finished, the master runs gp_dump_1_1_<timestamp>_post_data.gz to create indexes
  6. A full database ANALYZE is done; to skip it, use --noanalyze (see the example command after this list)
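
For illustration, a hedged sketch of a full restore of a specific timestamp that skips the final ANALYZE step (the timestamp matches the log example below):

gpdbrestore -t 20160402104940 --noanalyze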

Example: Many of the above steps can be seen in the utility logs:

20160402:10:50:25:040398 gpdbrestore:mdw:gpadmin-[INFO]:-Starting gpdbrestore with args: -t 20160402104940
20160402:10:50:25:040398 gpdbrestore:mdw:gpadmin-[INFO]:-------------------------------------------
20160402:10:50:25:040398 gpdbrestore:mdw:gpadmin-[INFO]:-Greenplum database restore parameters
20160402:10:50:25:040398 gpdbrestore:mdw:gpadmin-[INFO]:-------------------------------------------
20160402:10:50:25:040398 gpdbrestore:mdw:gpadmin-[INFO]:-Restore type               = Full Database
20160402:10:50:25:040398 gpdbrestore:mdw:gpadmin-[INFO]:-Database to be restored    = prod
20160402:10:50:25:040398 gpdbrestore:mdw:gpadmin-[INFO]:-Drop and re-create db      = Off
20160402:10:50:25:040398 gpdbrestore:mdw:gpadmin-[INFO]:-Restore method             = Restore specific timestamp
20160402:10:50:25:040398 gpdbrestore:mdw:gpadmin-[INFO]:-Restore timestamp          = 20160402104940
20160402:10:50:25:040398 gpdbrestore:mdw:gpadmin-[INFO]:-Restore compressed dump    = On
20160402:10:50:25:040398 gpdbrestore:mdw:gpadmin-[INFO]:-Restore global objects     = Off
20160402:10:50:25:040398 gpdbrestore:mdw:gpadmin-[INFO]:-Array fault tolerance      = f
20160402:10:50:25:040398 gpdbrestore:mdw:gpadmin-[INFO]:-------------------------------------------
20160402:10:50:26:040398 gpdbrestore:mdw:gpadmin-[INFO]:-gp_restore commandline: gp_restore -i -h mdw -p 5432 -U gpadmin --gp-i --gp-k=20160402104940 --gp-l=p --gp-d=db_dumps/20160402 --gp-c -d prod: 
20160402:10:50:32:040398 gpdbrestore:mdw:gpadmin-[INFO]:-Commencing analyze of prod database, please wait
20160402:10:50:42:040398 gpdbrestore:mdw:gpadmin-[INFO]:-Analyze of prod completed without error

Logs to check

  • Utility logs are located under ~/gpAdminLogs/gpdbrestore_<date>.log
  • The restore report file exists only on the master, gives results for all the segments, and is located under $MASTER_DATA_DIRECTORY
  • The restore status file exists on every segment and is located under $SEG_DIR -- where SEG_DIR can be the master or a segment data directory. The following example shows both files on the master.
gpadmin@mdw$ ls $MASTER_DATA_DIRECTORY/*20160402104940*
gp_restore_20160402104940.rpt
gp_restore_status_1_1_20160402104940
  • If the backup was taken using "-u", then the restore report and status files are generated under <DIR>/db_dumps/<date> -- where <DIR> is the location used with -u and <date> is the folder for the backup timestamp.
  • --report-status-dir can also be used to specify the location of the report and status files (see the example after this list)
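
A hedged sketch (the directory path is illustrative) of redirecting the report and status files to a dedicated location:

gpdbrestore -t 20160402104940 --report-status-dir=/home/gpadmin/restore_reports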

Key points to note

1. If both -u and --report-status-dir are specified, then --report-status-dir takes precedence over "-u".
2. If the backup was taken without the "-u" flag and the restore does not have the --report-status-dir flag set, then the status and report files are created in the MASTER_DATA_DIRECTORY or the SEGMENT_DIRECTORY.

Data Domain specific logging

The -u option is not supported if --ddboost is specified. Instead, use the --report-status-dir flag to write the status and report files to a different folder. If it is not specified, they are written to the $SEG_DIR. No log files (status or rpt) are written to the Data Domain.
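
A hedged sketch (the timestamp and path are illustrative) of a DDBoost restore that redirects the report and status files off the Data Domain:

gpdbrestore -t 20160402104940 --ddboost --report-status-dir=/home/gpadmin/restore_reports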

For detailed information about gpdbrestore options, refer to the online documentation.

Symptom 

Restore is running for a very long time and the utility STDOUT or ~/gpAdminLogs/gpdbrestore_<date>.log shows no progress.

It could be that there is no issue and the restore is simply taking the expected amount of time. The following section is divided into two categories to help check, troubleshoot, and find the real issue:

1) Database-level troubleshooting

  • Check which step the restore is currently on (refer to the high-level steps in the overview).
  • If the restore is found to be running on the master only, check from the database whether it is stuck.
  • The metadata restore consists of DDLs (CREATE and ALTER), so the table name should keep changing when checked through pg_log or pg_stat_activity.

There are two ways to check:

  1. SELECT current_query FROM pg_stat_activity
  2. tail -f $MASTER_DATA_DIRECTORY/pg_log/<latest_log_file>.csv
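
A hedged sketch of a slightly more detailed check (these columns exist in pg_stat_activity on GPDB 4.3.x) that filters out the monitoring query itself:

SELECT procpid, usename, waiting, query_start, current_query
FROM pg_stat_activity
WHERE current_query NOT LIKE '%pg_stat_activity%';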

Note: The metadata restore can take much longer than the metadata backup. This is because object creation is done on all the segments during the restore, whereas the metadata was dumped from the master only during the backup. In this case, wait for the metadata restore to finish (the object name should keep changing here as well).
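
As a rough, hedged progress check while the DDLs are being replayed, the relation count in the catalog of the database being restored should keep increasing:

SELECT count(*) FROM pg_class;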

Flow of gp_restore_agent

  • DROP DATABASE: No agent running on master and segments
=> ps -ef | grep gp_restore_agent | grep -v grep | grep -v gp_restore_status | wc -l
[ mdw] 0
[sdw2] 0
[sdw1] 0
  • DDL statements: Agent running only on master
=> ps -ef | grep gp_restore_agent | grep -v grep | grep -v gp_restore_status | wc -l
[ mdw] 1
[sdw2] 0
[sdw1] 0
  • COPY statement: Agents running only on segments
=> ps -ef | grep gp_restore_agent | grep -v grep | grep -v gp_restore_status | wc -l
[ mdw] 0
[sdw2] 4
[sdw1] 4 
  • POST DATA, CREATE INDEX: Agents running only on master
=> ps -ef | grep gp_restore_agent | grep -v grep | grep -v gp_restore_status | wc -l
[ mdw] 1
[sdw2] 0
[sdw1] 0
  • COPY stuck on one segment: Agent would be running only on 1 segment
PGOPTIONS='-c gp_session_role=utility' psql -p 46000
gpadmin=# select * from pg_stat_activity where procpid = 25166;
 datid    | datname          | procpid | sess_id | usesysid | usename | current_query                       | waiting | query_start                   | backend_start                 | client_addr | client_port | application_name | xact_start                    | waiting_reason
----------+------------------+---------+---------+----------+---------+-------------------------------------+---------+-------------------------------+-------------------------------+-------------+-------------+------------------+-------------------------------+----------------
 81402773 | gpdbrestore-test | 25166   | 7429    | 10       | gpadmin | COPY gpcron_test (a, b) FROM stdin; | f       | 2016-06-01 15:13:04.093734-07 | 2016-06-01 15:13:03.977945-07 | 172.28.8.2  | 15928       | psql             | 2016-06-01 15:13:04.093734-07 |
(1 row)

=> ps -ef | grep gp_restore_agent | grep -v grep | grep -v gp_restore_status | wc -l
[ mdw] 0
[sdw2] 1
[sdw1] 0
=> [gpadmin@sdw2 ~]$ ps -ef | grep 25166;
gpadmin 25166 31230 92 15:13 ? 00:00:28 postgres: port 46000, gpadmin gpdbrestore-test 172.28.8.2(15928) con7429 172.28.8.2(15928) cmd2 COPY
gpadmin 25283 24915 0 15:13 pts/1 00:00:00 grep 25166
[gpadmin@sdw2 ~]$
[gpadmin@sdw2 ~]$ strace -p 25166
Process 25166 attached - interrupt to quit
recvfrom(8, "\0\0\0\02053277557\t\\N\nd\0\0\0\02053277558\t\\N"..., 8192, 0, NULL, NULL) = 8192
recvfrom(8, "\nd\0\0\0\02053278039\t\\N\nd\0\0\0\02053278040\t"..., 8192, 0, NULL, NULL) = 8192
lseek(16, 0, SEEK_END) = 846659584
semop(7241838, 0x7fff818ce190, 1) = 0
semop(7209069, 0x7fff818ce190, 1) = 0
lseek(16, 715587584, SEEK_SET) = 715587584
write(16, "f\0\0\0\350_\345\250\1\0\0\0H\16`\16\0\200\4\200\340\3778\0\300\3778\0\240\3778\0"..., 32768) = 32768
semop(7241838, 0x7fff818ce190, 1) = 0
lseek(16, 846659584, SEEK_SET) = 846659584
write(16, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
semop(7241838, 0x7fff818cd940, 1) = 0
semop(7209069, 0x7fff818cd940, 1) = 0
semop(7241838, 0x7fff818cd940, 1) = 0
semop(7209069, 0x7fff818cd940, 1) = 0
write(10, "^\320\1\0\1\0\0\0f\0\0\0\0\200<\272\30\0\0\0q\0051\0\0\0\0\0\0\0\354\344"..., 131072) = 131072
  • If COPY is running on only one or a few segments, trace those processes as shown above to check why they are still running. One possibility is that the segment holds more data than the others; a quick comparison of the data file sizes can rule this out (see the gpssh sketch after this list).
  • To trace the process further, check at the OS/networking level as described in the following sections.
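
A hedged sketch (the hostnames, segment data paths, and timestamp are illustrative and assume the backup was taken without -u) of using gpssh to count the restore agents per host and to compare per-segment dump file sizes:

gpssh -h sdw1 -h sdw2 "ps -ef | grep gp_restore_agent | grep -v grep | grep -v gp_restore_status | wc -l"
gpssh -h sdw1 -h sdw2 "ls -lh /data/primary/gpseg*/db_dumps/20160402/gp_dump_0_*_20160402104940.gz"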

2) Check at OS/UNIX Level

Process Tracing

  • Make sure the process at the OS level is not hung; see the following example on how to find the OS PID and continue OS troubleshooting.
  • Use strace on the restore process to troubleshoot further; refer to the KB article Troubleshoot Hung Statement for more information.
  • The strace output should show activity that is progressing. The example below shows how to get the process ID from pg_stat_activity and then run strace on the Unix PID.

Example:

  • Check pg_stat_activity (through utility mode on the segment if required) to find the COPY process

db_prod2=# select * from pg_stat_activity;
 datid | datname | procpid | sess_id | usesysid | usename | current_query | waiting | query_start | backend_start | client_addr | client_port | application_name | xact_start | waiting_reason
 69182919 | gpdbrestore-test | 32320 | 58186104 | 10 | gpadmin | COPY oblgr_b (period_dt, period_type_cd, oblgr_id, app_cd, inst_id, oblgr_type_cd, oblgr_nm, oblgr_short_nm, oblgr_addl_nm, duns_num, ssn, tin, interlock_cd, sic_code, team_relshp_num, geogrphc_ind, geogrphc_cntry_cd, acct_opened_dt, last_scrn_updt_dt, last_updtd_dt, last_updtd_userid, cntry_of_risk_cd, swp_upd_ind, oblgr_street_add1, oblgr_street_add2, oblgr_street_add3, oblgr_city, oblgr_state, oblgr_zip, oblgr_assgn_unit, oblgr_contact_offcr, naics_cd, spec_info_offc, oblgr_telephone_num, conv_ref_ai616, old_obligor_no) FROM stdin; | f | 2016-05-25 14:17:35.060822-07 | 2016-05-25 12:55:40.988429-07 | 172.28.8.250 | 53208 | psql | 2016-05-25 14:17:35.060822-07 |

[gpadmin@mdw pg_log]$ ps -ef | grep 32320
gpadmin 11388 15791 0 13:13 pts/31 00:00:00 grep 32320
gpadmin 32320 16750 30 12:55 ? 00:05:38 postgres: port 54320, gpadmin gpdbrestore-test 172.28.8.250(53208) con58186104 172.28.8.250(53208) cmd32683 COPY
[gpadmin@mdw pg_log]$

Trace using the strace command: the trace below shows that the command is progressing and writing data

[gpadmin@sdw2 ~]# strace -p 32320
Process 32320 attached - interrupt to quit
futex(0x2b2a4aded9d0, FUTEX_WAIT, 4116, NULL) = 0
select(3, NULL, [2], NULL, {1, 0}) = 1 (out [2], left {1, 0})
write(2, "\0\0\0\0\37\1\0\0@~\0\0\360\201\306:\360\201\306:\0\0\0\0tcf\33\377\177\0\0"..., 335) = 335
sendto(8, "C\0\0\0\vCOPY 0\0Z\0\0\0\5I", 18, 0, NULL, 0) = 18
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={18, 0}}, NULL) = 0
recvfrom(8, "Q\0\0\0\344COPY clp_product_fra (recor"..., 8192, 0, NULL, NULL) = 229
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0

Resource Utilization

  • Make sure system resources such as memory, CPU, and I/O are not saturated
  • sar -P ALL 1 2 (CPU) ; sar -r 1 3 (memory and swap) ; sar -b 1 3 (disk I/O)
  • None of the resources should be over-utilized
  • 100% CPU or memory utilization and excessive disk I/O indicate trouble

Sample output

[root@gpdb-sandbox ~]# sar -P ALL 1 1
Linux 2.6.32-573.el6.x86_64 (gpdb-sandbox.localdomain) 04/23/2016 _x86_64_ (2 CPU)

01:44:21 PM CPU %user %nice %system %iowait %steal %idle
01:44:22 PM all 0.00 0.00 2.01 0.00 0.00 97.99
01:44:22 PM 0 0.00 0.00 4.00 0.00 0.00 96.00
01:44:22 PM 1 0.00 0.00 0.99 0.00 0.00 99.01

[root@gpdb-sandbox ~]# sar -r 1 1
Linux 2.6.32-573.el6.x86_64 (gpdb-sandbox.localdomain) 04/23/2016 _x86_64_ (2 CPU)

01:45:28 PM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
01:45:29 PM 228852 7817136 97.16 158548 7147320 2072928 20.44
Average: 228852 7817136 97.16 158548 7147320 2072928 20.44

[root@gpdb-sandbox ~]# sar -b 1 1
Linux 2.6.32-573.el6.x86_64 (gpdb-sandbox.localdomain) 04/23/2016 _x86_64_ (2 CPU)

01:46:00 PM tps rtps wtps bread/s bwrtn/s
01:46:01 PM 0.00 0.00 0.00 0.00 0.00
Average: 0.00 0.00 0.00 0.00 0.00

Important

If the OS stats show that any resource, such as memory, CPU, or I/O, does not look right or is over-utilized, then check the two things listed below (or ask the customer):

  1. Check whether any other GPDB utilities or application batch jobs are running in parallel; an example would be gptransfer and a bulk insert running at the same time (see the sketch after this list).
  2. Also check on the slow/hanging segment host, or with the customer/user, whether any non-GPDB applications are running and consuming OS resources.
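
A hedged sketch of a quick check for other Greenplum utilities running in parallel on a host (the utility names are only examples):

ps -ef | egrep 'gpcrondump|gptransfer|gpload|gpdbrestore' | grep -v egrep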

Check Network Level

Check Hardware Level

Open ticket with DataDomain Support

If Data Domain has been used for backup/restore and no issue is found after following the above steps, ask the customer to open an SR with Data Domain Support to troubleshoot further from the DD side.

Additional Information

A few important flags

-B <parallel_processes> The number of parallel processes can sometimes be oversubscribed, leaving the server CPU-starved, which can make the restore run very slowly. If this flag is used, follow the OS troubleshooting steps above to check the CPU idle time.

-R <hostname>:<path_to_dumpset> The -R option allows restoring from a backup set located on a host outside of the Greenplum Database array (an archive host), so this host must be accessible from all the servers. The files are copied to the local segments first, and then gpdbrestore starts. Sometimes reaching the remote host is slow or troublesome because of a network issue; use the networking troubleshooting guide for more details.
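
A hedged sketch of the -R syntax (the archive hostname and path are illustrative):

gpdbrestore -R archivehost:/backups/db_dumps/20160402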

Tip: To find out how fast the dump files are being copied (scp) from the remote host to the GPDB host, run "ls -ltrh" on the folder the dump files are being copied into and check whether the file sizes keep increasing.
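
A hedged sketch (the path is illustrative) of repeatedly checking whether the copied dump files are growing:

watch -n 5 'ls -ltrh /data/master/gpseg-1/db_dumps/20160402/'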

Related KB Articles

Refer to Troubleshooting Slow Running Backups if the backup was also causing performance issues
