Pivotal Knowledge Base

Greenplum Backup Time Collector (gpbackuptime_collector)

Environment

Product | Version
GPDB | 4.3.x
OS | RHEL 6.x

Purpose

  • The backup is running slowly; how can you find out why?
  • How can you collect the time taken by the queries and table backups that gpcrondump runs?

Disclaimer

This is not an official Pivotal tool; it was created in collaboration with Global Support and Services. If you have questions, bug reports, or enhancement requests, please comment at the end of the article to help us improve the tool.

Overview

What is gpbackuptime_collector?

gpbackuptime_collector is a utility that reports the time gpcrondump spends gathering information from the database and backing up the tables.

How does it work?

gpbackuptime_collector works by gathering information from the segment logs about the queries executed by the backup process and generating a report in table format.

What are the prerequisites before running the gpbackuptime_collector?

  • Ensure that the log_duration GUC is turned on for all segments before the backup is taken. Without it, most of the SQL statements are not logged in the segment logs.
  • Ensure that the clocks of all segment servers are in sync.
  • The script needs a database connection to build the host map, so ensure that the database environment is sourced and that "psql -d template1" works.
  • gpbackuptime_collector was built and tested against 4.3.6.2 and above. You may run it on lower versions, but it works best on 4.3.6.x and above.

What are the caveats of gpbackuptime_collector?

  • It is not designed for parallel backups. If a backup of one database ran at the same time as a backup of another database, the data from the two backups conflicts and the resulting report may be wrong or very hard to interpret.
  • If there are files ending in .log in the working directory, move or rename them to avoid conflicts.
  • The script only gets the segment content information of the primaries that are current at the time the script is called.
  • The host map builder only builds the host map from logs found in the segment log directory, i.e. pg_log. If the logs are in different directories, the host map must be created manually.
  • The Python version on all hosts should be 2.6.x.
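
The caveat about .log files in the working directory can be checked before each run. The following is only a minimal sketch and is not part of gpbackuptime_collector; the function name find_conflicting_logs is an assumption made for illustration.

```python
import glob
import os


def find_conflicting_logs(workdir):
    """Return the sorted names of regular files ending in .log directly inside workdir."""
    pattern = os.path.join(workdir, "*.log")
    return sorted(
        os.path.basename(path)
        for path in glob.glob(pattern)
        if os.path.isfile(path)  # ignore directories that happen to end in .log
    )


# Hypothetical usage: list leftovers in the current working directory.
print(find_conflicting_logs(os.getcwd()))
```

An empty list means there is nothing to move or rename before running the collector.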

Procedure 

How to execute the program gpbackuptime_collector? 

  1. Create a working directory and change into it:
mkdir -p gp_backup_time
cd gp_backup_time
  2. Download the gpbackuptime_collector.py script attached to this document into the "gp_backup_time" directory.
  3. Build a host map of the segments in the format:
<hostname>:<fully-qualified-name-for-the-logfile>:<dbid>:<content>

The script automates this step. For example, to build a host map from the logs dated "2016-03-21":

gpbackuptime_collector.py -b "2016-03-21"

To build a host map from the logs dated "2016-03-21" and "2016-03-22" for contents 1 and 2:

gpbackuptime_collector.py -b "2016-03-21,2016-03-22" -c 1,2
  4. Get the start time and end time of the backup:
egrep "Starting gpcrondump|Exit" /home/gpadmin/gpAdminLogs/<gpcrondump_[DATE].log>

Example: the command above returns the output below. Here we are interested in the backup that started at "20160418:14:21:43" and exited at "20160418:14:22:18".

20160418:14:21:43:023744 gpcrondump:smdw:gpadmin-[INFO]:-Starting gpcrondump with args: -x gpadmin --prefix backup_and_restore
20160418:14:22:18:023744 gpcrondump:smdw:gpadmin-[INFO]:-Exit code zero, no warnings generated
20160418:14:22:57:023973 gpcrondump:smdw:gpadmin-[INFO]:-Starting gpcrondump with args: -x gpadmin --prefix backup_and_restore -a
20160418:14:23:31:023973 gpcrondump:smdw:gpadmin-[INFO]:-Exit code zero, no warnings generated
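
The timestamps in the gpcrondump log are written as "YYYYMMDD:HH:MM:SS", while the -s and -e options are given as "YYYY-MM-DD HH:MM:SS". The sketch below shows one way to pair each start line with its exit line and convert both timestamps. It is not part of gpbackuptime_collector; the function name backup_windows is hypothetical, and it assumes that the six-digit field after the timestamp identifies the gpcrondump process, which is consistent with the example output above.

```python
import re
from datetime import datetime

# Leading "YYYYMMDD:HH:MM:SS:<process id>" part of a gpcrondump log line.
LINE_RE = re.compile(r"^(\d{8}:\d{2}:\d{2}:\d{2}):(\d+) gpcrondump:")
OUT_FORMAT = "%Y-%m-%d %H:%M:%S"


def backup_windows(lines):
    """Pair each "Starting gpcrondump" line with the next "Exit" line of the same process.

    Returns a list of (start, end) strings formatted as "YYYY-MM-DD HH:MM:SS".
    """
    starts = {}  # process id -> start time of a run that has not exited yet
    windows = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y%m%d:%H:%M:%S")
        pid = match.group(2)
        if "Starting gpcrondump" in line:
            starts[pid] = stamp
        elif "Exit" in line and pid in starts:
            start = starts.pop(pid)
            windows.append((start.strftime(OUT_FORMAT), stamp.strftime(OUT_FORMAT)))
    return windows


# The first run from the example output above.
sample = [
    "20160418:14:21:43:023744 gpcrondump:smdw:gpadmin-[INFO]:-Starting gpcrondump with args: -x gpadmin --prefix backup_and_restore",
    "20160418:14:22:18:023744 gpcrondump:smdw:gpadmin-[INFO]:-Exit code zero, no warnings generated",
]
for start, end in backup_windows(sample):
    print('-s "{0}" -e "{1}"'.format(start, end))
# prints: -s "2016-04-18 14:21:43" -e "2016-04-18 14:22:18"
```

Matching on the process field rather than on line order is a design choice made here so that interleaved log lines from overlapping runs are not paired with each other.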

To run the script with the host map built above, pass the backup start and end times with -s and -e:

gpbackuptime_collector.py -f hostmap -s "2016-04-18 14:21:43" -e "2016-04-18 14:22:18"

To run the script in debug mode, add -d:

gpbackuptime_collector.py -f hostmap -s "2016-04-18 14:21:43" -e "2016-04-18 14:22:18" -d
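
The host map passed with -f uses the <hostname>:<fully-qualified-name-for-the-logfile>:<dbid>:<content> format described earlier. When a host map has to be written by hand, for example because the logs are not in pg_log, a quick check of each line can catch typos before the collector runs. The following is a minimal sketch and not the script's own parser; the names HostMapEntry and parse_hostmap_line, as well as the host name and log path in the example, are hypothetical.

```python
from collections import namedtuple

HostMapEntry = namedtuple("HostMapEntry", ["hostname", "logfile", "dbid", "content"])


def parse_hostmap_line(line):
    """Split one host map line into hostname, log file, dbid and content.

    The log file is taken as everything between the first colon and the
    last two colons, so a path that itself contains ':' still parses.
    Raises ValueError if the line does not have the expected shape.
    """
    hostname, rest = line.strip().split(":", 1)
    logfile, dbid, content = rest.rsplit(":", 2)
    if not hostname or not logfile:
        raise ValueError("empty hostname or log file in: %r" % line)
    return HostMapEntry(hostname, logfile, int(dbid), int(content))


# Hypothetical entry for illustration only.
entry = parse_hostmap_line("sdw1:/data/primary/gpseg0/pg_log/gpdb-2016-04-18_000000.csv:2:0")
print(entry.hostname, entry.dbid, entry.content)
```

A line with too few fields or a non-numeric dbid or content raises ValueError, which makes a malformed entry easy to spot.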

How to read the output from gpbackuptime_collector?

Please refer to the article on how to read the output from "gpbackuptime_collector".

Version History

Tool | Version | Points of Contact | Revision Date | Upgrade or Modification Details
Greenplum Backup Time Collector | 1.0.0 | Faisal Ali | April 18, 2016 | First version created
