Pivotal Knowledge Base

How to use gpcrondump with named pipes

Goal

The purpose of gpcrondump is to back up the database. It does this by producing backup files per instance (2 output data files for the master, 1 output data file per segment). The output files are created in the <segment_data_directory>/db_dumps/YYYYMMDD/ directory.

In addition, gpcrondump has an option to back up the database into named pipes. This allows a consumer utility to read the data during the backup and redirect it to a different set of files or locations.
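As a small illustration of how a named pipe behaves, the sketch below can be run on any single Linux host without Greenplum. The paths are hypothetical, and printf and cat are only stand-ins: printf plays the role of the backup process and cat plays the role of the consumer utility.

```shell
#!/bin/sh
# Minimal local demonstration of a named pipe (FIFO); all paths are hypothetical.
workdir=$(mktemp -d)
pipe="$workdir/demo_pipe"
dest="$workdir/demo_pipe.data_destination"

mkfifo "$pipe"

# Consumer: blocks until a writer opens the pipe, then copies everything to $dest.
cat "$pipe" > "$dest" &
consumer_pid=$!

# Writer: stands in for the backup process writing its dump into the pipe.
printf 'row 1\nrow 2\n' > "$pipe"

# The consumer exits once the writer has closed the pipe.
wait "$consumer_pid"

received=$(cat "$dest")
echo "$received"
rm -rf "$workdir"
```

The consumer blocks until a writer opens the pipe and exits when the writer closes it, which is the behaviour the rest of this guide relies on.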

This guide walks through an example of backing up the database using gpcrondump with named pipes.

Disclaimer

Please verify the scripts in this document on a test cluster before running them on production or other important clusters/databases. The scripts are for educational purposes only.

Solution

This document provides a step-by-step procedure for using named pipes with gpcrondump. Follow the steps in sequence.

  1. Generate file list

Before you start the backup using named pipes, you need to create the named pipes. You can get the list of named pipes to be created by running gpcrondump with the "--list-backup-files" option.

When you specify both this option and the -K <timestamp> option, gpcrondump does not perform a backup. gpcrondump creates two text files that contain the names of the files that will be created when gpcrondump backs up Pivotal Greenplum (GPDB). The text files are created in the same location as the backup files.

The command to create those files is:

gpcrondump -x <database-name> -K <time-stamp> --list-backup-files

The above command records the location and name of each backup file in two separate list files, one for the named pipes and one for the regular files.

In this example we need the names of the named pipes. We used the timestamp 20141006100000, so the output file is "$MASTER_DATA_DIRECTORY/db_dumps/20141006/gp_dump_20141006100000_pipes".
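The scripts in this guide read the pipes file line by line and split each line at the colon into a host name and a path. The sketch below shows that parsing on a single sample line. The host:path layout is inferred from those scripts, and the sample line is assembled from the host and path shown in the example output of this document, so check it against your own generated file.

```shell
#!/bin/sh
# Hypothetical line in the host:path form that the scripts in this guide assume.
line='sdw10:/data1/primary/gpfai90/db_dumps/20141006/gp_dump_0_2_20141006100000.gz'

# Field 1 is the host that owns the pipe, field 2 is the full path of the pipe.
hostname=$(echo "$line" | cut -d':' -f1)
namedpipe=$(echo "$line" | cut -d':' -f2)

# Strip everything from "db_dump" onwards to get the segment data directory.
segment_info=$(echo "$namedpipe" | sed 's/db_dump.*//')

# Strip everything from "gp_dump" onwards to get the dump directory.
location=$(echo "$namedpipe" | sed 's/gp_dump.*//')

echo "host:     $hostname"
echo "pipe:     $namedpipe"
echo "segment:  $segment_info"
echo "dump dir: $location"
```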

Let's export the path of the named pipes file to a variable, so it is easier to use in the rest of the steps:

export NAMEDPIPESFILES=/data/master/gpfai9-1/db_dumps/20141006/gp_dump_20141006100000_pipes

  2. Create the directories referenced in the file list in step 1. The directory on the master is created by the previous command (gpcrondump), but everything else (such as the segment directories) must be created manually. The script below generates a file of ssh commands that create the directories on all segments. Run the generated file with "/bin/sh /tmp/make_directory.log".

# Write one "ssh <host> ... mkdir -p <dump directory>" command per entry in the pipes file
while IFS= read -r line
do
    hostname=$(echo "$line" | cut -d':' -f1)
    segment_info=$(echo "$line" | cut -d':' -f2 | sed 's/db_dump.*//')
    location=$(echo "$line" | cut -d':' -f2 | sed 's/gp_dump.*//')
    echo "ssh $hostname \"echo \\\"[---------------------- $hostname : $segment_info ----------------------]\\\" ; echo ; mkdir -p $location ; echo\""
done < "$NAMEDPIPESFILES" > /tmp/make_directory.log

To verify that the directories were created properly, generate the verification script below and run it with "/bin/sh /tmp/verify_directory.log":

# Write one "ssh <host> ... ls -l <dump directory>" command per entry in the pipes file
while IFS= read -r line
do
    hostname=$(echo "$line" | cut -d':' -f1)
    segment_info=$(echo "$line" | cut -d':' -f2 | sed 's/db_dump.*//')
    location=$(echo "$line" | cut -d':' -f2 | sed 's/gp_dump.*//')
    echo "ssh $hostname \"echo \\\"[---------------------- $hostname : $segment_info ----------------------]\\\" ; echo ; ls -l $location ; echo\""
done < "$NAMEDPIPESFILES" > /tmp/verify_directory.log
  3. Create the named pipes referenced in the file list in step 1 and make them writable. Run the generated file with "/bin/sh /tmp/create_namedpipes.log".

# Write one "ssh <host> ... mkfifo <pipe> ; chmod u+w <pipe>" command per entry in the pipes file
while IFS= read -r line
do
    hostname=$(echo "$line" | cut -d':' -f1)
    segment_info=$(echo "$line" | cut -d':' -f2 | sed 's/db_dump.*//')
    namedpipe=$(echo "$line" | cut -d':' -f2)
    echo "ssh $hostname \"echo \\\"[---------------------- $hostname : $segment_info ----------------------]\\\" ; echo ; echo $namedpipe ; mkfifo $namedpipe ; chmod u+w $namedpipe ; echo\""
done < "$NAMEDPIPESFILES" > /tmp/create_namedpipes.log
  4. Start consumer processes for the pipes, that is, processes that read from the named pipes and store the data somewhere else. The consumers block until a writer opens the named pipes. Here the destination files are named "*.data_destination", but you can choose any location or name. Run the generated file with "/bin/sh /tmp/create_consumerpipe.log".

# Write one "ssh <host> ... cat <pipe> > <pipe>.data_destination &" command per entry in the pipes file
while IFS= read -r line
do
    hostname=$(echo "$line" | cut -d':' -f1)
    segment_info=$(echo "$line" | cut -d':' -f2 | sed 's/db_dump.*//')
    consumerpipe=$(echo "$line" | cut -d':' -f2)
    echo "ssh $hostname \"echo \\\"[---------------------- $hostname : $segment_info ----------------------]\\\" ; echo ; cat $consumerpipe > $consumerpipe.data_destination &\""
done < "$NAMEDPIPESFILES" > /tmp/create_consumerpipe.log
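Since any location or name can be used for the destination, a consumer can also write outside the dump directory. The sketch below is one possible way to do that on a single host. The start_consumer helper and the archive directory are hypothetical names, not part of gpcrondump. The helper also refuses to start a consumer on a path that is not a named pipe.

```shell
#!/bin/sh
# start_consumer PIPE DEST_DIR
# Hypothetical helper: start a background reader for PIPE that stores the
# data in DEST_DIR/<pipe name>.data_destination. Paths that are not FIFOs
# are rejected with a non-zero return code.
start_consumer() {
    pipe=$1
    dest_dir=$2
    if [ ! -p "$pipe" ]; then
        echo "not a named pipe: $pipe" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    cat "$pipe" > "$dest_dir/$(basename "$pipe").data_destination" &
    consumer_pid=$!
}

# Local demonstration with throwaway paths.
workdir=$(mktemp -d)
mkfifo "$workdir/gp_dump_demo.gz"
start_consumer "$workdir/gp_dump_demo.gz" "$workdir/archive"

# This echo stands in for the dump process writing into the pipe.
echo "dump data" > "$workdir/gp_dump_demo.gz"
wait "$consumer_pid"
stored=$(cat "$workdir/archive/gp_dump_demo.gz.data_destination")

# A regular file must be rejected.
touch "$workdir/regular_file"
if start_consumer "$workdir/regular_file" "$workdir/archive"; then
    rejected=no
else
    rejected=yes
fi

echo "stored: $stored, regular file rejected: $rejected"
rm -rf "$workdir"
```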

To verify that the named pipes and consumer processes were created properly, generate the verification script below and run it with "/bin/sh /tmp/verify_consumerpipefile.log":

# Write one "ssh <host> ..." command per entry that lists the pipe files and the consumer process
while IFS= read -r line
do
    hostname=$(echo "$line" | cut -d':' -f1)
    segment_info=$(echo "$line" | cut -d':' -f2 | sed 's/db_dump.*//')
    consumerpipefile=$(echo "$line" | cut -d':' -f2)
    echo "ssh $hostname \"echo \\\"[---------------------- $hostname : $segment_info ----------------------]\\\" ; echo ; echo \\\"File info:\\\" ; echo ; ls -l $consumerpipefile* ; echo ; echo \\\"Process info:\\\" ; echo ; ps -ef | grep -v grep | grep $consumerpipefile ; echo\""
done < "$NAMEDPIPESFILES" > /tmp/verify_consumerpipefile.log
  5. Do the dump. gpcrondump writes into the pipes, and the consumers receive the data and process it. Use the same timestamp as in step 1, because the pipe names contain it.

gpcrondump -x <database-name> -K <time-stamp>
  6. Once step 5 is complete, check the result and optionally clean up.

To verify the size of the backup files (replace *.data_destination with the name you chose in step 4), generate the script below and run it with "/bin/sh /tmp/verify_backup.log":

# Write one "ssh <host> ... ls -l <pipe>.data_destination" command per entry in the pipes file
while IFS= read -r line
do
    hostname=$(echo "$line" | cut -d':' -f1)
    segment_info=$(echo "$line" | cut -d':' -f2 | sed 's/db_dump.*//')
    backupfile=$(echo "$line" | cut -d':' -f2)
    echo "ssh $hostname \"echo \\\"[---------------------- $hostname : $segment_info ----------------------]\\\" ; echo ; ls -l $backupfile.data_destination ; echo\""
done < "$NAMEDPIPESFILES" > /tmp/verify_backup.log
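The file size alone does not show whether the data is usable. Assuming the dump is compressed (the file names in this example end in .gz), a stricter check is to test each destination file as a gzip stream. The helper name check_backup_file and the sample files below are hypothetical, and the sketch runs on a single host.

```shell
#!/bin/sh
# check_backup_file FILE
# Hypothetical helper: succeed only if FILE is non-empty and is a valid gzip stream.
check_backup_file() {
    [ -s "$1" ] && gzip -t < "$1" 2>/dev/null
}

# Build three sample files: a valid gzip stream, an empty file, and plain text.
workdir=$(mktemp -d)
echo "dump data" | gzip -c > "$workdir/good.data_destination"
: > "$workdir/empty.data_destination"
echo "not gzip" > "$workdir/bad.data_destination"

# Collect one status line per file.
report=$(for f in good empty bad; do
    if check_backup_file "$workdir/$f.data_destination"; then
        echo "$f: OK"
    else
        echo "$f: FAILED"
    fi
done)

echo "$report"
rm -rf "$workdir"
```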

To clean up the named pipes and any consumer processes that are still running, generate the script below and run it with "/bin/sh /tmp/remove_consumerpipe.log":

# Write one "ssh <host> ..." command per entry that removes the pipe and kills its consumer process
while IFS= read -r line
do
    hostname=$(echo "$line" | cut -d':' -f1)
    segment_info=$(echo "$line" | cut -d':' -f2 | sed 's/db_dump.*//')
    removeconsumerpipe=$(echo "$line" | cut -d':' -f2)
    echo "ssh $hostname \"echo \\\"[---------------------- $hostname : $segment_info ----------------------]\\\" ; echo ; rm $removeconsumerpipe ; ps -ef | grep -v grep | grep $removeconsumerpipe | awk '{print \\\$2}' | xargs -n1 /bin/kill 2>/dev/null ; echo\""
done < "$NAMEDPIPESFILES" > /tmp/remove_consumerpipe.log

After the cleanup, verify that the named pipes and consumer processes are gone by rerunning the verification script from step 4.
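The cleanup has to kill processes as well as remove pipes because a consumer with no writer never exits on its own. The sketch below reproduces this on a single host with a throwaway pipe. Unlike the generated script, which looks the process up by pipe name, it kills the consumer through its saved PID. All names are hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)
pipe="$workdir/stuck_pipe"
mkfifo "$pipe"

# Consumer with no writer: it blocks waiting for someone to open the pipe.
cat "$pipe" > "$pipe.data_destination" &
consumer_pid=$!
sleep 1

# kill -0 sends no signal; it only checks that the process still exists.
if kill -0 "$consumer_pid" 2>/dev/null; then
    still_waiting=yes
else
    still_waiting=no
fi

# Cleanup: stop the consumer, reap it, then remove the pipe.
kill "$consumer_pid"
wait "$consumer_pid" 2>/dev/null || true
rm "$pipe"

if [ -e "$pipe" ]; then pipe_removed=no; else pipe_removed=yes; fi
if kill -0 "$consumer_pid" 2>/dev/null; then consumer_gone=no; else consumer_gone=yes; fi

echo "waiting before cleanup: $still_waiting"
echo "pipe removed: $pipe_removed, consumer gone: $consumer_gone"
rm -rf "$workdir"
```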

Example:

The example below runs the scripts generated in the steps above.

  • Generate file list
gpadmin:Fullrack@mdw $ gpcrondump -x test -K 20141006100000 --list-backup-files
20141008:03:58:46:022825 gpcrondump:mdw:gpadmin-[INFO]:-Starting gpcrondump with args: -x test -K 20141006100000 --list-backup-files
20141008:03:58:47:022825 gpcrondump:mdw:gpadmin-[INFO]:-Added the list of pipe names to the file: /data/master/gpfai9-1/db_dumps/20141006/gp_dump_20141006100000_pipes
20141008:03:58:47:022825 gpcrondump:mdw:gpadmin-[INFO]:-Added the list of file names to the file: /data/master/gpfai9-1/db_dumps/20141006/gp_dump_20141006100000_regular_files
20141008:03:58:47:022825 gpcrondump:mdw:gpadmin-[INFO]:-Successfully listed the names of backup files and pipes
  • Create dump directories

Creating the directories needed for the backup to work

gpadmin:Fullrack@mdw $ /bin/sh /tmp/make_directory.log
[---------------------- sdw10 : /data1/primary/gpfai90/ ----------------------]
[---------------------- sdw10 : /data1/primary/gpfai91/ ----------------------]
[......]

Verification of directory

gpadmin:Fullrack@mdw $ /bin/sh /tmp/verify_directory.log
[---------------------- sdw10 : /data1/primary/gpfai90/ ----------------------]

total 0
[......]
[---------------------- mdw : /data/master/gpfai9-1/ ----------------------]
total 8
-rw------- 1 gpadmin gpadmin 1082 Oct  9 09:03 gp_dump_20141006100000_pipes
-rw------- 1 gpadmin gpadmin  469 Oct  9 09:03 gp_dump_20141006100000_regular_files
  • Create named pipes with the specified names and make them writable
gpadmin:Fullrack@mdw $ /bin/sh /tmp/create_namedpipes.log
[---------------------- sdw10 : /data1/primary/gpfai90/ ----------------------]

/data1/primary/gpfai90/db_dumps/20141006/gp_dump_0_2_20141006100000.gz

[......]

[---------------------- mdw : /data/master/gpfai9-1/ ----------------------]

/data/master/gpfai9-1/db_dumps/20141006/gp_dump_1_1_20141006100000_post_data.gz
  • Start consumer processes

The consumer processes read data from the named pipes and redirect it to a different file in the same directory (*.data_destination).

gpadmin:Fullrack@mdw $ /bin/sh /tmp/create_consumerpipe.log
[---------------------- sdw10 : /data1/primary/gpfai90/ ----------------------]
[---------------------- sdw10 : /data1/primary/gpfai91/ ----------------------]
[......]

Verification of the named pipes and consumer processes

gpadmin:Fullrack@mdw $ /bin/sh /tmp/verify_consumerpipefile.log
[---------------------- sdw10 : /data1/primary/gpfai90/ ----------------------]

File info:

prw------- 1 gpadmin gpadmin 0 Oct  9 09:06 /data1/primary/gpfai90/db_dumps/20141006/gp_dump_0_2_20141006100000.gz
-rw------- 1 gpadmin gpadmin 0 Oct  9 09:07 /data1/primary/gpfai90/db_dumps/20141006/gp_dump_0_2_20141006100000.gz.data_destination

Process info:

gpadmin  12128     1  0 09:07 ?        00:00:00 cat /data1/primary/gpfai90/db_dumps/20141006/gp_dump_0_2_20141006100000.gz

[......]
  • Do the backup 
gpadmin:Fullrack@mdw $ gpcrondump -x test -K 20141006100000
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Starting gpcrondump with args: -x test -K 20141006100000
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:----------------------------------------------------
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Master Greenplum Instance dump parameters
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:----------------------------------------------------
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Dump type                            = Full database
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Database to be dumped                = test
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Master port                          = 4300
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Master data directory                = /data/master/gpfai9-1
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Run post dump program                = Off
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Rollback dumps                       = Off
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Dump file compression                = On
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Clear old dump files                 = Off
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Update history table                 = Off
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Secure config files                  = Off
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Dump global objects                  = Off
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Vacuum mode type                     = Off
20141008:08:02:16:002706 gpcrondump:mdw:gpadmin-[INFO]:-Ensuring remaining free disk         > 10

Continue with Greenplum dump Yy|Nn (default=N):
> y
20141008:08:02:20:002706 gpcrondump:mdw:gpadmin-[INFO]:-Directory /data/master/gpfai9-1/db_dumps/20141006 exists
20141008:08:02:20:002706 gpcrondump:mdw:gpadmin-[INFO]:-Checked /data/master/gpfai9-1 on master
20141008:08:02:22:002706 gpcrondump:mdw:gpadmin-[INFO]:-Configuring for single database dump
20141008:08:02:22:002706 gpcrondump:mdw:gpadmin-[INFO]:-Validating disk space
20141008:08:02:23:002706 gpcrondump:mdw:gpadmin-[INFO]:-Adding compression parameter
20141008:08:02:23:002706 gpcrondump:mdw:gpadmin-[INFO]:-Adding --no-expand-children
20141008:08:02:23:002706 gpcrondump:mdw:gpadmin-[INFO]:-Dump process command line gp_dump -p 4300 -U gpadmin --gp-d=db_dumps/20141006 --gp-r=/data/master/gpfai9-1/db_dumps/20141006 --gp-s=p --gp-k=20141006100000 --no-lock --gp-c --no-expand-children test
20141008:08:02:23:002706 gpcrondump:mdw:gpadmin-[INFO]:-Starting Dump process
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Dump process returned exit code 0
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Timestamp key = 20141006100000
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Checked master status file and master dump file.
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Dump status report
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:----------------------------------------------------
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Target database                          = test
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Dump subdirectory                        = 20141006
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Dump type                                = Full database
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Clear old dump directories               = Off
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Dump start time                          = 10:00:00
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Dump end time                            = 08:02:26
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Status                                   = COMPLETED
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Dump key                                 = 20141006100000
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Dump file compression                    = On
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Vacuum mode type                         = Off
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-Exit code zero, no warnings generated
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:----------------------------------------------------
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[WARNING]:-Found neither /usr/local/GP-4.3.1.0/bin/mail_contacts nor /home/gpadmin/mail_contacts
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[WARNING]:-Unable to send dump email notification
20141008:08:02:26:002706 gpcrondump:mdw:gpadmin-[INFO]:-To enable email notification, create /usr/local/GP-4.3.1.0/bin/mail_contacts or /home/gpadmin/mail_contacts containing required email addresses
  • Result
gpadmin:Fullrack@mdw $ /bin/sh /tmp/verify_backup.log
[---------------------- sdw10 : /data1/primary/gpfai90/ ----------------------]

-rw------- 1 gpadmin gpadmin 9578 Oct  9 09:07 /data1/primary/gpfai90/db_dumps/20141006/gp_dump_0_2_20141006100000.gz.data_destination

[---------------------- sdw10 : /data1/primary/gpfai91/ ----------------------]

-rw------- 1 gpadmin gpadmin 4551 Oct  9 09:07 /data1/primary/gpfai91/db_dumps/20141006/gp_dump_0_3_20141006100000.gz.data_destination

[......]

Optional cleanup

- If the backup fails and no data is sent into the named pipes, the consumer processes will not exit.
- Named pipes are not deleted automatically; they stay in place until they are removed. To remove the processes and pipes, run the file below.

gpadmin:Fullrack@mdw $ /bin/sh /tmp/remove_consumerpipe.log
[---------------------- sdw10 : /data1/primary/gpfai90/ ----------------------]

[......]
