In this article we will take a look at some frequently asked questions about Greenplum backup and restore.
1. Describe in brief the architecture of a Greenplum backup.
When you execute a full database backup, this is what happens behind the scenes.
- The syntax of the command is checked.
- The existence of the backup location is checked; if it does not exist, it is created.
- The available space at the backup location is checked and compared with the estimated size of the database backup.
- Once these checks pass, the backup is started.
- First, an EXCLUSIVE lock is acquired on pg_class so that no DDL changes can occur while the DDL is dumped.
- Once the DDL backup completes, locks are acquired on all user tables in ACCESS SHARE mode.
- The lock on pg_class is released.
- After that, backup agents are dispatched to all segments that are part of the Greenplum cluster.
- The backup agents then dump each segment's data in parallel.
- Once complete, the agents exit and the locks on the tables are released.
- The backup is now completed.
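The pre-flight checks in the steps above (validating the backup location and its free space) can be sketched as a small shell function. This is a minimal sketch for illustration only: the function name, arguments, and messages are hypothetical and are not gpcrondump internals.

```shell
#!/bin/sh
# Illustrative sketch of the pre-flight checks described above; the
# function name and arguments are hypothetical, not gpcrondump internals.
preflight_checks() {
    backup_dir="$1"      # location where the backup will be written
    est_size_kb="$2"     # assumed estimated dump size, in KB

    # Check the backup location exists; create it if it does not.
    mkdir -p "$backup_dir" || return 1

    # Compare available space at the location with the estimated size.
    avail_kb=$(df -Pk "$backup_dir" | awk 'NR==2 {print $4}')
    if [ "$avail_kb" -lt "$est_size_kb" ]; then
        echo "insufficient space: ${avail_kb}KB available, ${est_size_kb}KB needed"
        return 1
    fi
    echo "pre-flight checks passed for $backup_dir"
}
```

Only after checks like these succeed does the utility proceed to acquire locks and dispatch the segment agents.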
2. Describe in brief the architecture of an incremental backup.
When you execute an incremental database backup, this is what happens behind the scenes.
- Every time a backup is run on GPDB, the gpcrondump utility creates state files for AO/CO tables.
- These state files are created in order to store the tuple/row counts for each AO/CO table/partition in the database.
- These state files also store the metadata operations (truncate, alter, etc.) performed on any of the tables in the database.
- No state files are created for heap tables since they get backed up with every incremental backup.
- When the user performs an incremental backup, the gpcrondump utility compares the state of each table in the database against the last backup using the state files.
- If there is any change in the state of a table since the last backup, it is marked as dirty and is backed up during the incremental backup.
- This procedure is followed only for AO/CO tables.
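The dirty-table detection described above can be sketched as follows. The real gpcrondump state files are internal, so the simple "table_name row_count" format used here is an assumption made for demonstration; only the comparison logic mirrors the description.

```shell
#!/bin/sh
# Illustrative sketch of the dirty-table detection described above.
# The "table_name row_count" state-file format is an assumption for
# this demo, not the actual gpcrondump state-file format.
find_dirty_tables() {
    prev_state="$1"   # state file from the last backup (sorted by table)
    curr_state="$2"   # state file for the current backup (sorted by table)

    # Join the two files on table name and print any AO/CO table whose
    # recorded row count changed: those tables are marked dirty.
    join "$prev_state" "$curr_state" | awk '$2 != $3 {print $1}'
}
```

For example, if public.orders grew from 100 to 120 rows between backups while public.sales was unchanged, only public.orders would be reported dirty and re-dumped.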
3. What are the different backup & restore utilities available?
The different tools to perform backup / restore are:
- gpcrondump / gpdbrestore
- gp_dump / gp_restore
- pg_dump / pg_restore
- COPY .. TO .. / COPY .. FROM ..
Please refer to the documentation for more information on the utilities.
4. How to generate DDL for a table?
Use the pg_dump utility to generate DDL:
pg_dump -t njonna.accounts -s -f ddl_accounts.sql
Where:
- -f ddl_accounts.sql is the output file
- -t njonna.accounts is the table name (in schema njonna)
- -s dumps only the schema, no data
5. What are the tools available in GPDB to take backups and perform restores?
For non-parallel backups: Use the postgres utilities (pg_dump or pg_dumpall for backup, and pg_restore for restore).
Another useful command for getting data out of the database is:
COPY <TABLE> TO <file>;
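A concrete COPY pair might look like the following; the table name and file path are illustrative. Note that with this form the file is written on the server side, so the path must be accessible to the database process.

```sql
-- Unload a table to a server-side file, then reload it.
-- Table name and path are illustrative.
COPY sales TO '/home/gpadmin/sales.csv' CSV HEADER;
COPY sales FROM '/home/gpadmin/sales.csv' CSV HEADER;
```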
For parallel backups: gp_dump and gpcrondump for backups, and gp_restore for restores.
6. What is gpcrondump?
A wrapper utility for gp_dump, which can be called directly or from a crontab entry.
gpcrondump -x <database_name>
7. How do I back up just the schema for all of my databases?
The pg_dumpall utility extracts all databases in a GPDB system to a single script file. It will generate the DDL for all objects, including global objects such as resource queues and roles.
$ pg_dumpall --gp-syntax --schema-only > db_dump.sql
8. How do I back up just the schema for one of my databases?
The gp_dump utility dumps the contents of a database into SQL script files, which can then be used to restore the database schema and user data at a later time using gp_restore. The following command will create a schema dump of the template1 database:
$ gp_dump -s template1
9. How do I back up just the data in my database?
The following command will create a data dump of the test database:
$ gp_dump -a test
10. How do I clone a database?
The following command will create a full copy of your database into a new database (the source database must have no active connections while it is being copied):
# CREATE DATABASE new_dbname TEMPLATE old_dbname;
11. How do I schedule the backup of my entire database?
The gpcrondump utility is a wrapper utility for gp_dump, which can be called directly or from a crontab entry.
The following example is a Linux crontab entry that runs a backup of the test database (and global objects) nightly at one past midnight:
01 0 * * * gpadmin source /home/gpadmin/.bashrc; gpcrondump -x test -c -g -G -a -q >> gp_test.log
To back up multiple databases, you will need multiple gpcrondump entries in crontab. You will need to source your .bashrc, .profile, or greenplum_path.sh file to ensure you have the correct environment variables set.
12. What's the difference between gp_dump and pg_dump?
The pg_dump and pg_dumpall utilities are used primarily for moving data from a Greenplum database to another database platform. They do not manage segment information; instead, they create a single non-parallel dump file. An archive created by pg_dump cannot be used with the gp_restore utility.
The utilities gp_dump and gpcrondump do not back up global objects such as resource queues and roles. You will need to use pg_dumpall to back up global objects.
Refer to the GPDB Administrator Guide associated with the version you are using for more options regarding these utilities.
13. What are the alternative methods to replicate data other than using the utilities mentioned above?
Some alternative ways to transfer data (if the data set is very small) are discussed below:
- Transferring data via psql
- Transferring data via gptransfer
- Transferring data without backup
- Transferring data via gpfdist
For more detailed information please refer to the GPDB Administrator's Guide.