Pivotal Greenplum Database (GPDB) all versions
The purpose of this article is to provide an easy method to move a table between clusters or databases without the hassle of taking a backup and restoring it. This is very useful when developers want a replica of the original tables for development and testing on another database.
Note- Please verify the script on a test cluster before running it on production.
The command you need to quickly replicate a table to another database is:
pg_dump -t <schema-name.source-table-name> -h <source-host-name> <source-database-name> | psql -h <destination-host-name> -U <destination-user> -p <destination-port> <destination-database>
Note- If you wish to replicate the entire database, we recommend having a look at the link here.
Let's take a simple example to understand the command above.
- First, create the source database and table, load data into the table, and create the destination database.
-- Source database
gpadmin=# create database d1;
Time: 8855.715 ms
gpadmin=# \c d1
You are now connected to database "d1" as user "gpadmin".
-- Creating the table to replicate and loading data to it.
d1=# create table t1 as select * from pg_attribute;
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'attrelid' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
Time: 1705.794 ms
-- Creating the destination database.
d1=# create database d2;
Time: 10753.638 ms
- Running the command to back up and load the data simultaneously
As you can see below, the command takes the backup and loads it into the destination in one step:
gpadmin:Fullrack@mdw $ pg_dump -C -t t1 d1 | psql d2
Timing is on.
Pager usage is off.
Time: 184.239 ms
Time: 0.740 ms
Time: 0.677 ms
Time: 0.652 ms
Time: 0.598 ms
Time: 0.635 ms
Time: 1.565 ms
Time: 0.640 ms
Time: 0.601 ms
Time: 66.210 ms
Time: 18.538 ms
Time: 1247.149 ms
Note- This example loads to a different database on the same cluster, so many of the parameters shown in the template above (host, user, port) were removed as they are not needed here. If you wish to replicate to a different cluster, ensure that pg_hba.conf is updated on the source with the correct entry so that pg_dump can connect to the source host and database without any hassle.
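For the cross-cluster case, the entry in the source cluster's pg_hba.conf would look something like the following (the database name, user, and destination address below are placeholders for illustration, not values from this example):

```
# pg_hba.conf on the source master host
# TYPE  DATABASE  USER     ADDRESS         METHOD
host    d1        gpadmin  192.0.2.10/32   md5
```

After editing the file, reload the configuration (for example, with gpstop -u on Greenplum) so the change takes effect without a restart.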
- Cross verification
Cross verification shows that everything is good and the table is loaded on the destination database.
gpadmin:Fullrack@mdw $ psql d2
Timing is on.
Pager usage is off.
Type "help" for help.
d2=# \d
        List of relations
 Schema | Name | Type  |  Owner  | Storage
--------+------+-------+---------+---------
 public | t1   | table | gpadmin | heap
(1 row)
d2=# select count(*) from t1;
Time: 82.246 ms
- If the source table has many GRANT permissions, pg_dump will also run them on the destination. To eliminate the GRANT/REVOKE statements (the permission part), you can use a command such as the one below.
pg_dump -t <schema-name.source-table-name> -h <source-host-name> <source-database-name> | egrep -v "GRANT|REVOKE" | psql -h <destination-host-name> -U <destination-user> -p <destination-port> <destination-database>
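To see what the egrep filter removes, here is a quick, self-contained illustration; the statements below are made-up samples of the kind pg_dump emits, not output from this example:

```shell
# Sample statements of the kind pg_dump might emit (illustrative only):
printf 'CREATE TABLE t1 (id int);\nGRANT SELECT ON t1 TO dev_user;\nREVOKE ALL ON t1 FROM PUBLIC;\nINSERT INTO t1 VALUES (1);\n' \
  | egrep -v "GRANT|REVOKE"
# Only the CREATE TABLE and INSERT statements pass through to psql;
# the GRANT and REVOKE lines are dropped.
```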
- To load just the data, use pg_dump with the -a (data-only) option.
pg_dump -a -t <schema-name.source-table-name> -h <source-host-name> <source-database-name> | psql -h <destination-host-name> -U <destination-user> -p <destination-port> <destination-database>
- For more options to use with the above command, refer to the pg_dump help.
If the data set is very large, check the gptransfer utility; here is an example of how to use it.
You can also use psql to do the same task; make sure the DDL of the table is already available on the destination.
psql -c "copy (select * from <source-table>) to stdout" <source-database> | psql -c "copy <destination-table> from stdin" <destination-database>