Is to provide some easy method to move table within clusters or database without the hassle of taking backup and restore , it very useful if any developers wants a replica of the original tables for their development and testing on different/another database.
Note : Please verify the script on a test cluster , before running it on production .
The script that you would need to replicate the table quickly to another database is
pg_dump -t <schema-name.source-table-name> -h <source-host-name> <source-database-name> | psql -h <destination-host-name> -U <destination-user> -p <destination-port> <destination-database>
NOTE: If you wish you replicate the entire database , would recommend to have a look at the link here
Let take a simple example to understand the above example.
- Firstly Creating database / table / loading data into it / generating source database.
-- Source database
gpadmin=# create database d1;
Time: 8855.715 ms
gpadmin=# \c d1
You are now connected to database "d1" as user "gpadmin".
-- Creating the table to replicate and loading data to it.
d1=# create table t1 as select * from pg_attribute;
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'attrelid' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
Time: 1705.794 ms
-- Creating the destination database.
d1=# create database d2;
Time: 10753.638 ms
- Running the command to backup / load data simultaneously
As you can see the command took the backup and loaded them simultaneously
gpadmin:Fullrack@mdw $ pg_dump -C -t t1 d1 | psql d2
Timing is on.
Pager usage is off.
Time: 184.239 ms
Time: 0.740 ms
Time: 0.677 ms
Time: 0.652 ms
Time: 0.598 ms
Time: 0.635 ms
Time: 1.565 ms
Time: 0.640 ms
Time: 0.601 ms
Time: 66.210 ms
Time: 18.538 ms
Time: 1247.149 ms
PLEASE NOTE: I'm loading this to different database on the same cluster so many parameter mentioned above was removed as not needed here, if you wish to replicate them on a different cluster ensure you have the pg_hba.conf updated in the source location with the correct information so that the psql can connect to the source host and database without any hassle.
- Cross verification
Cross verification shows everything is good and the table is loaded on the destination database.
gpadmin:Fullrack@mdw $ psql d2
Timing is on.
Pager usage is off.
Type "help" for help.
List of relations
Schema | Name | Type | Owner | Storage
public | t1 | table | gpadmin | heap
d2=# select count(*) from t1;
Time: 82.246 ms
- If the source table has many grant permission then the pg_dump also will run them on the destination destination , to eliminate the grant ( permission part) you can use the command such as below.
pg_dump -t <schema-name.source-table-name> -h <source-host-name> <source-database-name> | egrep -v "GRANT|REVOKE" | psql -h <destination-host-name> -U <destination-user> -p <destination-port> <destination-database>
- To load just the data use pg_dump with -a option.
pg_dump -a -t <schema-name.source-table-name> -h <source-host-name> <source-database-name> | psql -h <destination-host-name> -U <destination-user> -p <destination-port> <destination-database>
- For more option on what you can use along with the above mentioned command you can refer to the help from pg_dump
If the data set is very huge, check gptransfer utility, here is an example on how to use it.
You can also use psql to do the same task , make sure the DDL of the table is available on the destination.
psql -c "copy (select * from <source-table>) to stdout" <source-database> | psql -c "copy <destination-table> from stdin" <destination database>