Pivotal Knowledge Base

Follow

How to Import Binary Data into Pivotal Greenplum(GPDB) Using COPY

Goal

In this document we will take an example of how to import binary data into Pivotal Greenplum(GPDB) using COPY and also how to export data back to the file system.

Solution

-- Import data to GPDB from filesystem.

Prepare the binary file (we are using the "postgres" binary file):

gpadmin:Fullrack@mdw $ ls -l /tmp/postgres
-rwx------ 1 gpadmin gpadmin 37164697 May 16 18:01 /tmp/postgres
gpadmin:Fullrack@mdw $

Encode the binary file in to text:

gpadmin:Fullrack@mdw $ xxd -p /tmp/postgres | tr -d '\n' > /tmp/postgres.hex
gpadmin:Fullrack@mdw $
gpadmin:Fullrack@mdw $ ls -l /tmp/postgres /tmp/postgres.hex
-rwx------ 1 gpadmin gpadmin 37164697 May 16 18:01 /tmp/postgres
-rw------- 1 gpadmin gpadmin 74329394 May 16 18:15 /tmp/postgres.hex
gpadmin:Fullrack@mdw $

Create two tables (staging and target table):

test=# create table p1(a text); -- staging table
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
test=#
test=# create table p2(b text); -- target table
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'b' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE

Copy the encoded file into the staging table. If the data does not have to be in binary representation within the database, then this table can be used as a storage table too. If the data in the database has to be in binary representation, continue with the next step.

test=# copy p1 from '/tmp/postgres.hex';
COPY 1

Insert the binary data into the target table:

test=# insert into p2 select pg_catalog.decode(a, 'hex') from p1;
INSERT 0 1

-- Export data from GPDB to filesystem.

Prepare the staging table:

test=# truncate table p1;
TRUNCATE TABLE

Insert encoded data into the staging table ("p1") from the table that contains the binary data ("p2"):

test=# insert into p1 select encode(b, 'hex') from p2;
INSERT 0 1

Copy out to a file on the filesystem:

test=# copy p1 to '/tmp/p1';
COPY 1

gpadmin:Fullrack@mdw $ ls -l /tmp/p1
-rw-r--r-- 1 gpadmin gpadmin 74329395 May 16 18:13 p1

Convert the file back to binary:

gpadmin:Fullrack@mdw $ xxd -r -p /tmp/p1 > /tmp/p2

Finally, check the original file with the exported file

gpadmin:Fullrack@mdw $ md5sum /tmp/postgres /tmp/p2
a5b03a61e2bf81c6d44af11e01efc6dc /tmp/postgres
a5b03a61e2bf81c6d44af11e01efc6dc /tmp/p2
gpadmin:Fullrack@mdw $

Comments

Powered by Zendesk