Pivotal Knowledge Base

Follow

How to use copy bulk insert with postgres JDBC

Environment

  • Pivotal HDB
  • Pivotal Greenplum

Purpose

Normally when inserting data into Pivotal HDB/Pivotal Greenplum via the JDBC the driver will breakup the data into single insert statements resulting slower performance. Here is a sample of log file showing JDBC executing a single insert statement

2014-11-19 06:03:25.200902 CST,"gpadmin","gpadmin",p641638,th-1497897120,"127.0.0.1","63057",2014-11-19 06:02:44 CST,166050,con3067,cmd1006,seg-1,,,x166050,sx1,"LOG","00000","Query plan size to dispatch: 6KB",,,,,,"insert into jdbc_load values('lksdfd', 'slkdfjds', 'klsdjfwo', 'sodifjds', 'fdlksjfew')",0,,"cdbdisp.c",4010,

You might not notice running single insert statments for 100 rows but when you reach thousands or millions then this will become painfully slow in a MPP environment. To speed up performance one can use JDBC class copymanager to execute a SQL copy instead. Instead of seeing thouasands of interst statements in the log you will only see one entry

2014-11-20 13:46:33.970231 PST,"gpadmin","gpadmin",p58809,th824858464,"127.0.0.1","40103",2014-11-20 13:46:32 PST,1015,con9,cmd1,seg-1,,,x1015,sx1,"ERROR","22P04","missing data for column ""b""",,,,,"COPY jdbc_load, line 1: ""lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew""","COPY jdbc_load from STDIN",0,,"copy.c",5187,

The COPY statement will yield faster table loads when using JDBC with Pivotal GPDB or HAWQ

Get the code

https://github.com/randomtask1155/JDBCBulkInsert

Build the code

git clone https://github.com/randomtask1155/JDBCBulkInsert.git
cd JDBCBulkInsert
mvn package

new jar file will be found here target/JDBCBulkInsert-1.0-jar-with-dependencies.jar

Run a test

  • Create a table in database
    gpadmin=# create table jdbc_load (a text, b text, c text, d text, e text );
    
  • Create a TAB delimited file /tmp/test.txt
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    
  • Load in /tmp/test.txt into table jdbc_load
    [gpadmin@hdm1 ~]$ /usr/java/default/bin/java -classpath JDBCBulkInsert-1.0-jar-with-dependencies.jar io.pivotal.support.JDBCBulkInsert "jdbc:postgresql://localhost:5432/gpadmin" gpadmin gpadmin /tmp/test.txt jdbc_load;
    ~# Loading Driver org.postgresql.Driver
    ~# connecting to database with url jdbc:postgresql://localhost:5432/gpadmin
    ~# reading in file /tmp/test.txt
    ~# COPY operation completed successfully
    
  • Verify it worked!
    gpadmin=# select * from jdbc_load ;
       a    |    b     |    c     |    d     |     e
    --------+----------+----------+----------+-----------
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
    (15 rows)
    

Comments

Powered by Zendesk