Pivotal Knowledge Base

Follow

How to assign default values to a column during gpload when the input file has no data for the column

Environment

Product Version
Pivotal Greenplum (GPDB)  4.3.x
OS RHEL 6.x

Purpose

This article explains how to proceed in the following situation: when attempting to load data via gpload on a table that has default values for the columns, the columns do not get updated with the default values.

An example is shown here:

  • The table "test" has a column "c" which has the default value of "0" if no records are assigned to the column.
flightdata=# \d test 
     Table "public.test"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | text    | 
 b      | text    | 
 c      | integer | default 0
Distributed by: (a)
  • But when trying to load the data shown below (NOTE: The last row has NULL value in last column),
[gpadmin@gpdbsne tmp]$ cat gpload.data 
a|b|1
c|d|2
e|f|3
g|h|
  • via gpload,
[gpadmin@gpdbsne tmp]$ cat /tmp/gpload.paramters
VERSION: 1.0.0.1
DATABASE: flightdata
USER: gpadmin
HOST: gpdbsne
PORT: 3014
GPLOAD: 
   INPUT:
    - SOURCE:
         LOCAL_HOSTNAME:
           - gpdbsne
         PORT: 8081
         FILE:
           - /tmp/gpload.data
    - COLUMNS:
           - a: text
           - b: text
           - c: int
    - FORMAT: text
    - DELIMITER: '|'
    - NULL_AS: ''
   OUTPUT:
    - TABLE: public.test
    - MODE: insert

[gpadmin@gpdbsne tmp]$ gpload -f /tmp/gpload.paramters 
2016-03-09 12:05:05|INFO|gpload session started 2016-03-09 12:05:05
2016-03-09 12:05:05|INFO|started gpfdist -p 8081 -P 8082 -f "/tmp/gpload.data" -t 30
2016-03-09 12:05:05|INFO|running time: 0.14 seconds
2016-03-09 12:05:06|INFO|rows Inserted          = 4
2016-03-09 12:05:06|INFO|rows Updated           = 0
2016-03-09 12:05:06|INFO|data formatting errors = 0
2016-03-09 12:05:06|INFO|gpload succeeded
  • the column is not populated with the default value.
flightdata=# select * from test;
 a | b | c 
---+---+---
 a | b | 1
 c | d | 2
 e | f | 3
 g | h |  
(4 rows)

In this article, we will describe how to assign a default value to a column that does not have values on the input file.

Procedure

  • Truncate the table (to remove the old data).
flightdata=# truncate table test ;
TRUNCATE TABLE
flightdata=# \q
  • Modify the YAML file (gpload control file) and use the "MAPPING" keyword, to assign values to the column if found.

Here we assign the column 'c' if found NULL , then assign "0" as the default.

[gpadmin@gpdbsne tmp]$ cat /tmp/gpload.paramters

VERSION: 1.0.0.1
DATABASE: flightdata
USER: gpadmin
HOST: gpdbsne
PORT: 3014
GPLOAD: 
   INPUT:
    - SOURCE:
         LOCAL_HOSTNAME:
           - gpdbsne
         PORT: 8081
         FILE:
           - /tmp/gpload.data
    - COLUMNS:
           - a: text
           - b: text
           - c: int
    - FORMAT: text
    - DELIMITER: '|'
    - NULL_AS: ''
   OUTPUT:
    - MAPPING :
        a : a
        b : b
        c : 'case coalesce(c,0) when 0::numeric then 0::numeric else c end'
    - TABLE: public.test
    - MODE: insert
  • Now, execute the gpload command.
[gpadmin@gpdbsne tmp]$ gpload -f /tmp/gpload.paramters 
2016-03-09 12:07:43|INFO|gpload session started 2016-03-09 12:07:43
2016-03-09 12:07:43|INFO|started gpfdist -p 8081 -P 8082 -f "/tmp/gpload.data" -t 30
2016-03-09 12:07:43|INFO|running time: 0.20 seconds
2016-03-09 12:07:43|INFO|rows Inserted          = 4
2016-03-09 12:07:43|INFO|rows Updated           = 0
2016-03-09 12:07:43|INFO|data formatting errors = 0
2016-03-09 12:07:43|INFO|gpload succeeded
[gpadmin@gpdbsne tmp]$ psql
psql (8.2.15)
Type "help" for help.
  • Now the data loaded would be of the correct format as shown below: 
flightdata=# select * from test;
 a | b | c 
---+---+---
 a | b | 1
 c | d | 2
 e | f | 3
 g | h | 0
(4 rows)

Comments

Powered by Zendesk