Pivotal Knowledge Base

Follow

How to Assign Default Values to a Column during gpload when the Input File has no data for the Column

Environment

  • Pivotal Greenplum 4.3.x
  • Operating System- Red Hat Enterprise Linux 6.x

Purpose

This article explains how to proceed in the following situation: when attempting to load data via gpload on a table that has default values for the columns, the columns do not get updated with the default values.

An example is shown here:

  • The table "test" has a column "c" which has the default value of "0" if no records are assigned to the column.
flightdata=# \d test 
     Table "public.test"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | text    | 
 b      | text    | 
 c      | integer | default 0
Distributed by: (a)
  • But when trying to load the data shown below (NOTE: The last row has NULL value in last column),
[gpadmin@gpdbsne tmp]$ cat gpload.data 
a|b|1
c|d|2
e|f|3
g|h|
  • via gpload,
[gpadmin@gpdbsne tmp]$ cat /tmp/gpload.paramters
VERSION: 1.0.0.1
DATABASE: flightdata
USER: gpadmin
HOST: gpdbsne
PORT: 3014
GPLOAD: 
   INPUT:
    - SOURCE:
         LOCAL_HOSTNAME:
           - gpdbsne
         PORT: 8081
         FILE:
           - /tmp/gpload.data
    - COLUMNS:
           - a: text
           - b: text
           - c: int
    - FORMAT: text
    - DELIMITER: '|'
    - NULL_AS: ''
   OUTPUT:
    - TABLE: public.test
    - MODE: insert

[gpadmin@gpdbsne tmp]$ gpload -f /tmp/gpload.paramters 
2016-03-09 12:05:05|INFO|gpload session started 2016-03-09 12:05:05
2016-03-09 12:05:05|INFO|started gpfdist -p 8081 -P 8082 -f "/tmp/gpload.data" -t 30
2016-03-09 12:05:05|INFO|running time: 0.14 seconds
2016-03-09 12:05:06|INFO|rows Inserted          = 4
2016-03-09 12:05:06|INFO|rows Updated           = 0
2016-03-09 12:05:06|INFO|data formatting errors = 0
2016-03-09 12:05:06|INFO|gpload succeeded
  • the column is not populated with the default value.
flightdata=# select * from test;
 a | b | c 
---+---+---
 a | b | 1
 c | d | 2
 e | f | 3
 g | h |  
(4 rows)

In this article, we will describe how to assign a default value to a column that does not have values on the input file.

Procedure

  • Truncate the table (to remove the old data).
flightdata=# truncate table test ;
TRUNCATE TABLE
flightdata=# \q
  • Modify the YAML file (gpload control file) and use the "MAPPING" keyword, to assign values to the column if found.

Here we assign the column 'c' if found NULL , then assign "0" as the default.

[gpadmin@gpdbsne tmp]$ cat /tmp/gpload.paramters

VERSION: 1.0.0.1
DATABASE: flightdata
USER: gpadmin
HOST: gpdbsne
PORT: 3014
GPLOAD: 
   INPUT:
    - SOURCE:
         LOCAL_HOSTNAME:
           - gpdbsne
         PORT: 8081
         FILE:
           - /tmp/gpload.data
    - COLUMNS:
           - a: text
           - b: text
           - c: int
    - FORMAT: text
    - DELIMITER: '|'
    - NULL_AS: ''
   OUTPUT:
    - MAPPING :
        a : a
        b : b
        c : 'case coalesce(c,0) when 0::numeric then 0::numeric else c end'
    - TABLE: public.test
    - MODE: insert
  • Now, execute the gpload command.
[gpadmin@gpdbsne tmp]$ gpload -f /tmp/gpload.paramters 
2016-03-09 12:07:43|INFO|gpload session started 2016-03-09 12:07:43
2016-03-09 12:07:43|INFO|started gpfdist -p 8081 -P 8082 -f "/tmp/gpload.data" -t 30
2016-03-09 12:07:43|INFO|running time: 0.20 seconds
2016-03-09 12:07:43|INFO|rows Inserted          = 4
2016-03-09 12:07:43|INFO|rows Updated           = 0
2016-03-09 12:07:43|INFO|data formatting errors = 0
2016-03-09 12:07:43|INFO|gpload succeeded
[gpadmin@gpdbsne tmp]$ psql
psql (8.2.15)
Type "help" for help.
  • Now the data loaded would be of the correct format as shown below: 
flightdata=# select * from test;
 a | b | c 
---+---+---
 a | b | 1
 c | d | 2
 e | f | 3
 g | h | 0
(4 rows)

 

Comments

Powered by Zendesk