New tables are created with Random Distribution


Pivotal Greenplum All versions


When a new table is created without a distribution policy, Greenplum Database (GPDB) does not automatically pick the first column as the distribution key. As a result, every such table is randomly distributed.


This happens because the GUC (server configuration parameter) "gp_create_table_random_default_distribution" is set to ON on the cluster.


Set the parameter gp_create_table_random_default_distribution to OFF so that a table created without a DISTRIBUTED BY clause automatically picks a column as its distribution key.
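
The change can be sketched in SQL as follows. This is a minimal example, not the only way to apply the setting: it assumes the parameter can be changed at the session level (as the SET command in example 2 suggests) and that a per-database default is acceptable. The database name flightdata is taken from the examples in this article.

```sql
-- Check the current value in this session.
SHOW gp_create_table_random_default_distribution;

-- Turn the parameter off for the current session only.
SET gp_create_table_random_default_distribution = off;

-- Assumption: persist the setting for new sessions in one database
-- ("flightdata" is the database used in this article's examples).
ALTER DATABASE flightdata SET gp_create_table_random_default_distribution = off;
```

The session-level SET affects only tables created in that session. The ALTER DATABASE form applies to sessions opened after it runs, so reconnect before checking the value again.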

For example,

1. Creating a table with the parameter OFF

flightdata=# show gp_create_table_random_default_distribution;
(1 row)

flightdata=# create table p1 ( a int , b int );
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.

flightdata=# \d p1
      Table "public.p1"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | integer |
Distributed by: (a)

2. Creating a table with the parameter ON: any new table without a DISTRIBUTED BY clause is created with random distribution.

flightdata=# set gp_create_table_random_default_distribution=on;

flightdata=# create table p2 ( a int , b int );
NOTICE:  Using default RANDOM distribution since no distribution was specified.
HINT:  Consider including the 'DISTRIBUTED BY' clause to determine the distribution of rows.
CREATE TABLE

flightdata=# \d p2
      Table "public.p2"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | integer |
Distributed randomly
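
As the HINT in both examples suggests, the distribution key can also be stated explicitly, so that the result does not depend on the parameter at all. The sketch below is illustrative only: the table p3 is hypothetical, and column a is assumed to be a reasonable distribution key for the sample tables.

```sql
-- Hypothetical table p3: the explicit clause applies whether the
-- parameter is on or off.
CREATE TABLE p3 (a int, b int) DISTRIBUTED BY (a);

-- Redistribute the existing randomly distributed table p2 on column a.
-- This moves the table's rows between segments, so it can take time
-- on large tables.
ALTER TABLE p2 SET DISTRIBUTED BY (a);
```

After either statement, \d on the table should report "Distributed by: (a)" instead of "Distributed randomly".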


