When running a query with two tables, it fails to run and gives the error message below.
gpadmin=# select count(*) from (select * from t1_dis_ext t1 full join t2_dis_ext t2 on (t1.a = t2.num)) t;
ERROR: All external tables in one query must have the same bucket number! (cdbdatalocality.c:793)
This is an expected symptom that occurs when two tables inside a single query have different bucket numbers. Bucket number for a table is set when the table is created using the GUC of default_hash_table_bucket_number. By default, the GUC is determined as (#segment nodes * 6) and the user can override the value.
There can be a situation where two tables in a query can be created at different times. One with N segments, and the other with different number of segments due to cluster expansion or data migration. If this happens, two tables can have different bucket numbers and thus can cause this error at a later time.
Recreate one of both of the external tables using required/expected bucket number by changing the GUC of default_hash_table_bucket_number.
You can use the following query to identify current the bucket number of an external table:
gpadmin=# select oid, bucketnum from gp_distribution_policy,pg_class where relname='t1_dis_ext' and localoid=pg_class.oid;oid | bucketnum
30818 | 12