Pivotal Knowledge Base

Follow

How to enforce index usage in Pivotal Greenplum database

Environment

Product Version
Pivotal Greenplum (GPDB) All versions

 

Purpose

Indexes are created on the table, but Greenplum seems to skip them and does not scan the data via the Index.

In this article, we will demonstrate how to force the index scan on the table.

 

Procedure

The optimizer of Postgres or Greenplum is cost-based and selects the execution plan with the lowest cost. In Greenplum, the cost for one scattered IO access is measured by the random_page_cost parameter, which is 100 by default. As compared to 1 in Postgres, it is quite large; it is set in this way because we want to choose seq scan as frequently as possible in most cases.

Note: Applications built on Greenplum are most likely to be data warehouse applications. Keep in meep in mind that the cost-based optimizer is being used, so random_page_cost should not be changed to a smaller value in most cases.

  • Create a table.
gpadmin=# create table t as select generate_series x from generate_series(1,10000);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'x' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 10000
  • Create an index at the top of it.
gpadmin=# create index f_idx on t( lower(x));
CREATE INDEX

-- Explain of the query shows a sequential scan on the table

gpadmin=# explain select * from t where lower(x) = '1';                                                                                   
                                  QUERY PLAN                                 
------------------------------------------------------------------------------
 Gather Motion 8:1  (slice1; segments: 8)  (cost=0.00..191.00 rows=2 width=4)
   ->  Seq Scan on t  (cost=0.00..191.00 rows=2 width=4)
         Filter: lower(x::text) = '1'::text
(3 rows)
  • Altering the random_page_cost parameter, now shows the Index scan being used.
gpadmin=# set random_page_cost to 1;
SET
gpadmin=# explain select * from t where lower(x) = '1';
                                 QUERY PLAN                                
----------------------------------------------------------------------------
 Gather Motion 8:1  (slice1; segments: 8)  (cost=0.00..9.43 rows=2 width=4)
   ->  Index Scan using f_idx on t  (cost=0.00..9.43 rows=2 width=4)
         Index Cond: lower(x::text) = '1'::text
 Settings:  random_page_cost=1
(4 rows)


This procedure should achieve the required results. 

 

Comments

Powered by Zendesk