Pivotal Knowledge Base

Follow

gp_stats_missing - Explained

Environment

Pivotal Greenplum Database (GPDB) all versions

Introduction

This document explains how the view gp_stats_missing works with an example.

Description

gp_stats_missing ( located under gp_toolkit schema ) is a tool built by Pivotal Greenplum(GPDB) , By definition in the admin guide , its a tool to check the table that do not have statistics and therefore may require an ANALYZE be run on the table. It sounds right but actually, that is not the case. Lets take a example and explain the working of the view.

Case 1

  • The current value in the gp_stats_missing:
    gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
    (0 rows)
  • Create a table
    gpdb=# create table test ( a int );
    NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
    HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
    CREATE TABLE
    Time: 11.534 ms
  • Now the gp_stats_missing shows the following:
    gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
     public    | test     | f       |       1 |       0
    (1 row)

Which explains the document explanation for gp_stats_missing, a view that explains/tells which table doesn't have stats.

  • Now, let's analyze the table
    gpdb=# analyze test;
    ANALYZE
    Time: 37.658 ms

the analyze did not clear up the entry

   gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
     public    | test     | f       |       1 |       0
    (1 row)

The reason for the above is in the gp_toolkit.gp_stats_missing view definition. If you check the definition of the view, it tells you its basically checking if the table ( that is created above ) has value zero for either relpages/reltuples in the pg_class table , that is the reason you will find the smisize column under the gp_stats_missing view with value false "f" stating either the relpages or reltuples has zero value.

    SELECT aut.autnspname AS smischema, aut.autrelname AS smitable,
        CASE
            WHEN aut.autrelpages = 0 OR aut.autreltuples = 0::double precision THEN false
            ELSE true
        END AS smisize, attrs.attcnt AS smicols, COALESCE(bar.stacnt, 0::bigint) AS smirecs
    FROM gp_toolkit.__gp_user_tables aut
    JOIN ( SELECT pg_attribute.attrelid, count(*) AS attcnt
           FROM pg_attribute
          WHERE pg_attribute.attnum > 0
          GROUP BY pg_attribute.attrelid) attrs ON aut.autoid = attrs.attrelid
    LEFT JOIN ( SELECT pg_statistic.starelid, count(*) AS stacnt
      FROM pg_statistic
     GROUP BY pg_statistic.starelid) bar ON aut.autoid = bar.starelid
    WHERE (aut.autrelkind = 'r'::"char" AND (aut.autrelpages = 0 OR aut.autreltuples = 0::double precision)) OR (bar.stacnt IS NOT NULL AND attrs.attcnt > bar.stacnt);

so when checking the pg_class entry, you will find that the tables have value zero for reltuples/relpages

    gpdb=# select oid,relname,reltuples,relpages from pg_class where relname='test';
    oid   | relname | reltuples | relpages
    --------+---------+-----------+----------
    309901 | test    |         0 |        0
    (1 row)
 
Time: 1.158 ms
  • So moving further in the above example. So, let's insert some data into the table
    gpdb=# insert into test values ( generate_series ( 1, 10000));
    INSERT 0 10000
    Time: 22.305 ms
    The table ‘pg_class’ now shows an update to relpages even though we have not analyzed: 
    gpdb=# select oid,relname,reltuples,relpages from pg_class where relname='test';
      oid   | relname | reltuples | relpages
    --------+---------+-----------+----------
     309901 | test    |         0 |        1
    (1 row)
     
    Time: 1.158 ms
  • This is due to gp_autostats_mode. This analyzes the zero records table for the first time when the first data is inserted.
    gpdb=# show  gp_autostats_mode;
     gp_autostats_mode
    -------------------
     ON_NO_STATS
    (1 row)
     
    Time: 0.260 ms
    gpdb=#
  • But it did not remove the data from gp_toolkit.gp_stats_missing
    gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
     public    | test     | f       |       1 |       0
    (1 row)
     
    Time: 15.469 ms

Again the reason for it lies in the definition of the gp_stats_missing view "aut.autrelpages = 0 OR aut.autreltuples = 0::double precision " which states both the relpages and reltuples should have an entry. So if we now run a analyze the data insert

    gpdb=# analyze test;
    ANALYZE
    Time: 101.794 ms
  • Cross verifying in the pg_class
    gpdb=# select oid,relname,reltuples,relpages from pg_class where relname='test';
      oid   | relname | reltuples | relpages
    --------+---------+-----------+----------
     309901 | test    |     10000 |       16
    (1 row)

It is working well now.

    gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
    (0 rows)
     
    Time: 20.075 ms
    gpdb=#

Case 2

Another case where the gp_stat_missing shows values for the table is when you alter the table ( dropping / adding etc ) columns. For example:

  • Add a new column
alter table test ADD column (b int);
  • gp_stats_missing shows
    gpdb=# select * from gp_toolkit.gp_stats_missing ;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
     public    | test     | t       |       2 |       1
    (1 row)
     
    Time: 14.265 ms
    Where :  
    smisize = "t" meaning relpages or reltuples are not zero , "f" means either relpages or reltuples are zero
    smicols = there are two columns in the table "test"
    smirecs = only one of the column in the table "test" has statistics information in the pg_statistic table.
  • Analyze the table
    gpdb=# analyze test;
    ANALYZE
    Time: 35.347 ms
    gpdb=#
  • gp_stat_missing now shows.
    gpdb=# select * from gp_toolkit.gp_stats_missing ;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
    (0 rows)
     
    Time: 15.989 ms

Additional Information

  • gp_stats_missing tells you which table needs statistics/analyze if the table with data have not been analyzed anytime in its lifecycle / or when the table has altered to add or drop the column.
  • gp_stats_missing doesn't remove the entry from its view when the table has no rows / or when tuples / pages has been reduced to zero after deleting of data .
  • gp_stats_missing doesn't tell you if the table needs statistics after updates/ delete/bulk loading of data , since it only checks for relpages / reltuples = 0.

Comments

Powered by Zendesk