Pivotal Knowledge Base

Follow

gp_stats_missing - Explained

Goal

The below documents explains on how the view gp_stats_missing works with example.

Solution

gp_stats_missing ( located under gp_toolkit schema ) is a tool built by Pivotal Greenplum(GPDB) , By definition in the admin guide , its a tool to check the table that do not have statistics and therefore may require an ANALYZE be run on the table. It sounds simple right , but actually that is not the case. Lets take a example and explain the working of the view.

-- Case 1

  • The current value in the gp_stats_missing:
    gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
    (0 rows)
  • Create a table
    gpdb=# create table test ( a int );
    NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
    HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
    CREATE TABLE
    Time: 11.534 ms
  • Now the gp_stats_missing shows the following:
    gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
     public    | test     | f       |       1 |       0
    (1 row)

Which explains the document explanation for gp_stats_missing, a view that explains/tells which table doesn't have stats.

  • So now lets analyze the table
    gpdb=# analyze test;
    ANALYZE
    Time: 37.658 ms

the analyze did not clear up the entry

   gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
     public    | test     | f       |       1 |       0
    (1 row)

The reason for the above is in the gp_toolkit.gp_stats_missing view definition. If you check the definition of the view , it tells you its basically checking if the table ( that is created above ) has value zero for either relpages / reltuples in the pg_class table , that is the reason you will find the smisize column under the gp_stats_missing view with value false "f" stating either the relpages or reltuples has zero value.

    SELECT aut.autnspname AS smischema, aut.autrelname AS smitable,
        CASE
            WHEN aut.autrelpages = 0 OR aut.autreltuples = 0::double precision THEN false
            ELSE true
        END AS smisize, attrs.attcnt AS smicols, COALESCE(bar.stacnt, 0::bigint) AS smirecs
    FROM gp_toolkit.__gp_user_tables aut
    JOIN ( SELECT pg_attribute.attrelid, count(*) AS attcnt
           FROM pg_attribute
          WHERE pg_attribute.attnum > 0
          GROUP BY pg_attribute.attrelid) attrs ON aut.autoid = attrs.attrelid
    LEFT JOIN ( SELECT pg_statistic.starelid, count(*) AS stacnt
      FROM pg_statistic
     GROUP BY pg_statistic.starelid) bar ON aut.autoid = bar.starelid
    WHERE (aut.autrelkind = 'r'::"char" AND (aut.autrelpages = 0 OR aut.autreltuples = 0::double precision)) OR (bar.stacnt IS NOT NULL AND attrs.attcnt > bar.stacnt);

so when checking the pg_class entry , you will find that the tables has value zero for reltuples / relpages

    gpdb=# select oid,relname,reltuples,relpages from pg_class where relname='test';
    oid   | relname | reltuples | relpages
    --------+---------+-----------+----------
    309901 | test    |         0 |        0
    (1 row)
 
Time: 1.158 ms
  • So moving further in the above example. So lets insert some data into the table
    gpdb=# insert into test values ( generate_series ( 1, 10000));
    INSERT 0 10000
    Time: 22.305 ms
    The table ‘pg_class’ now shows an update to relpages even though we have not analyzed: 
    gpdb=# select oid,relname,reltuples,relpages from pg_class where relname='test';
      oid   | relname | reltuples | relpages
    --------+---------+-----------+----------
     309901 | test    |         0 |        1
    (1 row)
     
    Time: 1.158 ms
  • This is due to gp_autostats_mode. This analyzes the zero records table for the first time when the first data is inserted.
    gpdb=# show  gp_autostats_mode;
     gp_autostats_mode
    -------------------
     ON_NO_STATS
    (1 row)
     
    Time: 0.260 ms
    gpdb=#
  • But it did not remove the data from gp_toolkit.gp_stats_missing
    gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
     public    | test     | f       |       1 |       0
    (1 row)
     
    Time: 15.469 ms

Again the reason for it lies in the definition of the gp_stats_missing view "aut.autrelpages = 0 OR aut.autreltuples = 0::double precision " which states both the relpages and reltuples should have a entry . So if we now run a analyze after the data insert

    gpdb=# analyze test;
    ANALYZE
    Time: 101.794 ms
  • Cross verifying in the pg_class
    gpdb=# select oid,relname,reltuples,relpages from pg_class where relname='test';
      oid   | relname | reltuples | relpages
    --------+---------+-----------+----------
     309901 | test    |     10000 |       16
    (1 row)

It is working well now.

    gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
    (0 rows)
     
    Time: 20.075 ms
    gpdb=#

-- Case 2

Another case where the gp_stat_missing shows values for the table , is when you alter the table ( dropping / adding etc ) columns . For example:

  • Add a new column
alter table test ADD column (b int);
  • gp_stats_missing shows
    gpdb=# select * from gp_toolkit.gp_stats_missing ;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
     public    | test     | t       |       2 |       1
    (1 row)
     
    Time: 14.265 ms
    Where :  
    smisize = "t" meaning relpages or reltuples are not zero , "f" means either relpages or reltuples are zero
    smicols = there are two columns in the table "test"
    smirecs = only one of the column in the table "test" has statistics information in the pg_statistic table.
  • Analyze the table
    gpdb=# analyze test;
    ANALYZE
    Time: 35.347 ms
    gpdb=#
  • gp_stat_missing now shows.
    gpdb=# select * from gp_toolkit.gp_stats_missing ;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
    (0 rows)
     
    Time: 15.989 ms

Conclusion

  • gp_stats_missing tells you which table needs statistics/analyze if table with data have not been analyzed anytime in its lifecycle / or when the table has altered to add / drop column.
  • gp_stats_missing doesn't remove the entry from its view when the table has no rows / or when tuples / pages has been reduced to zero after deleting of data .
  • gp_stats_missing doesn't tell you if the table needs statistics after updates / delete / bulk loading of data , since it only checks for relpages / reltuples = 0.

Comments

Powered by Zendesk