Pivotal Knowledge Base

Identify tables that need "VACUUM ANALYZE"

Environment

Product: Pivotal Greenplum (GPDB)
Version: All versions

Purpose

There are three different reasons why a table may need to be VACUUMed and/or ANALYZEd:

  • Reclaiming space from dead rows to prevent bloat (VACUUM)
  • Lowering the transaction age of tables with a high age (VACUUM)
  • Collecting statistics so queries involving the table get an optimal execution plan (ANALYZE)

This article describes the above reasons in detail.

Resolution

Reclaiming space from dead rows
 
There are a couple of views in the "gp_toolkit" administrative schema that show the potential bloat in a table. These views compare the actual size of a table on disk with the 'expected' size calculated from statistics, so make sure your tables are analyzed before interpreting the results from these views.
  • gp_bloat_diag
  View "gp_toolkit.gp_bloat_diag"
   Column    |  Type   | Modifiers 
-------------+---------+-----------
 bdirelid    | oid     |              - OID
 bdinspname  | name    |              - Schema name
 bdirelname  | name    |              - Table name
 bdirelpages | integer |              - Number of table pages
 bdiexppages | numeric |              - Number of expected pages
 bdidiag     | text    |              - Diagnostic: "no bloat detected"/"moderate amount of bloat suspected"/"significant amount of bloat suspected"

Example: 
lpetrov=# select * from gp_toolkit.gp_bloat_diag;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages |                bdidiag                
----------+------------+------------+-------------+-------------+---------------------------------------
   353016 | public     | t1         |         978 |           1 | significant amount of bloat suspected
(1 row)
  • gp_bloat_expected_pages
lpetrov=# \d gp_toolkit.gp_bloat_expected_pages 
View "gp_toolkit.gp_bloat_expected_pages"
   Column    |  Type   | Modifiers 
-------------+---------+-----------
 btdrelid    | oid     |             - OID
 btdrelpages | integer |             - Number of table pages
 btdexppages | numeric |             - Number of expected pages

Example:

lpetrov=# select * from gp_toolkit.gp_bloat_expected_pages where btdrelid = 't1'::regclass;
 btdrelid | btdrelpages | btdexppages 
----------+-------------+-------------
   353016 |         978 |           1
(1 row)

gp_bloat_expected_pages shows data for all tables, while gp_bloat_diag shows only the tables with suspected bloat (moderate or significant).
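
As a sketch, the following query ranks all tables by the ratio of actual to expected pages (the column names come from the view definition above; the ratio is only meaningful if the tables have been analyzed recently), and a plain VACUUM then reclaims the space from a bloated table:

-- Rank tables by bloat ratio (actual pages / expected pages);
-- greatest() guards against division by zero
SELECT btdrelid::regclass AS table_name,
       btdrelpages,
       btdexppages,
       round(btdrelpages / greatest(btdexppages, 1), 1) AS bloat_ratio
FROM gp_toolkit.gp_bloat_expected_pages
ORDER BY 4 DESC;

-- Reclaim the space from a bloated table (t1 from the example above),
-- then re-collect statistics so the bloat views stay accurate
VACUUM t1;
ANALYZE t1;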

Lowering the transaction age for tables with high age

This is a two-step process:

  • Find databases/segments with high age
-- -1 represents the master; gp_dist_random() reads pg_database on every segment
SELECT 
    -1 AS gp_segment_id, datname, age(datfrozenxid) 
FROM pg_database
UNION ALL
SELECT 
    gp_segment_id, datname, age(datfrozenxid)
FROM gp_dist_random('pg_database')
ORDER BY 3 DESC;
  • Find tables within these databases and segments that need to be "vacuumed"
-- Exclude external ('x'), append-only ('a'), and column-oriented ('c') tables
SELECT 
    coalesce(n.nspname, '') AS nspname, 
    relname, 
    relkind, 
    relstorage, 
    age(relfrozenxid)
FROM 
    pg_class c 
    LEFT JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE 
    relkind = 'r' AND relstorage NOT IN ('x', 'a', 'c')
ORDER BY 5 DESC;

Note: You can add a WHERE condition on the age (or a LIMIT) to restrict the output to the tables with the highest ages.
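
Once the tables with the highest ages are found, a plain VACUUM advances their relfrozenxid and lowers the age. A minimal sketch, with a placeholder table name:

-- my_schema.my_table is a placeholder for a table found by the query above
VACUUM my_schema.my_table;

-- Verify that the age dropped
SELECT age(relfrozenxid)
FROM pg_class
WHERE oid = 'my_schema.my_table'::regclass;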

Collecting statistics so queries involving the table get an optimal execution plan

There is currently no direct way to identify tables with stale statistics, other than examining EXPLAIN ANALYZE output and comparing the estimated row counts against the rows actually scanned.
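
For example (a sketch against the hypothetical table t1; the exact plan format varies by version):

-- Run the query and compare planner estimates with reality
EXPLAIN ANALYZE SELECT * FROM t1;

-- In the output, compare the estimated row count against the actual one,
-- in lines such as:
--   Seq Scan on t1  (cost=0.00..431.00 rows=1 width=4)
--     (actual time=0.021..35.012 rows=100000 loops=1)
-- A large gap between "rows=1" (estimate) and "rows=100000" (actual)
-- suggests the table's statistics are stale.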

There are two ways to get an approximate answer:

  • gp_toolkit.gp_stats_missing - a view showing tables that have no statistics in the catalog
  • pg_stat_last_operation - shows when the last ANALYZE happened; if significant modifications were made to the table in the meantime, it probably needs to be analyzed again.
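
A minimal sketch of both checks (staactionname and statime are columns of pg_stat_last_operation, and objid references the object's OID):

-- Tables with no statistics at all
SELECT * FROM gp_toolkit.gp_stats_missing;

-- When each table in the current database was last ANALYZEd,
-- oldest first
SELECT c.relname, lo.statime
FROM pg_class c
JOIN pg_stat_last_operation lo
  ON lo.objid = c.oid AND lo.staactionname = 'ANALYZE'
ORDER BY lo.statime;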
