**Environment**

Product | Version |

Pivotal Greenplum (GPDB) | All versions |

**Purpose**

There are three different reasons a table needs to be 'VACUUM ANALYZE'ed.

- Reclaiming space from dead rows in order to prevent from bloating (VACUUM)
- Decrease transaction age for tables with high age (VACUUM)
- Collect statistics so queries including this table have optimal execution plan (ANALYZE)

This article describes the above reasons in detail.

**Resolution**

**Reclaiming space from dead rows**

- gp_bloat_diag

View "gp_toolkit.gp_bloat_diag" Column | Type | Modifiers -------------+---------+----------- bdirelid | oid | - OID bdinspname | name | - Schema name bdirelname | name | - Table name bdirelpages | integer | - Number of table pages bdiexppages | numeric | - Number of expected pages bdidiag | text | - Diagnostic: "no bloat"/"moderate bloat"/"significant bloat"

lpetrov=# select * from gp_toolkit.gp_bloat_diag; bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag ----------+------------+------------+-------------+-------------+--------------------------------------- 353016 | public | t1 | 978 | 1 | significant amount of bloat suspected (1 row)

- gp_bloat_expected_pages

View "gp_toolkit.gp_bloat_expected_pages" Column | Type | Modifiers -------------+---------+----------- btdrelid | oid | - OID btdrelpages | integer | - Number of table pages btdexppages | numeric | - Number of expected pages

Example:

lpetrov=# select * from gp_toolkit.gp_bloat_expected_pages where btdrelid = 't1'::regclass; btdrelid | btdrelpages | btdexppages ----------+-------------+------------- 353016 | 978 | 1 (1 row)

gp_bloat_expected_pages show all table data, while gp_bloat_diag will show data only about the tables with suspected bloat (moderate or significant).

**Lower transaction age for tables with high age**

This is a two-step process:

- Find databases/segments with high age

SELECT -1, datname, age(datfrozenxid) FROM pg_database UNION ALL SELECT gp_segment_id, datname, age(datfrozenxid) FROM gp_dist_random('pg_database') ORDER BY 3 DESC

- Find tables within these databases and segments that need to be "vacuumed"

SELECT coalesce(n.nspname, ''), relname, relkind, relstorage, age(relfrozenxid) FROM pg_class c LEFT JOIN pg_namespace n ON c.relnamespace = n.oid WHERE relkind = 'r' AND relstorage NOT IN ('x') ORDER BY 5 DESC

**Note:** You can use a "where" condition instead of using LIMIT to show only ages greater than some constant.

**Collect statistics, so queries including this table have optimal execution plan.**

There is no way to find out tables with stale statistics currently aside from examining EXPLAIN ANALYZE output and comparing the estimated vs scanned rows for table scans.

There are 2 ways to get an approximate value through:

- gp_toolkit.gp_stats_missing - view to show tables without statistics in the catalog
- pg_stat_last_operation - to find out when the last ANALYZE happened. If significant modifications were done to the table in the meantime, it probably needs to be analyzed.

## Comments