Pivotal Knowledge Base

Follow

How to identify Append-Optimized tables that can be compacted (VACUUMed) in GPDB

Environment

Product Version
Pivotal Greenplum (GPDB) 4.3+

Purpose

The purpose of this document is to explain how to identify bloat within append optimized/append optimized column oriented (AO/AOCO) tables that require compaction.

IMPORTANT NOTE:  This document only applies to AO/AOCO tables.  For heap tables, please refer to article

Cause 

Append-optimized tables were first introduced in GPDB v.4.3, adding the ability to UPDATE and DELETE data from Append Only tables.   

To accommodate the ability to UPDATE and DELETE data, when an append-optimized table is created, an additional visibility bitmap (visimap) table is created with a B-tree index.  This visimap table contains information on which rows are visible in the append optimized table, including details on the number of records on each segment, how many tuples or rows are visible, and how many are hidden (due to an update or delete).

Since the data is only marked "hidden" during the UPDATE or DELETE operations, these rows still take up disk space and the table will require regular maintenance to reclaim the space by compacting the append-optimized table. 

Procedure

The first step is to identify which tables could benefit from compaction.  The gp_toolkit schema provides the function __gp_aovisimap_compaction_info(oid) to display the number of tuples in the on disk data files as compared to the visibility bitmap table and details whether compaction will assist in reclaiming space.

The function utilizes the server configuration parameter gp_appendonly_compaction_threshold to know the threshold in which to flag compaction_possible as true:
http://gpdb.docs.pivotal.io/4370/guc_config-gp_appendonly_compaction_threshhold.html#gp_appendonly_compaction_threshhold

The default is 10 or 10%.

After identifying tables which could be compacted, a VACUUM can be run against them to reclaim this space.

The following demonstrates the steps involved in this process.

1.  Query for the oid of the append-only table if unknown is shown here:

gpadmin=# SELECT oid FROM pg_class WHERE relname='mytable';
oid
-------
47372
(1 row)
2.  Take the oid and query the __gp_aovisimap_compaction_info(oid) function passing that oid.  
 
The command will report back what the gp_append_only_compaction_threshold parameter is set to.
 
The field compaction_possible is equal to TRUE since the percent_hidden field (hidden_tupcount/total_tupcount) is equal to 33.33%.  Or 33.33% of the records are marked hidden in our visimap and can be compacted out of our table.
gpadmin=# SELECT * FROM gp_toolkit.__gp_aovisimap_compaction_info(47372);
NOTICE:  gp_appendonly_compaction_threshold = 10
content | datafile | compaction_possible | hidden_tupcount | total_tupcount | percent_hidden
---------+----------+---------------------+-----------------+----------------+----------------
      2 |        1 | t                   |           33333 |          99999 |          33.33
(1 row)
3.  Since the table can benefit from compaction, a VACUUM is run against the table as shown below:
gpadmin=# VACUUM mytable;
4.  (Optional) Re-run __gp_aovisimap_compaction_info once again to confirm compaction has occurred.  As expected, compaction_possible is now marked FALSE. This is because there are 0 hidden_tupcount, making the percent_hidden 0.00. This is below the gp_append_compaction_threshold configuration setting:
 
gpadmin=# SELECT * FROM gp_toolkit.__gp_aovisimap_compaction_info(47372);
NOTICE:  gp_appendonly_compaction_threshold = 10
 content | datafile | compaction_possible | hidden_tupcount | total_tupcount | percent_hidden
---------+----------+---------------------+-----------------+----------------+----------------
       2 |        1 | f                   |               0 |          66666 |           0.00

NOTE:  Please refer to the latest GPDB Reference guide for more details

Additional Information

Comments

Powered by Zendesk