Pivotal Knowledge Base

Follow

Different Options to Remove Bloat from a Table

Environment

Pivotal Greenplum Database (GPDB) all versions

Purpose

The document describes what are the various option available to remove bloat from the table.

To understand more about the table bloat please refer to the article and refer to the article to identify if your database tables are bloated.

Procedure

Below are the various ways on how you can remove the table bloats.

1. VACUUM FULL

This the slowest method of the slot, when you execute the VACUUM FULL command, rows in the table are reshuffled and if there are the very high amount of rows that lead to the various reshuffling of the data leading to very unpredictable time.

The command to execute a VACUUM FULL on the bloated table is

VACUUM FULL <schema-name>.<table-name>;

Any bloated catalog table of the database (i.e all table under pg_catalog schema) can only use this method, thus removing highly bloating catalog tables can be very time-consuming.

2. Redistribute

This is one the quickest method of the list, when you execute redistribute command, internally the database creates a new file and start to load the existing data and once done removes the old file.

Refer to the article on how to redistribute your table to remove the bloat.

3. CTAS

CTAS is also another quick method to remove bloat and also using this method helps to avoid the table lock (EXCLUSIVE LOCK) which the above two methods acquire to do the operation, thus allowing the table to be used by end users while maintenance is being performed on the main table.

The disadvantage involves it has many steps to perform the activity, the steps involved.

  • Obtain the DDL of the table using the steps mentioned in the article, using this helps you obtain all the sub-objects like INDEX that are involved with the relation and also it provides that list of grants provided to the users of the table.
  • Once the DDL is obtained, replace the <bloated-table-name> to <new-table-name>  using the find/replace option with any editor of your choice and then execute the file on psql to create the object on the database.
  • Then follow the series of steps
INSERT INTO <schema-name>.<new-table-name> SELECT * FROM <schema-name>.<bloated-table-name>;
ALTER TABLE <schema-name>.<bloated-table-name> RENAME TO <old-table-name>;
ALTER TABLE <schema-name>.<new-table-name> RENAME TO <bloated-table-name>;
-- Once users confirm everything is good.
DROP TABLE <schema-name>.<bloated-table-name>;

4. Backup & Restore

There is another way to clear up the bloat. This involves backing up the table and then restoring them back. The different tools that can be used to achieve this are:

  • gpcrondump / gpdbrestore
  • gp_dump / gp_restore
  • pg_dump / pg_restore
  • COPY .. TO .. / COPY .. FROM ..

Refer to the documentation for more information on these management utilities.

Or you can use a combination of backup utility and psql as shown in here if you are short of disk space.

Additional Information

  • To avoid running into bloat on your tables/relation, make sure you do frequent VACUUM on your database, to understand more of the VACUUM / VACUUM FULL check out here.
  • Catalog tables can only be unbloated using VACUUM FULL.
  • To remove the bloat from the index using the command
REINDEX INDEX <index-name>;
OR
REINDEX TABLE <bloated-table-name>;
  • Make sure you analyze the table after you have performed the above activity so that database has new stats and optimizer is aware of the changes when making planning to scan the data from the table.

Comments

Powered by Zendesk