Pivotal Knowledge Base

Follow

What are the different option to remove bloat from a table

Environment

Pivotal Greenplum (GPDB)

Purpose

The document describes what are the various option available to remove bloat from the table.

To understand more on the table bloat please refer to article and refer to the article to identify if your database tables are bloated.

Solution

Below are the various ways on how you can remove the table bloats.

1. VACUUM FULL

This the slowest method of the slot, when you execute the VACUUM FULL command, rows in the table are reshuffled and if there are very high amount of rows that leads to various reshuffling of the data leading to very unpredictable time.

The command to execute a VACUUM FULL on the bloated table is

VACUUM FULL <schema-name>.<table-name>;

Any bloated catalog table of the database (i.e all table under pg_catalog schema) can only use this method, thus removing highly bloating catalog tables can be very time consuming.

2. Redistribute

This is one the quickest method of the list, when you execute redistribute command, internally the database creates a new file and start to load the existing data and once done removes the old file.

Refer to the article on how to redistribute your table to remove the bloat.

3. CTAS

CTAS is also another quick method to remove bloat and also using this method helps to avoid the table lock (EXCLUSIVE LOCK) which the above two methods acquires to do the operation, thus allowing the table to be used by end users while maintenance is being performed on the main table.

The disadvantage involves it has many steps to perform the activity, the steps involves.

  • Obtain the DDL of the table using the steps mentioned in the article, using this helps you obtain all the sub objects like INDEX that are involved with the relation and also it provides that list of grants provided to the users of the table.
  • Once the DDL is obtained, replace the <bloated-table-name> to <new-table-name>  using the find/replace option with any editor of your choice and then execute the file on psql to create the object on the database.
  • Then follow the series of steps
INSERT INTO <schema-name>.<new-table-name> SELECT * FROM <schema-name>.<bloated-table-name>;
ALTER TABLE <schema-name>.<bloated-table-name> RENAME TO <old-table-name>;
ALTER TABLE <schema-name>.<new-table-name> RENAME TO <bloated-table-name>;
-- Once users confirm everything is good.
DROP TABLE <schema-name>.<bloated-table-name>;

4. Backup & Restore

This is another way to clear up the bloat, this involves backing up the table and then restoring them back , the different tool that can be used to achieve this are

  • gpcrondump / gpdbrestore
  • gp_dump / gp_restore
  • pg_dump / pg_restore
  • COPY .. TO .. / COPY .. FROM ..

Refer to the documentation for more information on these management utilities.

Or you can use a combination of backup utility and psql as shown in here if you are short of disk space.

NOTE:

  • To avoid running into bloat on your tables/relation, make sure you do frequent VACUUM on your database, to understand more of the VACUUM / VACUUM FULL check out here.
  • Catalog tables can only be unbloated using VACUUM FULL.
  • To remove the bloat on the index use the command
REINDEX INDEX <index-name>;
OR
REINDEX TABLE <bloated-table-name>;
  • Make sure you analyze the table after you have performed the above activity so that database has new stats and optimizer is aware of the changes when make planning to scan the data from the table.

Comments

Powered by Zendesk