In some cases, due to the underlying storage architecture, Pivotal Greenplum "heap" tables are susceptible to bloat. Bloat can affect table scanning performance and therefore user query performance.
The following will be covered in this article:
- What is table bloat?
- What causes table bloat?
- How to identify table bloat?
- How to eliminate table bloat?
Table bloat is the accumulation of free space left behind by old data rows within the table data files. This space was previously occupied by rows that have since been deleted and are no longer accessible. Without regular table maintenance to allow this space to be reused, the table data files keep growing, and scans of the table take longer.
The Pivotal Greenplum storage implementation, known as MVCC (MultiVersion Concurrency Control), is borrowed from PostgreSQL. Under this implementation:
- There are no in-place updates: an UPDATE is implemented as a DELETE followed by an INSERT.
- A deleted row stays in the data file until its space is marked as "free" via the VACUUM command, once no transaction can potentially access it any more.
Once VACUUM registers the deleted rows' space as "free space", that space can be reused by future inserts and updates. Between the time rows are deleted (and no transaction can access them) and the time VACUUM is executed, this space is not marked as free for reuse and is effectively "dead space".
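The delete-then-vacuum life cycle above can be sketched with a small session; the table name `bloat_demo` and its contents are illustrative, not from the original article:

```sql
-- Create a heap table and fill it with rows (names are hypothetical).
CREATE TABLE bloat_demo (id int, payload text) DISTRIBUTED BY (id);
INSERT INTO bloat_demo
SELECT i, repeat('x', 100) FROM generate_series(1, 100000) i;

-- Deleting rows leaves "dead space" in the data files;
-- the files themselves do not shrink.
DELETE FROM bloat_demo WHERE id % 2 = 0;

-- VACUUM marks the dead-row space as free so that future
-- INSERT/UPDATE operations can reuse it.
VACUUM bloat_demo;
```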
Use the gp_toolkit administrative schema:
gp_toolkit.gp_bloat_diag - this view shows tables with a moderate or significant amount of bloat. Columns:
- bdirelid - Object ID of the table (pg_class.oid)
- bdinspname - table schema name
- bdirelname - table name
- bdirelpages - number of pages currently in table data files
- bdiexppages - number of pages expected according to current statistics
- bdidiag - diagnosis of bloat (ratio from 1 to 3 -> no bloat, ratio from 4 to 10 -> moderate bloat, ratio > 10 -> significant bloat)
In this example the table "t1" is severely bloated: the expected size of the data currently in the table is 1 page, but the table data files consist of 97 pages.
```
gpadmin=# select * from gp_toolkit.gp_bloat_diag;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages |                bdidiag
----------+------------+------------+-------------+-------------+---------------------------------------
    21488 | public     | t1         |          97 |           1 | significant amount of bloat suspected
(1 row)
```
gp_toolkit.gp_bloat_expected_pages - this view shows the page data (current pages/expected pages) for all tables in the database. Columns:
- btdrelid - Object ID of the table (pg_class.oid)
- btdrelpages - number of pages currently in table data files
- btdexppages - number of pages expected according to current statistics
Example: Here each of the tables shown has an expected data size of 1 page and an actual data file size of 1 page, so no bloat is detected.
```
gpadmin=# select * from gp_toolkit.gp_bloat_expected_pages limit 5;
 btdrelid | btdrelpages | btdexppages
----------+-------------+-------------
    10789 |           1 |           1
    10794 |           1 |           1
    10799 |           1 |           1
     5004 |           1 |           1
     7175 |           1 |           1
(5 rows)
```
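Because gp_bloat_expected_pages covers all tables, it can be joined to pg_class to rank tables by their bloat ratio. A hedged sketch (the ratio expression and LIMIT are illustrative, not part of the view):

```sql
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       b.btdrelpages AS actual_pages,
       b.btdexppages AS expected_pages,
       round(b.btdrelpages::numeric / greatest(b.btdexppages, 1), 1) AS bloat_ratio
FROM gp_toolkit.gp_bloat_expected_pages b
JOIN pg_class c ON c.oid = b.btdrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
ORDER BY bloat_ratio DESC
LIMIT 10;
```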
- These views depend on table statistics to calculate the amount of bloat, therefore it is crucial that table statistics are up to date!
- Pivotal Greenplum database system tables are "heap" tables and are therefore susceptible to bloat. As system tables are very important for database performance, it is recommended to vacuum them regularly. Recommended intervals range from one week for systems with few object changes (create/drop/alter table) down to one day or less for systems with many object changes.
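Both points above can be put into practice with two simple commands; the table names below are illustrative examples, not prescribed targets:

```sql
-- Refresh statistics first, so the gp_toolkit bloat views
-- produce accurate estimates for this table.
ANALYZE public.t1;

-- Regularly vacuum system catalog tables; pg_attribute is shown
-- here as an example of a catalog that churns on DDL activity.
VACUUM pg_catalog.pg_attribute;
```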
The VACUUM command marks the deleted rows in table data files as "free space" available for reuse, and thereby eliminates bloat. It is normal and healthy for a table with a heavy UPDATE/DELETE/INSERT workload to carry a small amount of free space from deleted rows, which is reused as new data comes in. Running VACUUM regularly ensures that deleted rows become reusable free space as soon as possible.
In certain cases, when VACUUM has not been run for a long time and a huge number of deleted (dead) rows has accumulated in the data files, the live rows may make up only a tiny fraction of the table data files; in other words, the table is significantly bloated. In these cases, to release the extra reserved space (if it will never be reused anyway), the VACUUM FULL command can be executed. This command compacts the table data by moving it to the front of the data file and truncating the unused space at the tail.
VACUUM FULL compacts rows one by one, so it is slow for big tables, and it takes an exclusive lock on the table.
VACUUM FULL should be executed in a maintenance window, with careful consideration of its run time and effects.
The VACUUM FULL command should not be terminated by the user once started.
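For the severely bloated table from the gp_bloat_diag example earlier, a maintenance-window run might look like the following sketch (re-analyzing afterwards is a suggested practice, not a requirement stated above):

```sql
-- Maintenance window only: takes an exclusive lock on the table
-- and compacts rows one by one, then truncates the freed tail space.
VACUUM FULL public.t1;

-- Re-collect statistics after the rewrite.
ANALYZE public.t1;
```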
A better alternative to VACUUM FULL for user tables is to redistribute the table (this cannot be performed on system tables). Redistribution effectively rebuilds the table, eliminating the bloat in the process.
The steps are:
- Write down the current table distribution columns
- ALTER TABLE <table> SET WITH (REORGANIZE=false) DISTRIBUTED RANDOMLY; -- only "marks" the table and does not move any rows; finishes instantly.
- ALTER TABLE <table> SET WITH (REORGANIZE=true) DISTRIBUTED BY (<column name>); -- rewrites the data files. Because the distribution never actually changes at the data file level, the rewrite happens locally without sending rows over the network.
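The steps above, sketched for a hypothetical table `public.t1` originally distributed by the column `id`:

```sql
-- Step 1: note the current distribution key
-- (shown by \d+ public.t1 in psql).

-- Step 2: mark the table as randomly distributed; no rows move,
-- this completes instantly.
ALTER TABLE public.t1 SET WITH (REORGANIZE=false) DISTRIBUTED RANDOMLY;

-- Step 3: restore the original distribution key; each segment
-- rewrites its own data files locally, dropping the bloat,
-- without sending rows over the network.
ALTER TABLE public.t1 SET WITH (REORGANIZE=true) DISTRIBUTED BY (id);
```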
Refer to the related article for more options to remove database bloat.
- VACUUM FULL, and the better alternative of table redistribution, are recommended only in extreme cases.
- Run VACUUM regularly on tables with INSERT/UPDATE/DELETE workloads and on system tables.
- Check for bloat regularly in the system tables and user tables using gp_toolkit views.
- Regularly VACUUM system tables (according to the create/drop/alter object frequency).
- Identify user tables with lots of INSERT/UPDATE/DELETE activity and VACUUM them in low-activity or no-activity time windows, according to how quickly bloat accumulates.