Pivotal Knowledge Base

Follow

Vacuum and Vacuum Full - Explained

Environment

Pivotal Greenplum Database (GPDB) all versions

Introduction

The below documents explains how vacuum and vacuum full works with an example.

Description

Vacuum:

When Vacuum is done, the dead space is reclaimed and made available for re-use by the same object/table etc

Vacuum Analyze:

Its performs a "Vacuum" and then an "Analyze" for each selected table.

Vacuum Full:

The Full keyword along with Vacuum writes the entire content of the table into new disk file and release the wasted space back to OS, so this cause a table-level lock on the table and it's much slower, should be avoided on a high load system.

How does Vacuum work

When you do a Vacuum, the Greenplum ( aka postgres ) update the FSM ( i.e free space map ) and keep tracks of the free space, and subsequent updates on that relation will track the FSM through the internal table and uses them when needed, when you issue a Vacuum Verbose , you might see the below information.

INFO:  "vacuum_test": removed 1333496 row versions in 2127 pages  (seg17 sdw3:11039 pid=18570)
INFO:  "vacuum_test": removed 1333104 row versions in 2126 pages  (seg2 sdw1:11036 pid=10693)
INFO:  "vacuum_test": found 1333496 removable, 583324 nonremovable row versions in 3048 pages  (seg17 sdw3:11039 pid=18570)
DETAIL:  0 dead row versions cannot be removed yet.
.........
..........
INFO:  "vacuum_test": removed 1333212 row versions in 2123 pages  (seg1 sdw1:11035 pid=10687)
INFO:  "vacuum_test": found 1333212 removable, 583639 nonremovable row versions in 3048 pages  (seg1 sdw1:11035 pid=10687)
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
2124 pages contain useful free space.
0 pages are entirely empty.

You can tune the parameter "max_fsm_relations" ( default : 1000) and "max_fsm_pages" (default : 200000 ) to report on the master log when the FSM reaches a specific level .

Note: By design "max_fsm_pages" must be exceed 16 times "max_fsm_relations" , else the database wont start

Example

So, let us put the Vacuum and Vacuum Full definition into practice and see how different options of vacuum works.

Vacuum

  • Create a test table
gpadmin=# create table vacuum_test ( a int , b text ) ;
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
Time: 359.060 ms
gpadmin=#
  • The value of the tuples and pages on the table
gpadmin=#
gpadmin=# select relname , relpages , reltuples from pg_class where relname='vacuum_test';
   relname   | relpages | reltuples
-------------+----------+-----------
 vacuum_test |        0 |         0
(1 row)
Time: 2.438 ms
gpadmin=#
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------
(0 rows)
Time: 25.589 ms
gpadmin=#
  • Insert test data
gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test');
INSERT 0 10000000
Time: 6245.503 ms
gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test');
INSERT 0 10000000
Time: 4697.305 ms
gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test');
INSERT 0 10000000
Time: 4319.884 ms
gpadmin=#
gpadmin=#
  • so now the current usage of the table ( per tuples/pages)
gpadmin=# select relname , relpages , reltuples from pg_class where relname='vacuum_test';
   relname   | relpages |  reltuples
-------------+----------+-------------
 vacuum_test |    15912 | 1.00086e+07
(1 row)
Time: 2.585 ms
  • Analyzing the table to know the correct data
gpadmin=# analyze vacuum_test ;
ANALYZE
Time: 1789.892 ms
gpadmin=# select relname , relpages , reltuples from pg_class where relname='vacuum_test';
   relname   | relpages |  reltuples
-------------+----------+-------------
 vacuum_test |    47710 | 3.00096e+07
(1 row)
Time: 2.687 ms
gpadmin=#
  • Cleaning up some rows
gpadmin=# delete from vacuum_test  where a<800000;
DELETE 2399997
Time: 967.301 ms
                                                  ^
gpadmin=# delete from vacuum_test  where a<8000000;
DELETE 21600000
Time: 2184.398 ms
  • Analyzing the table
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------
(0 rows)
Time: 26.318 ms
gpadmin=# analyze vacuum_test ;
ANALYZE
Time: 1881.765 ms
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname  | bdirelpages | bdiexppages |              bdidiag
----------+------------+-------------+-------------+-------------+------------------------------------
   359758 | public     | vacuum_test |       47710 |        7763 | moderate amount of bloat suspected
(1 row)
Time: 19.336 ms
  • vacuum the table
gpadmin=# vacuum vacuum_test ;
VACUUM
Time: 132.314 ms

You will find that the vacuum has not released any space from the table , the table still uses the same OS blocks as done previously.

gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname  | bdirelpages | bdiexppages |              bdidiag
----------+------------+-------------+-------------+-------------+------------------------------------
   359758 | public     | vacuum_test |       47710 |        8057 | moderate amount of bloat suspected
(1 row)
Time: 18.026 ms
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages | reltuples
--------+-------------+----------+-----------
 359758 | vacuum_test |    47710 |     6e+06
(1 row)
Time: 1.606 ms
gpadmin=#
  • Analyzing the table also report the same value as above and the "gp_bloat_diag" also reporting the table has bloat.
gpadmin=# analyze vacuum_test ;
ANALYZE
Time: 1176.675 ms
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages |  reltuples
--------+-------------+----------+-------------
 359758 | vacuum_test |    47710 | 5.80361e+06
(1 row)
Time: 2.638 ms
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname  | bdirelpages | bdiexppages |              bdidiag
----------+------------+-------------+-------------+-------------+------------------------------------
   359758 | public     | vacuum_test |       47710 |        7793 | moderate amount of bloat suspected
(1 row)
Time: 17.707 ms
gpadmin=#
gpadmin=#
  • Lets add data to the already vacuum table
gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test');
INSERT 0 10000000
Time: 5171.964 ms
  • You can witness the insert used up already freed space from the table and didn't consume any extra pages from the OS.
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname  | bdirelpages | bdiexppages |              bdidiag
----------+------------+-------------+-------------+-------------+------------------------------------
   359758 | public     | vacuum_test |       47710 |        8057 | moderate amount of bloat suspected
(1 row)
Time: 16.548 ms
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages | reltuples
--------+-------------+----------+-----------
 359758 | vacuum_test |    47710 |     6e+06
(1 row)
Time: 1.509 ms
gpadmin=# analyze vacuum_test ;
ANALYZE
Time: 1734.669 ms

gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------
(0 rows)
Time: 16.291 ms
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages |  reltuples
--------+-------------+----------+-------------
 359884 | vacuum_test |    47710 | 1.62695e+07
(1 row)
Time: 1.612 ms
gpadmin=#

Vacuum Full

When you do a vacuum full the data is now re-organized and the unused spaced are given back to the OS.

gpadmin=# vacuum full vacuum_test ;
NOTICE:  'VACUUM FULL' is not safe for large tables and has been known to yield unpredictable runtimes.
HINT:  Use 'VACUUM' instead.
VACUUM
Time: 1452.043 ms
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------
(0 rows)
Time: 17.503 ms
 
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages | reltuples
--------+-------------+----------+-----------
 359758 | vacuum_test |     9552 |     6e+06
(1 row)
Time: 1.452 ms
gpadmin=#

Note: Running vacuum full on a very large table can lead to unexpected runtime and during this run the table will be on exclusive lock the whole time, the alternative faster way to reorg the table would be to redistribute the table using the command as indicated here or CTAS or backup/restore.

Comments

Powered by Zendesk