Pivotal Knowledge Base

Follow

Vacuum / Vacuum Full - Explained

Goal

The below documents explains on how vacuum and vacuum full works with example.

Definition

Vacuum :

When Vacuum is done , the dead space is reclaimed and made available for re-use by the same object / table etc

Vacuum Analyze :

Its performs a VACUUM and then an ANALYZE for each selected table.

Vacuum Full :

The Full keyword along with Vacuum writes the entire content of the table into new disk file and release the wasted space back to OS , so this cause a table level lock on the table and its much slower , should be avoided on a high load system.

How does Vacuum work

When you do a Vacuum , the greenplum ( aka postgres ) update the FSM ( i.e free space map ) and keep tracks of the free space , and subsequent updates on that relation will track the FSM through the internal table and uses them when needed, when you issue a Vacuum Verbose , you might see the below information.

INFO:  "vacuum_test": removed 1333496 row versions in 2127 pages  (seg17 sdw3:11039 pid=18570)
INFO:  "vacuum_test": removed 1333104 row versions in 2126 pages  (seg2 sdw1:11036 pid=10693)
INFO:  "vacuum_test": found 1333496 removable, 583324 nonremovable row versions in 3048 pages  (seg17 sdw3:11039 pid=18570)
DETAIL:  0 dead row versions cannot be removed yet.
.........
..........
INFO:  "vacuum_test": removed 1333212 row versions in 2123 pages  (seg1 sdw1:11035 pid=10687)
INFO:  "vacuum_test": found 1333212 removable, 583639 nonremovable row versions in 3048 pages  (seg1 sdw1:11035 pid=10687)
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
2124 pages contain useful free space.
0 pages are entirely empty.

You can tune the parameter "max_fsm_relations" ( default : 1000) and "max_fsm_pages" (default : 200000 ) to report on the master log when the FSM reaches a specific level .

Note: By design "max_fsm_pages" must be exceed 16 times "max_fsm_relations" , else the database wont start

Example

So let us put the VACUUM/VACUUM FULL definition into practice and see how different options of vacuum works.

-- VACUUM

  • Create a test table
gpadmin=# create table vacuum_test ( a int , b text ) ;
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
Time: 359.060 ms
gpadmin=#
  • The value of the tuples and pages on the table
gpadmin=#
gpadmin=# select relname , relpages , reltuples from pg_class where relname='vacuum_test';
   relname   | relpages | reltuples
-------------+----------+-----------
 vacuum_test |        0 |         0
(1 row)
Time: 2.438 ms
gpadmin=#
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------
(0 rows)
Time: 25.589 ms
gpadmin=#
  • Insert test data
gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test');
INSERT 0 10000000
Time: 6245.503 ms
gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test');
INSERT 0 10000000
Time: 4697.305 ms
gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test');
INSERT 0 10000000
Time: 4319.884 ms
gpadmin=#
gpadmin=#
  • so now the current usage of the table ( per tuples / pages )
gpadmin=# select relname , relpages , reltuples from pg_class where relname='vacuum_test';
   relname   | relpages |  reltuples
-------------+----------+-------------
 vacuum_test |    15912 | 1.00086e+07
(1 row)
Time: 2.585 ms
  • Analyzing the table to know the correct data
gpadmin=# analyze vacuum_test ;
ANALYZE
Time: 1789.892 ms
gpadmin=# select relname , relpages , reltuples from pg_class where relname='vacuum_test';
   relname   | relpages |  reltuples
-------------+----------+-------------
 vacuum_test |    47710 | 3.00096e+07
(1 row)
Time: 2.687 ms
gpadmin=#
  • Cleaning up some rows
gpadmin=# delete from vacuum_test  where a<800000;
DELETE 2399997
Time: 967.301 ms
                                                  ^
gpadmin=# delete from vacuum_test  where a<8000000;
DELETE 21600000
Time: 2184.398 ms
  • Analyzing the table
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------
(0 rows)
Time: 26.318 ms
gpadmin=# analyze vacuum_test ;
ANALYZE
Time: 1881.765 ms
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname  | bdirelpages | bdiexppages |              bdidiag
----------+------------+-------------+-------------+-------------+------------------------------------
   359758 | public     | vacuum_test |       47710 |        7763 | moderate amount of bloat suspected
(1 row)
Time: 19.336 ms
  • vacuum the table
gpadmin=# vacuum vacuum_test ;
VACUUM
Time: 132.314 ms

You will find that the vacuum has not released any space from the table , the table still uses the same OS blocks as done previously.

gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname  | bdirelpages | bdiexppages |              bdidiag
----------+------------+-------------+-------------+-------------+------------------------------------
   359758 | public     | vacuum_test |       47710 |        8057 | moderate amount of bloat suspected
(1 row)
Time: 18.026 ms
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages | reltuples
--------+-------------+----------+-----------
 359758 | vacuum_test |    47710 |     6e+06
(1 row)
Time: 1.606 ms
gpadmin=#
  • Analyzing the table also report the same value as above and the "gp_bloat_diag" also reporting the table has bloat.
gpadmin=# analyze vacuum_test ;
ANALYZE
Time: 1176.675 ms
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages |  reltuples
--------+-------------+----------+-------------
 359758 | vacuum_test |    47710 | 5.80361e+06
(1 row)
Time: 2.638 ms
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname  | bdirelpages | bdiexppages |              bdidiag
----------+------------+-------------+-------------+-------------+------------------------------------
   359758 | public     | vacuum_test |       47710 |        7793 | moderate amount of bloat suspected
(1 row)
Time: 17.707 ms
gpadmin=#
gpadmin=#
  • Lets add data to the already vacuum table
gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test');
INSERT 0 10000000
Time: 5171.964 ms
  • You can witness the insert used up already freed space from the table and didn't consume any extra pages from the OS.
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname  | bdirelpages | bdiexppages |              bdidiag
----------+------------+-------------+-------------+-------------+------------------------------------
   359758 | public     | vacuum_test |       47710 |        8057 | moderate amount of bloat suspected
(1 row)
Time: 16.548 ms
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages | reltuples
--------+-------------+----------+-----------
 359758 | vacuum_test |    47710 |     6e+06
(1 row)
Time: 1.509 ms
gpadmin=# analyze vacuum_test ;
ANALYZE
Time: 1734.669 ms

gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------
(0 rows)
Time: 16.291 ms
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages |  reltuples
--------+-------------+----------+-------------
 359884 | vacuum_test |    47710 | 1.62695e+07
(1 row)
Time: 1.612 ms
gpadmin=#

-- VACUUM FULL

When you do a vacuum full the data is now re-organized and the unused spaced are given back to the OS.

gpadmin=# vacuum full vacuum_test ;
NOTICE:  'VACUUM FULL' is not safe for large tables and has been known to yield unpredictable runtimes.
HINT:  Use 'VACUUM' instead.
VACUUM
Time: 1452.043 ms
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------
(0 rows)
Time: 17.503 ms
 
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages | reltuples
--------+-------------+----------+-----------
 359758 | vacuum_test |     9552 |     6e+06
(1 row)
Time: 1.452 ms
gpadmin=#

NOTE: Running vacuum full on a very large table can lead to unexpected run time and during this run the table will be on exclusive lock the whole time , the alternative faster way to reorg the table would be to redistribute the table using the command as indicated here or CTAS or backup/restore.

Comments

Powered by Zendesk