When autovacuum kicks in (vacuum kicking in automatically) there would be sudden degradation of performance witnessed on the database. You can use the below query to check if there is any autovacuum running on the cluster.
gpssh -f hostfile "ps -ef | grep autovacuum | grep -v grep"
Autovacuum kicks in when transaction age in one or more databases goes over 200,000,000 ("autovacuum_freeze_max_age" default , this cannot be altered).
Please Note: This is "per segment."
When the autovacuum runs the table statistics are lost i.e columns like pg_class.relpages and pg_class.reltuples are marked 0, resulting in query running slower than usual and causing performance bottleneck.
We highly recommend you to upgrade your GP version to 18.104.22.168 or later version as Autovaccum daemon is disabled in these versions.
For GPDB with version below 22.214.171.124 , can use the below work-around to reduce transaction xid age.
- Find the databases/segments with high age and vacuum the tables that is nearing the transaction age of the database set via autovacuum_freeze_max_age.
- Use the following SQL to find which database's age on which instance is older than 200 million.
SELECT datname, age(datfrozenxid) FROM pg_database order by 2 desc; -- master SELECT gp_segment_id,datname, age(datfrozenxid) FROM gp_dist_random('pg_database') order by 3 desc; -- segments
- Sometimes but not always, the problem could be a result of orphaned temp schemas , these can be discovered using gpcheckcat or using
select * from pg_namespace where nspname like 'pg_temp%'; select * from gp_dist_random('pg_namespace') where nspname like 'pg_temp%';
Orphan temporary schemas can be removed with:
drop schema pg_temp_<sess_id> cascade;
If you still identify the age to be high , follow the next set of instructions.
- If database is relatively small, just vacuum everything in the database:
"vacuumdb " in the shell
This is useful for system databases ("postgres", "template1", "gpperfmon") and smaller user databases.
- If database is large, for each database/segment identify which tables is/nearing the high transaction age (> 100,000,000) and vacuum them:
select relname, age(relfrozenxid) from pg_class where relkind ='r' and relstorage != 'x' and age(relfrozenxid)> 100000000; -- master select relname, age(relfrozenxid) from gp_dist_random('pg_class') where relkind ='r' and relstorage != 'x' and age(relfrozenxid)> 100000000; -- segments
repeat the procedure until the table/relation/database are below the transaction age of 200,000,000 ("autovacuum_freeze_max_age" default )
- During autovacuum run , some of the relation might have lost statistics, analyze those tables. A quick way to find is if the pg_class.reltuples is marked 0 for your major tables.
Please note : The "template0" is marked as "no connections" so does not need to be handled by above procedure.
In order to avoid AutoVacuum kicking in after reaching the transaction , we would recommend to do vacuum ( Normal, not FULL ) on the table showing high transaction during a maintenance window.