Pivotal Knowledge Base


Orphaned Temporary Schemas in Greenplum Database

Environment

Product Version
Pivotal Greenplum (GPDB) All versions

Purpose

Each database session may create and use a temporary schema for handling its session-level temporary tables. Normally, these schemas are dropped at the end of the session. However, if the backend process crashes or the session otherwise terminates abnormally, the temporary schema can continue to exist after the session has finished.
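As a minimal illustration (the session id in the schema name is assigned by the server and will differ), creating a temporary table implicitly creates a session-specific schema named pg_temp_&lt;sess_id&gt;:

```sql
-- Creating a temp table implicitly creates a session-specific schema.
CREATE TEMPORARY TABLE scratch (id int);

-- The table lives in a schema such as pg_temp_1234.
SELECT nspname
FROM pg_namespace
WHERE nspname LIKE 'pg_temp%';
```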

These temporary schemas are usually harmless, with two exceptions: transaction management (tracking transaction age for transaction wraparound prevention) and catalog checks. In the former case, these temporary schemas (and the objects within them) are session specific, so VACUUM commands cannot process them. In the latter case, they generate a lot of noise in the gpcheckcat log file and can mask other, more important catalog inconsistencies.
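An orphan can be spotted by comparing the pg_temp_* entries in pg_namespace against the sessions currently listed in pg_stat_activity; this is the same logic the cleanup script in the Resolution section uses:

```sql
-- Temp schemas with no matching live session are orphans.
SELECT nspname
FROM pg_namespace
WHERE nspname LIKE 'pg_temp%'
EXCEPT
SELECT 'pg_temp_' || sess_id::varchar
FROM pg_stat_activity;
```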

For these reasons, it is recommended to drop these 'orphan' temporary schemas, especially before running a catalog check with gpcheckcat. The gpcheckcat utility will of course recognize and report them, but it is easier and faster to get rid of the orphan temp schemas before running the tool. It is also recommended to put the database in restricted mode (no user sessions), so that the orphan schemas can be identified easily.

Resolution

Commands:

The following commands create scripts that clean up orphan temporary schemas on the master and segment instances. The commands take currently running sessions into account, so they are safe to run alongside an existing workload.

Step 1: Identify orphan temporary schemas and generate scripts to drop them:

cat /dev/null > execute_drop_on_all.sh
psql -d template1 -Atc "select datname from pg_database where datname != 'template0'" | while read a; do
    echo "Checking database ${a}"
    psql -Atc "select 'drop schema if exists ' || nspname || ' cascade;' from (select nspname from pg_namespace where nspname like 'pg_temp%' union select nspname from gp_dist_random('pg_namespace') where nspname like 'pg_temp%' except select 'pg_temp_' || sess_id::varchar from pg_stat_activity) as foo" ${a} > drop_temp_schema_${a}.ddl
    echo "psql -f drop_temp_schema_${a}.ddl -d ${a}" >> execute_drop_on_all.sh
done

The above script creates a drop_temp_schema_&lt;database&gt;.ddl file for each database in the current directory; each file contains the commands to drop that database's orphan temporary schemas.

Check the contents for all the generated scripts before executing them.
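For illustration only (the session ids in the schema names below are hypothetical), a generated drop_temp_schema_&lt;database&gt;.ddl file might look like:

```sql
drop schema if exists pg_temp_3134 cascade;
drop schema if exists pg_temp_4371 cascade;
```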

Step 2: Execute the scripts generated above to drop the orphan schemas.

chmod +x execute_drop_on_all.sh
nohup ./execute_drop_on_all.sh 2>&1 | tee execute_drop_on_all.op

The execute_drop_on_all.sh script contains the commands to execute the .ddl files generated in Step 1.

After executing Step 2, there should be no orphan schemas left in the database. You can verify this by running Step 1 again and confirming that the generated .ddl files are empty.
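Alternatively, the check can be run directly against a single database; this is the same query Step 1 generates the drop statements from, and an empty result means no orphans remain:

```sql
-- Should return zero rows when no orphan temp schemas exist
-- on the master or on any segment.
SELECT nspname FROM pg_namespace WHERE nspname LIKE 'pg_temp%'
UNION
SELECT nspname FROM gp_dist_random('pg_namespace') WHERE nspname LIKE 'pg_temp%'
EXCEPT
SELECT 'pg_temp_' || sess_id::varchar FROM pg_stat_activity;
```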

Comments

  • Carrie Berlin

    Commenting as I have had experience in the past writing technical notes for the Oracle database. The glaring information left out of this note is a statement regarding whether this cleanup can be run when the database is live or needs to be run when the database is in restricted mode. Granted, this could be a judgement call depending on how busy the database is, but I think to best assist customers you need to comment on the "state" of the database during the correction process. People who run production databases do not want to depend on an "implicit" interpretation regarding the condition of the database when the script is run. It is important to be very thorough and explicit in these notes.

  • Brendan Stephens

    In reply to the question above --

    The commands take running sessions into consideration, so this will work with a concurrent workload.

    In the query, it calls out all the temp schemas that are not attached to any current session:

    except select 'pg_temp_' || sess_id::varchar from pg_stat_activity
