Pivotal Knowledge Base

Follow

How to estimate Greenplum 4.2 to 4.3 upgrade time

Environment

Product Version
Pivotal Greenplum (GPDB) 4.2 and 4.3

Purpose

GPDB 4.2.x to 4.3.x upgrade involves converting all Append Only (AO) tables to Append Optimized tables. This includes creating some subobjects for each AO table. This process takes time depending on the number of AO objects.

Resolution

The documented time estimation for this process is described here: http://gpdb.docs.pivotal.io/tools/pdf/estimate_42_to_43_migrate_time.pdf

The actual function that does the estimating is linked here: http://gpdb.docs.pivotal.io/tools/estimate_42_to_43_migrate_time.zip 

This function needs to be installed and run individually on every database and the total sum is the total upgrade estimate. See here for details about the function: https://support.pivotal.io/hc/en-us/articles/205511208-Estimating-AO-Object-Migration-Time-from-GPDB-4-2-x-to-GPDB-4-3-x

Results:

  • The estimate provided is pretty accurate if the following details are met:
    • The number of primary segments is 600 or lower.
    • The catalog is VACUUM and ANALYZED before the upgrade (the catalog bloat does not affect the estimate on the function but affects the actual upgrade timeline)
  • The estimate provided is not very accurate for clusters with more than 600 segments. In these situations, the timeline increases exponentially. Therefore in those cases please use the alternative formula for estimation 

Alternative formula for large clusters:

Estimated time in seconds = N * t * O + F

where:

  • N -> number of AO tables on the cluster
  • T -> average time for create table operation (simple heap table) in seconds
  • O -> number of operations (constant = 7)
  • F -> fixed cost of 1 hour (3600 seconds)

Example:

Number of AO tables: 15289
Time per create-table: 1.64 sec
Num of operations per AO Table: 7
Fixed cost: 1 hour

15289 * 1.64 * 7 +  3600= 179117.72 sec = 49.75 hours (about 50 hours)

Run the following Pre-upgrade checks:

  • Upgrade latest DCA ISO before upgrading the database. If it is not possible to upgrade DCA ISO, at least, apply the hotfix.
  • Run a gpcheckcat and ensure there are no catalog issues prior to an upgrade.
  • Take a full backup of the cluster.
  • Run dcacheck prior to the upgrade to ensure there are no hardware issues.
  • It is recommended to run vacuum analyze on the catalog prior to the upgrade. 
  • Also, update pg_hba.conf file to not allow any other connections other than gpadmin.

Hotfix for DCA V2: https://support.emc.com/downloads/27044_Data-Computing-Appliance-V2

Hotfix for DCA V1: https://support.emc.com/downloads/9507_Greenplum-Data-Computing-Appliance-V1

Post-upgrade:

During upgrade after the checks gets completed the database will be stopped and a check (stat) and copy process will be done on for $MASTER_DATA_DIRECTORY. During "stat" no log messages will be there and it seems like the process is in a hung situation. You can check the "strace" on the "python" process running gpmigrator and also check the "$MASTER_DATA_DIRECTORY /gpmigratorstatecheck" file to get the current state.
The strace will show the stat on files. If there are large numbers of files in $MASTER_DATA_DIRECTORY then stat will take time and add on to the upgrade time.

Also, if the customer will use ORCA its recommended to run analyze on the root partitions post upgrade.

 

Additional Information

Related Articles:

'Header checksum does not match' an error for Append-Only tables

gpmigrator failed with error psql:appendonly_upgrade.sql:277: ERROR:  syntax error at or near "."

Comments

  • Avatar
    Rohit Kapil

    Thanks Archana for putting this together.

  • Avatar
    Kushal Choubay

    One thing to add - During upgrade after the checks gets completed the database will be stopped and a check (stat) and copy process will be done on for $MASTER_DATA_DIRECTORY . During "stat" no log messages will be there and it seems like the process is in a hung situation. You can check the "strace" on the "python" process running gpmigrator and also check the "$MASTER_DATA_DIRECTORY /gp_migrator_state_check" file to get the current state.
    The strace will show the stat on files. If there are large number of files in $MASTER_DATA_DIRECTORY then stat will take time and add on to the upgrade time.

  • Avatar
    Archana Sundararam

    Thanks Kushal for noting this, I have added it to the article as well.

Powered by Zendesk