Pivotal Knowledge Base

Follow

postgres database is filling up root file system on Pivotal Command Center host

Environment

  • PHD 2.1.0.0

Symptom

/var/lib/pgsql disk space usage is 10GB and growing on the Pivotal Command Center node

/var/lib/psql                    - 11 G

Cause

PCC will continuously collect hadoop and system metrics from the installed clusters. Depending on the size of the cluster regular maintenance may be necessary keep the database size under control

Fix

  1. Connect to the database and identify what tables are using the most space
    [gpadmin@pccadmin ~]$ psql -U postgres gphdmgr -p 10432
    gphdmgr=# SELECT tablename ,pg_size_pretty(pg_total_relation_size(tablename::text)) AS size FROM pg_tables WHERE schemaname = 'public' order by 1 DESC;
                tablename             |    size
    ----------------------------------+------------
     data_node_report                 | 10 GB
     system_history_agg_all_hosts_4h  | 240 kB
     system_history_agg_all_hosts_30m | 1560 kB
     system_history_agg_all_hosts_1m  | 2656 kB
     system_history_agg_all_hosts_1h  | 816 kB
     system_history_agg_all_hosts_1d  | 88 kB
     system_history                   | 14 MB
     stack_properties                 | 32 kB
     stack                            | 32 kB
  2. In this case data_node_report table is 10GB and is used for collecting HDFS usage statistics. This table is safe to truncate and will not impact icm or PCC operations. This will simply purge the historical statistics from the database. The following list of tables are safe execute steps 3-6. Any table not in this list should never be modified, deleted, or changed without consulting pivotal support.
    app Yarn Application details
    app_attempt Yarn Container details for yarn application
    data_node_report HDFS data usage statistics
    hbase_metrics Hbase Operation metrics
    job Mapreduce History stats for completed applications
    job_attempt Mapreduce Containers stats
    job_tracker_metrics Resource Manager statistics
    namenode_metrics Namenode RPC and load statistics
    queue_metrics Yarn Scheduler statistics
    system_history CPU, Mem, IO stats for each host

    NOTE: Please take caution when proceeding through steps 3 and 6.  Truncating the wrong table could result in disabling icm_client operations.  In addition truncating the tables in the above list will result in lost historical cluster statistics

  3. First backup the database to a safe location incase any mistakes are made during this maintenance activity
    [root@pccadmin ~]# service commander stop
    
    Stopping PHD Manager Services...
    nodeagent is not running
    Stopping Jetty                                             [  OK  ]
    Stopping httpd:                                            [  OK  ]
    Stopping Pivotal Command Center HTTPS                      [  OK  ]
    Stopping Pivotal Command Center Background Worker          [  OK  ]
    
    [root@pccadmin ~]# service postgresql stop
    Stopping postgresql service:                               [  OK  ]
    
    [root@pccadmin ~]# tar -cf /backup/pgsql.backup.12-01-2014.tar /var/lib/pgsql
    
  4. After backup is completed start the database and truncate the large tables
    [root@pccadmin ~]# service postgresql start
    Starting postgresql service:                               [  OK  ]
    
    [root@pccadmin ~]# psql -U postgres -p 10432 gphdmgr
    
    gphdmgr=# TRUNCATE TABLE data_node_report;
    TRUNCATE TABLE
    
  5. Vacuum the database. Please DO NOT CANCEL this operation once triggered.
    gphdmgr=# VACUUM FULL;
    VACUUM
    
  6. Remember to REINDEX the database to clean up related index tables and any toast tables
    gphdmgr=# REINDEX DATABASE gphdmgr;
    NOTICE:  table "pg_class" was reindexed
    NOTICE:  table "pg_type" was reindexed
    .
    .
    REINDEX
    gphdmgr=#
    
  7. Confirm database usage is decreased
    gphdmgr=# SELECT 'data_node_report' ,pg_size_pretty(pg_total_relation_size('data_node_report')) AS size;
         ?column?     | size
    ------------------+-------
     data_node_report | 16 kB
    
  8. Start command center services
    [root@pccadmin ~]# service commander start
    
    Starting PHD Manager Services...
    Starting httpd:                                            [  OK  ]
    Preparing nodeagent:                                       [  OK  ]
    Starting  nodeagent:                                       [  OK  ]
    
    Starting Jetty                                             [  OK  ]
    Waiting for Jetty to mount applications                    [  OK  ]
    Starting Pivotal Command Center HTTPS                      [  OK  ]
    Starting Pivotal Command Center Background Worker          [  OK  ]
    
  9. During this procedure we disabled postgresql and that might have caused the nmon process to shutdown on the pivotal command center node. To ensure nmon is still running run these steps from the pivotal command center node
    [root@pccadmin ~]# service nmon start
    [root@pccadmin ~]# massh  <hostfile> verbose "service nmon restart"
    [root@pccadmin ~]# massh  <hostfile> verbose "service nmon status"
    

Comments

Powered by Zendesk