Pivotal Knowledge Base

Pivotal HAWQ: segment failures and recovery scenarios

Environment
This article applies only to Pivotal HAWQ 1.x. For Pivotal HDB 2.x, see "Troubleshooting Pivotal HDB offline segments".

Summary
How to handle segment failures on HAWQ, covered in three scenarios:

Scenario 1: There are 2 servers with 2 segments each. All segments except one are down. Will HAWQ still be operational?
Scenario 2: Login to HAWQ is successful, but the error "No alive segment" is reported and gprecoverseg does not work. How to bring the cluster up?
Scenario 3: A local segment directory has been lost, e.g. /data1/primary/gpseg0. How to recover the segment?


Solution
Scenario 1: There are 2 servers with 2 segments each. All segments except one are down. Will HAWQ still be operational?

 hawq=# SELECT dbid,content,role,preferred_role,mode,status,port,hostname FROM pg_catalog.gp_segment_configuration;
 dbid | content | role | preferred_role | mode | status | port  | hostname |
------+---------+------+----------------+------+--------+-------+----------+
    3 |       1 | p    | p              | s    | u      | 40001 | hdw1     |
    1 |      -1 | p    | p              | s    | u      |  5432 | hdm1     |
    6 |      -1 | m    | m              | s    | u      |  5432 | hdw2     |
    4 |       2 | p    | p              | s    | d      | 40000 | hdw2     |
    5 |       3 | p    | p              | s    | d      | 40001 | hdw2     |
    2 |       0 | p    | p              | s    | d      | 40000 | hdw1     |
(6 rows)
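
The status column above can also be checked from a script. The following is a minimal sketch, not part of the product tooling: the function name count_down_segments is hypothetical, and it assumes the rows are fed in psql unaligned format ("content|status").

```shell
#!/usr/bin/env bash
# Count segment instances marked down ("d") in gp_segment_configuration.
# Input on stdin: one "content|status" row per line (psql -At output).
# Rows with content = -1 (master and standby) are skipped.
count_down_segments() {
    awk -F'|' '$1 != -1 && $2 == "d" { n++ } END { print n + 0 }'
}

# Possible invocation on the master host (the database name "hawq" is an assumption):
#   psql -At -d hawq -c "SELECT content, status FROM pg_catalog.gp_segment_configuration" \
#       | count_down_segments
```

For the configuration shown above, this would report three segments down.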

Note: The segment instances were stopped with the kill command on the DataNode hosts hdw1 and hdw2.

Yes, HAWQ remains operational with a single segment up, as the following test shows:

hawq=# CREATE TABLE smart_boss2 (a int) DISTRIBUTED RANDOMLY;
CREATE TABLE
hawq=# INSERT INTO smart_boss2 SELECT 2 FROM generate_series(1,1000);
INSERT 0 1000
hawq=# SELECT COUNT(*) FROM smart_boss2;
 count
-------
  1000
(1 row)

To bring the failed segments back up, run gprecoverseg.

Note: Make sure the DataNodes are running on the servers before gprecoverseg is executed. Also note that gprecoverseg cancels any running queries.

[gpadmin@hdm1 pg_log]$ gprecoverseg -a
20131029:16:54:25:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-Starting gprecoverseg with args: -a
20131029:16:54:26:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-local HAWQ Version: 'postgres (HAWQ) 4.2.0 build 1'
20131029:16:54:27:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-master HAWQ Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.2.0 build 1) (HAWQ 1.1.4.0 build 5053) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Dec  9 2013 13:42:34'
20131029:16:54:28:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-Obtaining Segment details from master...
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-Greenplum instance recovery parameters
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:----------------------------------------------------------
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-Recovery type              = Standard
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:----------------------------------------------------------
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-Recovery 1 of 3
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:----------------------------------------------------------
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance host                  = hdw1
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance address               = hdw1
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance directory             = /data1/primary/gpseg0
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance port                  = 40000
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance replication port      = None
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance pg_system directory   = /data1/primary/gpseg0
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Recovery Target                       = in-place
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:----------------------------------------------------------
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-Recovery 2 of 3
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:----------------------------------------------------------
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance host                  = hdw2
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance address               = hdw2
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance directory             = /data1/primary/gpseg2
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance port                  = 40000
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance replication port      = None
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance pg_system directory   = /data1/primary/gpseg2
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Recovery Target                       = in-place
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:----------------------------------------------------------
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-Recovery 3 of 3
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:----------------------------------------------------------
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance host                  = hdw2
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance address               = hdw2
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance directory             = /data1/primary/gpseg3
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance port                  = 40001
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance replication port      = None
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance pg_system directory   = /data1/primary/gpseg3
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-   Recovery Target                       = in-place
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:----------------------------------------------------------
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-3 segment(s) to recover
20131029:16:54:32:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-Ensuring 3 failed segment(s) are stopped
....
updating flat files
20131029:16:54:36:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-HAWQ rescovery
20131029:16:54:36:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-Starting failover segments
20131029:16:54:37:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...
......
20131029:16:54:43:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-Process results...
20131029:16:54:43:164297 gprecoverseg:hdm1:gpadmin-[INFO]:-Updating configuration to mark segments up
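
The DataNode precondition noted before the gprecoverseg run can be checked with a short script. This is only a sketch under assumptions: the helper name datanode_running is hypothetical, it assumes passwordless ssh from the master to the segment hosts, and it detects the DataNode by its Java main class in the process list.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) if the process listing on stdin contains an HDFS DataNode JVM.
datanode_running() {
    grep -qF 'org.apache.hadoop.hdfs.server.datanode.DataNode'
}

# Possible pre-check over the segment hosts used in this article:
#   for host in hdw1 hdw2; do
#       ssh "$host" 'ps -eo args' | datanode_running \
#           || echo "DataNode not running on $host; start it before gprecoverseg"
#   done
```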

Scenario 2: Login to HAWQ is successful, but the error "No alive segment" is reported and gprecoverseg does not work. How to bring the cluster up?

If all the segments are down, HAWQ must be restarted to bring them up.

hawq=# SELECT * FROM pg_catalog.gp_segment_configuration;
ERROR:  No alive segment in the GPSQL. (cdbfts.c:1781)

hawq=# \dt
ERROR:  No alive segment in the cluser. (cdblink.c:108)
ERROR:  No alive segment in the cluser. (cdblink.c:108)
ERROR:  No alive segment in the cluser. (cdblink.c:108)

gprecoverseg does not work in this state because it cannot start a transaction:

[gpadmin@hdm1 pg_log]$ gprecoverseg -a
20131029:16:58:59:164507 gprecoverseg:hdm1:gpadmin-[INFO]:-Starting gprecoverseg with args: -a
20131029:16:59:00:164507 gprecoverseg:hdm1:gpadmin-[INFO]:-local HAWQ Version: 'postgres (HAWQ) 4.2.0 build 1'
20131029:16:59:00:164507 gprecoverseg:hdm1:gpadmin-[INFO]:-master HAWQ Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.2.0 build 1) (HAWQ 1.1.4.0 build 5053) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Dec  9 2013 13:42:34'
20131029:16:59:01:164507 gprecoverseg:hdm1:gpadmin-[INFO]:-Obtaining Segment details from master...
20131029:16:59:01:164507 gprecoverseg:hdm1:gpadmin-[CRITICAL]:-gprecoverseg failed. (Reason='error 'can't start transaction' in 'BEGIN'') exiting...

To bring the segments up, restart HAWQ:

[gpadmin@hdm1 ~]$ gpstop -ar
20131029:18:32:29:166958 gpstop:hdm1:gpadmin-[INFO]:-Starting gpstop with args: -ar
20131029:18:32:29:166958 gpstop:hdm1:gpadmin-[INFO]:-Gathering information and validating the environment...
20131029:18:32:30:166958 gpstop:hdm1:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20131029:18:32:30:166958 gpstop:hdm1:gpadmin-[INFO]:-Obtaining Segment details from master...
20131029:18:32:34:166958 gpstop:hdm1:gpadmin-[INFO]:-Greenplum Version: 'postgres (HAWQ) 4.2.0 build 1'
20131029:18:32:34:166958 gpstop:hdm1:gpadmin-[INFO]:-There are 0 connections to the database
20131029:18:32:34:166958 gpstop:hdm1:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='smart'
20131029:18:32:34:166958 gpstop:hdm1:gpadmin-[INFO]:-Master host=hdm1
20131029:18:32:34:166958 gpstop:hdm1:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=smart
20131029:18:32:34:166958 gpstop:hdm1:gpadmin-[INFO]:-Master segment instance directory=/data1/master/gpseg-1
20131029:18:32:35:166958 gpstop:hdm1:gpadmin-[INFO]:-Stopping gpsyncmaster on standby host phd103-standby mode=fast
20131029:18:32:38:166958 gpstop:hdm1:gpadmin-[INFO]:-Successfully shutdown sync process on phd103-standby
20131029:18:32:38:166958 gpstop:hdm1:gpadmin-[INFO]:-Commencing parallel segment instance shutdown, please wait...
....
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[INFO]:-------------------------------------------------
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[INFO]:-Failed Segment Stop Information
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[INFO]:-------------------------------------------------
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[INFO]:-DBID:4  FAILED  host:'hdw2' datadir:'/data1/primary/gpseg2' with reason:'Shutdown failed'
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[INFO]:-DBID:5  FAILED  host:'hdw2' datadir:'/data1/primary/gpseg3' with reason:'Shutdown failed'
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[INFO]:-DBID:2  FAILED  host:'hdw1' datadir:'/data1/primary/gpseg0' with reason:'Shutdown failed'
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[INFO]:-DBID:3  FAILED  host:'hdw1' datadir:'/data1/primary/gpseg1' with reason:'Shutdown failed'
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[INFO]:-----------------------------------------------------
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[INFO]:-   Segments stopped successfully      = 0
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[WARNING]:-Segments with errors during stop   = 4   <<<<<<<<
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[INFO]:-----------------------------------------------------
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[INFO]:-Successfully shutdown 0 of 4 segment instances <<<<<<<<
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[WARNING]:-------------------------------------------------
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[WARNING]:-Segment instance shutdown failures reported
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[WARNING]:-Failed to shutdown 4 of 4 segment instances <<<<<
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[WARNING]:-A total of 4 errors were encountered
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[WARNING]:-Review logfile /home/gpadmin/gpAdminLogs/gpstop_20131029.log
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[WARNING]:-For more details on segment shutdown failure(s)
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[WARNING]:-------------------------------------------------
20131029:18:32:42:166958 gpstop:hdm1:gpadmin-[INFO]:-Restarting System...

After the cluster is restarted, it should work normally as long as at least one segment is up:

hawq=# \dt
                  List of relations
 Schema |    Name     | Type  |  Owner  |   Storage
--------+-------------+-------+---------+-------------
 public | apple       | table | gpadmin | append only
 public | bhuv        | table | gpadmin | append only
 public | bigboy      | table | gpadmin | append only
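
The choice between Scenario 1 and Scenario 2 can be sketched as a small helper. The function name suggest_recovery is hypothetical. It assumes a segment failure is already suspected and only distinguishes the two cases described above by looking for the "No alive segment" error in the output of a catalog query.

```shell
#!/usr/bin/env bash
# Suggest a recovery command from the output of a catalog query.
# Input on stdin: combined stdout/stderr of psql.
suggest_recovery() {
    if grep -q 'No alive segment'; then
        echo 'gpstop -ar'        # no segment alive: restart the cluster (Scenario 2)
    else
        echo 'gprecoverseg -a'   # at least one segment alive (Scenario 1)
    fi
}

# Possible invocation on the master host (the database name "hawq" is an assumption):
#   psql -At -d hawq -c "SELECT * FROM pg_catalog.gp_segment_configuration" 2>&1 \
#       | suggest_recovery
```

The helper only prints a suggestion; running the command is left to the administrator, since both commands interrupt running queries.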

Scenario 3: A local segment directory has been lost, e.g. /data1/primary/gpseg0. How to recover the segment?

gpstate reports one failed primary segment:

[gpadmin@hdm1 ~]$ gpstate
20140509:10:04:08:214654 gpstate:hdm1:gpadmin-[INFO]:-Starting gpstate with args:
20140509:10:04:09:214654 gpstate:hdm1:gpadmin-[INFO]:-local HAWQ Version: 'postgres (HAWQ) 4.2.0 build 1'
20140509:10:04:10:214654 gpstate:hdm1:gpadmin-[INFO]:-master HAWQ Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.2.0 build 1) (HAWQ 1.1.4.0 build 5053) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Dec  9 2013 13:42:34'
20140509:10:04:10:214654 gpstate:hdm1:gpadmin-[INFO]:-Obtaining Segment details from master...
20140509:10:04:10:214654 gpstate:hdm1:gpadmin-[INFO]:-Gathering data from segments...
.......
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-HAWQ instance status summary
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-----------------------------------------------------
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-   Master instance                                = Active
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-   Master standby                                 = No master standby configured
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-   Total segment instance count from metadata     = 6
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-----------------------------------------------------
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-   Primary Segment Status
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-----------------------------------------------------
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-   Total primary segments                         = 6
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-   Total primary segment valid (at master)        = 5
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[WARNING]:-Total primary segment failures (at master)     = 1                              <<<<<<<<
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[WARNING]:-Total number of postmaster.pid files missing   = 1                              <<<<<<<<
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-   Total number of postmaster.pid files found     = 5
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[WARNING]:-Total number of postmaster.pid PIDs missing    = 1                              <<<<<<<<
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found      = 5
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-   Total number of /tmp lock files missing        = 0
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-   Total number of /tmp lock files found          = 6
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[WARNING]:-Total number postmaster processes missing      = 1                              <<<<<<<<
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-   Total number postmaster processes found        = 5
20140509:10:04:17:214654 gpstate:hdm1:gpadmin-[INFO]:-----------------------------------------------------
[gpadmin@hdm1 ~]$
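
For monitoring, the failure count can be extracted from the gpstate summary. This is a minimal sketch: the function name primary_failures is hypothetical, and it assumes the summary line is worded exactly as in the output above.

```shell
#!/usr/bin/env bash
# Print the number from the "Total primary segment failures (at master)" line.
# Input on stdin: gpstate output. Prints nothing if the line is absent.
primary_failures() {
    sed -n 's/.*Total primary segment failures (at master)[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Possible invocation on the master host:
#   gpstate | primary_failures
```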

1. Make sure the "primary" directory, e.g. /data1/primary, exists on the segment host and has the right ownership (normally gpadmin:hadoop; check the other segments for the ownership in use).
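
The directory check in step 1 can be sketched as follows. The helper name dir_owner is hypothetical, and the sketch assumes GNU coreutils stat (the -c option), as found on typical Linux segment hosts.

```shell
#!/usr/bin/env bash
# Print "user:group" for a directory, or fail if the directory does not exist.
dir_owner() {
    [ -d "$1" ] || { echo "missing: $1" >&2; return 1; }
    stat -c '%U:%G' "$1"
}

# Possible usage on the segment host hdw1, run as gpadmin:
#   dir_owner /data1/primary          # parent of the lost segment directory
#   dir_owner /data1/primary/gpseg1   # a healthy segment, for comparison
```

If the two results differ, correct the ownership of the parent directory before running gprecoverseg.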

2. Run gprecoverseg -F and confirm with y when prompted:

[gpadmin@hdm1 ~]$ gprecoverseg -F
20140509:10:05:58:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Starting gprecoverseg with args: -F
20140509:10:05:58:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-local HAWQ Version: 'postgres (HAWQ) 4.2.0 build 1'
20140509:10:05:59:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-master HAWQ Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.2.0 build 1) (HAWQ 1.1.4.0 build 5053) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Dec  9 2013 13:42:34'
20140509:10:06:00:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Obtaining Segment details from master...
20140509:10:06:03:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Greenplum instance recovery parameters
20140509:10:06:03:214747 gprecoverseg:hdm1:gpadmin-[INFO]:----------------------------------------------------------
20140509:10:06:03:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Recovery type              = Standard
20140509:10:06:03:214747 gprecoverseg:hdm1:gpadmin-[INFO]:----------------------------------------------------------
20140509:10:06:03:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Recovery 1 of 1
20140509:10:06:03:214747 gprecoverseg:hdm1:gpadmin-[INFO]:----------------------------------------------------------
20140509:10:06:03:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance host                  = hdw1
20140509:10:06:03:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance address               = hdw1
20140509:10:06:03:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance directory             = /data1/primary/gpseg0
20140509:10:06:03:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance port                  = 40000
20140509:10:06:03:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance replication port      = None
20140509:10:06:03:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-   Failed instance pg_system directory   = /data1/primary/gpseg0
20140509:10:06:03:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-   Recovery Target                       = in-place
20140509:10:06:03:214747 gprecoverseg:hdm1:gpadmin-[INFO]:----------------------------------------------------------

Continue with segment recovery procedure Yy|Nn (default=N):
> y
20140509:10:06:04:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-1 segment(s) to recover
20140509:10:06:04:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Ensuring 1 failed segment(s) are stopped
....
20140509:10:06:09:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Cleaning files from 1 segment(s)
............
20140509:10:06:21:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Building template directory
20140509:10:06:22:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Creating template
20140509:10:06:24:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Starting copy of segment dbid 3 to location /tmp/GPSQL/gpsql_template20140509_100621
20140509:10:06:29:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-default temporary direcotry was used
20140509:10:06:29:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Validating remote directories
..
20140509:10:06:31:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Copying template directory file
..
20140509:10:06:33:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Configuring new segments
....
20140509:10:06:37:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Cleaning files
.
20140509:10:06:39:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Starting file move procedure for hdw1:/data1/primary/gpseg0:content=0:dbid=2:mode=s:status=d
updating flat files
20140509:10:06:39:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-HAWQ rescovery
20140509:10:06:39:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Starting failover segments
20140509:10:06:40:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...
.....
20140509:10:06:45:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Process results...
20140509:10:06:45:214747 gprecoverseg:hdm1:gpadmin-[INFO]:-Updating configuration to mark segments up
[gpadmin@hdm1 ~]$
