Pivotal Knowledge Base

Backup (gpcrondump) Hangs Immediately After Starting

Environment 

 Product                    Version
 Pivotal Greenplum (GPDB)   4.3.x
 OS                         RHEL 6.x

Overview

gpcrondump hangs immediately after it is started.

Symptom

gpcrondump appears not to have started at all: gp_dump_agent does not start on the segments, no status files are created on either the master or the segments, and the gpcrondump log stops at "Starting Dump process".

Note: For the paths of the various gpcrondump log, status, and rpt files, click here.

Checklist

  • Ensure the gp_dump_agents are not running. Refer to this article on how to do that.
  • Make sure there are no status files generated. Refer to this article on how to know that. 
  • Look up the hostname of the master (mdw) in gp_segment_configuration, then run "psql -h <hostname>". The connection should fail with one of the following two errors:
psql
select hostname from gp_segment_configuration where dbid = 1;
\q
psql -h <hostname-from-above-query>   # fails with one of the errors below
[gpadmin@gpdb-sandbox gpseg-1]$ psql -h gpdb-sandbox.localdomain
Password: 
psql: fe_sendauth: no password supplied

psql: FATAL: no pg_hba.conf entry for host "172.28.20.135", user "gpadmin", database "gpadmin", SSL off
  • Trace the process to make sure it is waiting for something and not progressing, see example below:
[gpadmin@den-02-01 hosts]$ strace -p 35917
Process 35917 attached
select(10, [7 9], [], [], {0, 600503}) = 0 (Timeout)
wait4(37512, 0x7fff91ded084, WNOHANG, NULL) = 0
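
The two psql errors named in the checklist can be told apart mechanically. The following is a minimal sketch, assuming the error text of the failed psql attempt has already been captured in a string. The function name classify_psql_error and the labels it prints are hypothetical and are not part of Greenplum.

```shell
# Hypothetical helper: classify the error text of a failed
# "psql -h <master-hostname>" attempt. The labels are illustrative only.
classify_psql_error() {
    local err="$1"
    case "$err" in
        *"fe_sendauth: no password supplied"*)
            # The server asked for a password and none was supplied.
            echo "pg_hba_auth" ;;
        *"no pg_hba.conf entry for host"*)
            # No pg_hba.conf entry matches the connecting address.
            echo "pg_hba_missing" ;;
        "")
            # No error text at all.
            echo "ok" ;;
        *)
            # Some other failure, unrelated to this article.
            echo "other" ;;
    esac
}

classify_psql_error "psql: fe_sendauth: no password supplied"
```

Either of the first two labels points at the pg_hba.conf entry for the master host, which is what the Cause section examines.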

Cause

This happens because the pg_hba.conf file in the master host's $MASTER_DATA_DIRECTORY has a missing or incorrect entry for the master host (mdw). In the example below, the pg_hba.conf entry for the master host gpdb-sandbox.localdomain has a typo (.156 instead of .157):

gpadmin=# select hostname from gp_segment_configuration where dbid=1;
hostname
--------------------------
gpdb-sandbox.localdomain
(1 row)

gpadmin=# \q
[gpadmin@gpdb-sandbox ~]$ ping gpdb-sandbox.localdomain
PING gpdb-sandbox.localdomain (172.16.34.157) 56(84) bytes of data.
64 bytes from gpdb-sandbox.localdomain (172.16.34.157): icmp_seq=1 ttl=64 time=0.016 ms
64 bytes from gpdb-sandbox.localdomain (172.16.34.157): icmp_seq=2 ttl=64 time=0.026 ms
64 bytes from gpdb-sandbox.localdomain (172.16.34.157): icmp_seq=3 ttl=64 time=0.028 ms
64 bytes from gpdb-sandbox.localdomain (172.16.34.157): icmp_seq=4 ttl=64 time=0.077 ms
^C
--- gpdb-sandbox.localdomain ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3534ms
rtt min/avg/max/mdev = 0.016/0.036/0.077/0.024 ms
[gpadmin@gpdb-sandbox ~]$ grep 172.16.34.15 $MASTER_DATA_DIRECTORY/pg_hba.conf
host all gpadmin 172.16.34.156/32 trust
[gpadmin@gpdb-sandbox ~]$
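
The manual comparison above (resolve the master hostname, then grep pg_hba.conf) could be scripted roughly as follows. This is a sketch under stated assumptions: hba_has_host_entry is a hypothetical helper name, and it only recognizes "host" lines whose address field is exactly the given IP with a /32 mask, as in the example entry. Wider CIDR ranges, separate netmask columns, and hostssl lines are not handled.

```shell
# Hypothetical helper: succeed if FILE contains a "host" line whose
# address field (4th column) is exactly IP/32. Comment lines are skipped
# because their first field is not "host".
hba_has_host_entry() {
    local ip="$1" file="$2"
    awk -v addr="${ip}/32" '
        $1 == "host" && $4 == addr { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Example use on the master (commented out; needs the real environment):
# ip=$(getent hosts gpdb-sandbox.localdomain | awk '{print $1; exit}')
# hba_has_host_entry "$ip" "$MASTER_DATA_DIRECTORY/pg_hba.conf" \
#     || echo "no /32 host entry for $ip in pg_hba.conf"
```

With the example file from this article, a check for 172.16.34.157 would fail because only the mistyped 172.16.34.156/32 entry is present.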

RCA

The most likely root cause is user error, such as a typo or an entry deleted by mistake.

Resolution

  • Before fixing the issue, reproduce the error with gp_dump. gpcrondump calls gp_dump internally, and the gpcrondump log file contains the complete gp_dump command that it executes. See below:
[gpadmin@gpdb-sandbox ~]$ gpcrondump -x gpadmin
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Starting gpcrondump with args: -x gpadmin
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Master Greenplum Instance dump parameters
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump type = Full database
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Database to be dumped = gpadmin
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Master port = 5432
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Master data directory = /gpdata/master/gpseg-1
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Run post dump program = Off
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Rollback dumps = Off
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump file compression = On
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Clear old dump files = Off
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Update history table = Off
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Secure config files = Off
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump global objects = Off
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Vacuum mode type = Off
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Ensuring remaining free disk > 10

Continue with Greenplum dump Yy|Nn (default=N):
> y
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Directory /gpdata/master/gpseg-1/db_dumps/20160812 not found, will try to create
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Created /gpdata/master/gpseg-1/db_dumps/20160812
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Checked /gpdata/master/gpseg-1 on master
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Configuring for single database dump
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Validating disk space
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Adding compression parameter
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Adding --no-expand-children
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump process command line gp_dump -p 5432 -U gpadmin --gp-d=db_dumps/20160812 --gp-r=/gpdata/master/gpseg-1/db_dumps/20160812 --gp-s=p --gp-k=20160812012728 --no-lock --gp-c --no-expand-children "gpadmin"
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Starting Dump process

  • Kill the gpcrondump and gp_dump processes that are currently running (use kill -3).
  • Run the full gp_dump command by itself, interactively. As shown below, it reports the error.
[gpadmin@gpdb-sandbox ~]$ gp_dump -p 5432 -U gpadmin --gp-d=db_dumps/20160812 --gp-r=/gpdata/master/gpseg-1/db_dumps/20160812 --gp-s=p --gp-k=20160812012728 --no-lock --gp-c --no-expand-children gpadmin
20160812:01:31:55|gp_dump-[INFO]:-Read params: <empty>
20160812:01:31:55|gp_dump-[INFO]:-Command line options analyzed.
20160812:01:31:55|gp_dump-[INFO]:-Connecting to master database on host localhost port 5432 database gpadmin.
20160812:01:31:55|gp_dump-[INFO]:-Reading Greenplum Database configuration info from master database.
20160812:01:31:55|gp_dump-[INFO]:-Preparing to dump the following segments:
20160812:01:31:55|gp_dump-[INFO]:-Segment 1 (dbid 3)
20160812:01:31:55|gp_dump-[INFO]:-Segment 0 (dbid 2)
20160812:01:31:55|gp_dump-[INFO]:-Master (dbid 1)
20160812:01:31:55|gp_dump-[INFO]:-About to spin off 3 threads with timestamp key 20160812012728
20160812:01:31:55|gp_dump-[INFO]:-Creating thread to backup dbid 3: host gpdb-sandbox.localdomain port 40001 database gpadmin
20160812:01:31:55|gp_dump-[INFO]:-Creating thread to backup dbid 2: host gpdb-sandbox.localdomain port 40000 database gpadmin
20160812:01:31:55|gp_dump-[INFO]:-Creating thread to backup dbid 1: host gpdb-sandbox.localdomain port 5432 database gpadmin
20160812:01:31:55|gp_dump-[INFO]:-Waiting for remote gp_dump_agent processes to start transactions in serializable isolation level
20160812:01:31:55|gp_dump-[ERROR]:-Connection to dbid 1 on host gpdb-sandbox.localdomain failed: fe_sendauth: no password supplied
20160812:01:31:55|gp_dump-[INFO]:-Listening for messages from server on dbid 2 connection
20160812:01:31:55|gp_dump-[INFO]:-Listening for messages from server on dbid 3 connection
20160812:01:31:55|gp_dump-[INFO]:-Successfully launched Greenplum Database backup on dbid 2 server
20160812:01:31:55|gp_dump-[INFO]:-Successfully launched Greenplum Database backup on dbid 3 server
20160812:01:31:57|gp_dump-[INFO]:-noticed that a cancel order is in effect. Informing dbid 2 on host gpdb-sandbox.localdomain by notifying on connection
20160812:01:31:57|gp_dump-[INFO]:-noticed that a cancel order is in effect. Informing dbid 3 on host gpdb-sandbox.localdomain by notifying on connection
20160812:01:31:57|gp_dump-[INFO]:-All remote gp_dump_agent processes have began transactions in serializable isolation level
20160812:01:31:57|gp_dump-[INFO]:-Waiting for remote gp_dump_agent processes to obtain local locks on dumpable objects
  • The problem is that the error is not captured by the calling program gpcrondump.
  • Again, kill the gp_dump process (use kill -3).
  • Fix the pg_hba.conf entry (see this document for help). In our example, .156 must be changed to .157.
[gpadmin@gpdb-sandbox 20160812]$ grep 157 $MASTER_DATA_DIRECTORY/pg_hba.conf
host all gpadmin 172.16.34.157/32 trust

  • Reload the configuration with gpstop -u.
[gpadmin@gpdb-sandbox gpseg-1]$ gpstop -u
20160812:01:44:20:324728 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Starting gpstop with args: -u
20160812:01:44:20:324728 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Gathering information and validating the environment...
20160812:01:44:20:324728 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20160812:01:44:20:324728 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Obtaining Segment details from master...
20160812:01:44:20:324728 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
20160812:01:44:20:324728 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Signalling all postmaster processes to reload
.
  • Run psql -h gpdb-sandbox.localdomain again; it should now connect.
    [gpadmin@gpdb-sandbox ~]$ psql -h gpdb-sandbox.localdomain
    psql (8.2.15)
    Type "help" for help.
    
  • Start gpcrondump again; it should now work.
20160812:01:44:32:324833 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump process command line gp_dump -p 5432 -U gpadmin --gp-d=db_dumps/20160812 --gp-r=/gpdata/master/gpseg-1/db_dumps/20160812 --gp-s=p --gp-k=20160812014429 --no-lock --gp-c --no-expand-children "gpadmin"
20160812:01:44:32:324833 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Starting Dump process
20160812:01:44:40:324833 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump process returned exit code 0
20160812:01:44:40:324833 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Timestamp key = 20160812014429
20160812:01:44:40:324833 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Checked master status file and master dump file.
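
For the first resolution step, the gp_dump command can be pulled out of the gpcrondump log rather than copied by hand. The following is a minimal sketch, assuming the log contains a "Dump process command line" entry in the format shown in this article. The function name extract_gp_dump_cmd is hypothetical, and the log file path must be supplied by the caller.

```shell
# Hypothetical helper: print the gp_dump command from the most recent
# "Dump process command line" entry in the given gpcrondump log file.
extract_gp_dump_cmd() {
    local logfile="$1"
    grep 'Dump process command line' "$logfile" \
        | tail -n 1 \
        | sed 's/^.*Dump process command line //'
}

# Example use (commented out; replace the argument with your own log path):
# extract_gp_dump_cmd /path/to/gpcrondump_logfile
```

The printed command can then be run interactively, as described in the steps above, to surface the connection error that gpcrondump does not report.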

Additional Information

Troubleshooting long running backups
