Pivotal Knowledge Base

Backup (gpcrondump) Starts and Immediately Hangs

Environment 

  • Pivotal Greenplum Database (GPDB) 4.3.x
  • Operating System: Red Hat Enterprise Linux 6.x

Overview

gpcrondump hangs as soon as it is executed.

Symptom

gpcrondump appears not to have started at all: gp_dump_agent does not start on the segments, no status files are created on either the master or the segments, and the gpcrondump log ends with "Starting Dump process".

Note: The paths of the gpcrondump log, status, and report (rpt) files are described in a separate article.

Checklist

  • Ensure the gp_dump_agents are not running. Refer to this article on how to do that.
  • Make sure there are no status files generated. Refer to this article on how to know that. 
  • Look up the hostname of the master (mdw) in gp_segment_configuration, then run "psql -h <hostname>". In this scenario the connection fails with one of the following two errors:
psql
select hostname from gp_segment_configuration where dbid = 1;
\q
psql -h <hostname-from-above-query> -- Will result in one of the errors below
[gpadmin@gpdb-sandbox gpseg-1]$ psql -h gpdb-sandbox.localdomain
Password: 
psql: fe_sendauth: no password supplied

psql: FATAL: no pg_hba.conf entry for host "172.28.20.135", user "gpadmin", database "gpadmin", SSL off
  • Trace the process with strace to confirm that it is waiting and not making progress, as in the example below:
[gpadmin@den-02-01 hosts]$ strace -p 35917
Process 35917 attached
select(10, [7 9], [], [], {0, 600503}) = 0 (Timeout)
wait4(37512, 0x7fff91ded084, WNOHANG, NULL) = 0
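
The psql check in the list above can be scripted. The following is a minimal shell sketch, not part of the original procedure: the function name classify_psql_error and the labels it prints are hypothetical, and it only pattern-matches the two error messages quoted in this checklist.

```shell
#!/bin/sh
# classify_psql_error MESSAGE
# Hypothetical helper: map the error text printed by "psql -h <hostname>"
# to one of the two failures described in this checklist.
classify_psql_error() {
    case "$1" in
        *"fe_sendauth: no password supplied"*)
            echo "password_required" ;;
        *"no pg_hba.conf entry for host"*)
            echo "no_hba_entry" ;;
        *)
            echo "other" ;;
    esac
}

# Example with the two messages quoted in the checklist.
classify_psql_error 'psql: fe_sendauth: no password supplied'
classify_psql_error 'psql: FATAL: no pg_hba.conf entry for host "172.28.20.135", user "gpadmin", database "gpadmin", SSL off'
```

A result of password_required or no_hba_entry suggests that the pg_hba.conf entry for the master host should be reviewed; any other result probably points to a different problem.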

Cause

This happens when the entry for the master host (mdw) in $MASTER_DATA_DIRECTORY/pg_hba.conf on the master is missing or incorrect. In the example below, the pg_hba.conf entry for the master host gpdb-sandbox.localdomain has a typo in the address (.156 instead of .157):

gpadmin=# select hostname from gp_segment_configuration where dbid=1;
hostname
--------------------------
gpdb-sandbox.localdomain
(1 row)

gpadmin=# ^D\q
[gpadmin@gpdb-sandbox ~]$ ping gpdb-sandbox.localdomain
PING gpdb-sandbox.localdomain (172.16.34.157) 56(84) bytes of data.
64 bytes from gpdb-sandbox.localdomain (172.16.34.157): icmp_seq=1 ttl=64 time=0.016 ms
64 bytes from gpdb-sandbox.localdomain (172.16.34.157): icmp_seq=2 ttl=64 time=0.026 ms
64 bytes from gpdb-sandbox.localdomain (172.16.34.157): icmp_seq=3 ttl=64 time=0.028 ms
64 bytes from gpdb-sandbox.localdomain (172.16.34.157): icmp_seq=4 ttl=64 time=0.077 ms
^C
--- gpdb-sandbox.localdomain ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3534ms
rtt min/avg/max/mdev = 0.016/0.036/0.077/0.024 ms
[gpadmin@gpdb-sandbox ~]$ grep 172.16.34.15 $MASTER_DATA_DIRECTORY/pg_hba.conf
host all gpadmin 172.16.34.156/32 trust
[gpadmin@gpdb-sandbox ~]$
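
The comparison above (the address the hostname resolves to versus the address listed in pg_hba.conf) can be sketched as a small shell check. This is an illustration under stated assumptions: hba_has_exact_entry is a hypothetical helper, and it only recognizes exact "address/32" host entries, whereas pg_hba.conf also accepts CIDR ranges, separate netmask columns, and hostnames.

```shell
#!/bin/sh
# hba_has_exact_entry FILE IP
# Hypothetical helper: succeed if FILE contains an uncommented "host" line
# whose address column is exactly IP/32. CIDR ranges, separate netmask
# columns, and hostnames are deliberately not handled here.
hba_has_exact_entry() {
    awk -v want="$2/32" '
        $1 == "host" && $4 == want { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Demo against a throwaway file holding the mistyped entry from the example.
demo_hba=$(mktemp)
printf '%s\n' 'host all gpadmin 172.16.34.156/32 trust' > "$demo_hba"

if hba_has_exact_entry "$demo_hba" 172.16.34.157; then
    echo "entry present"
else
    echo "entry missing"
fi
rm -f "$demo_hba"
```

On a real master the same function could be pointed at $MASTER_DATA_DIRECTORY/pg_hba.conf with the address reported by ping.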

RCA

This is most likely a user error, such as a typo or an entry deleted by mistake.

Resolution

  • Before fixing the issue, reproduce the error by running gp_dump directly. gpcrondump calls gp_dump internally, and the gpcrondump log file contains the complete gp_dump command line that it executes (see the "Dump process command line" entry below):
[gpadmin@gpdb-sandbox ~]$ gpcrondump -x gpadmin
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Starting gpcrondump with args: -x gpadmin
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Master Greenplum Instance dump parameters
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump type = Full database
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Database to be dumped = gpadmin
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Master port = 5432
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Master data directory = /gpdata/master/gpseg-1
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Run post dump program = Off
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Rollback dumps = Off
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump file compression = On
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Clear old dump files = Off
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Update history table = Off
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Secure config files = Off
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump global objects = Off
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Vacuum mode type = Off
20160812:01:27:28:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Ensuring remaining free disk > 10

Continue with Greenplum dump Yy|Nn (default=N):
> y
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Directory /gpdata/master/gpseg-1/db_dumps/20160812 not found, will try to create
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Created /gpdata/master/gpseg-1/db_dumps/20160812
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Checked /gpdata/master/gpseg-1 on master
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Configuring for single database dump
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Validating disk space
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Adding compression parameter
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Adding --no-expand-children
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump process command line gp_dump -p 5432 -U gpadmin --gp-d=db_dumps/20160812 --gp-r=/gpdata/master/gpseg-1/db_dumps/20160812 --gp-s=p --gp-k=20160812012728 --no-lock --gp-c --no-expand-children "gpadmin"
20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Starting Dump process

  • Kill both the gpcrondump and gp_dump processes that are currently running (use kill -3).
  • Run only the full gp_dump command interactively. As seen below, it reports the connection error.
[gpadmin@gpdb-sandbox ~]$ gp_dump -p 5432 -U gpadmin --gp-d=db_dumps/20160812 --gp-r=/gpdata/master/gpseg-1/db_dumps/20160812 --gp-s=p --gp-k=20160812012728 --no-lock --gp-c --no-expand-children gpadmin
20160812:01:31:55|gp_dump-[INFO]:-Read params: <empty>
20160812:01:31:55|gp_dump-[INFO]:-Command line options analyzed.
20160812:01:31:55|gp_dump-[INFO]:-Connecting to master database on host localhost port 5432 database gpadmin.
20160812:01:31:55|gp_dump-[INFO]:-Reading Greenplum Database configuration info from master database.
20160812:01:31:55|gp_dump-[INFO]:-Preparing to dump the following segments:
20160812:01:31:55|gp_dump-[INFO]:-Segment 1 (dbid 3)
20160812:01:31:55|gp_dump-[INFO]:-Segment 0 (dbid 2)
20160812:01:31:55|gp_dump-[INFO]:-Master (dbid 1)
20160812:01:31:55|gp_dump-[INFO]:-About to spin off 3 threads with timestamp key 20160812012728
20160812:01:31:55|gp_dump-[INFO]:-Creating thread to backup dbid 3: host gpdb-sandbox.localdomain port 40001 database gpadmin
20160812:01:31:55|gp_dump-[INFO]:-Creating thread to backup dbid 2: host gpdb-sandbox.localdomain port 40000 database gpadmin
20160812:01:31:55|gp_dump-[INFO]:-Creating thread to backup dbid 1: host gpdb-sandbox.localdomain port 5432 database gpadmin
20160812:01:31:55|gp_dump-[INFO]:-Waiting for remote gp_dump_agent processes to start transactions in serializable isolation level
20160812:01:31:55|gp_dump-[ERROR]:-Connection to dbid 1 on host gpdb-sandbox.localdomain failed: fe_sendauth: no password supplied
20160812:01:31:55|gp_dump-[INFO]:-Listening for messages from server on dbid 2 connection
20160812:01:31:55|gp_dump-[INFO]:-Listening for messages from server on dbid 3 connection
20160812:01:31:55|gp_dump-[INFO]:-Successfully launched Greenplum Database backup on dbid 2 server
20160812:01:31:55|gp_dump-[INFO]:-Successfully launched Greenplum Database backup on dbid 3 server
20160812:01:31:57|gp_dump-[INFO]:-noticed that a cancel order is in effect. Informing dbid 2 on host gpdb-sandbox.localdomain by notifying on connection
20160812:01:31:57|gp_dump-[INFO]:-noticed that a cancel order is in effect. Informing dbid 3 on host gpdb-sandbox.localdomain by notifying on connection
20160812:01:31:57|gp_dump-[INFO]:-All remote gp_dump_agent processes have began transactions in serializable isolation level
20160812:01:31:57|gp_dump-[INFO]:-Waiting for remote gp_dump_agent processes to obtain local locks on dumpable objects
  • The problem is that this error is not captured by the calling program, gpcrondump, so gpcrondump appears to hang instead of failing.
  • Again, kill the gp_dump process (use kill -3).
  • Fix the pg_hba.conf entry (see this document for help). In our example, .156 needs to be changed to .157.
[gpadmin@gpdb-sandbox 20160812]$ grep 157 $MASTER_DATA_DIRECTORY/pg_hba.conf
host all gpadmin 172.16.34.157/32 trust

  • Reload the configuration with gpstop -u.
[gpadmin@gpdb-sandbox gpseg-1]$ gpstop -u
20160812:01:44:20:324728 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Starting gpstop with args: -u
20160812:01:44:20:324728 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Gathering information and validating the environment...
20160812:01:44:20:324728 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20160812:01:44:20:324728 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Obtaining Segment details from master...
20160812:01:44:20:324728 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
20160812:01:44:20:324728 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Signalling all postmaster processes to reload
.
  • Run psql -h gpdb-sandbox.localdomain again; it should now connect.
    [gpadmin@gpdb-sandbox ~]$ psql -h gpdb-sandbox.localdomain
    psql (8.2.15)
    Type "help" for help.
    
  • gpcrondump can now be started again and should complete.
20160812:01:44:32:324833 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump process command line gp_dump -p 5432 -U gpadmin --gp-d=db_dumps/20160812 --gp-r=/gpdata/master/gpseg-1/db_dumps/20160812 --gp-s=p --gp-k=20160812014429 --no-lock --gp-c --no-expand-children "gpadmin"
20160812:01:44:32:324833 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Starting Dump process
20160812:01:44:40:324833 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump process returned exit code 0
20160812:01:44:40:324833 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Timestamp key = 20160812014429
20160812:01:44:40:324833 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Checked master status file and master dump file.
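
The step of copying the gp_dump command line out of the gpcrondump log can be sketched in shell. This is a minimal illustration, not part of the original procedure: extract_gp_dump_cmd is a hypothetical helper, and it assumes the log line format shown in this article ("Dump process command line gp_dump ...").

```shell
#!/bin/sh
# extract_gp_dump_cmd LOGFILE
# Hypothetical helper: print the most recent gp_dump command line that
# gpcrondump recorded in LOGFILE.
extract_gp_dump_cmd() {
    sed -n 's/^.*-Dump process command line \(gp_dump .*\)$/\1/p' "$1" | tail -n 1
}

# Demo with log lines copied from the hanging run shown in this article.
demo_log=$(mktemp)
printf '%s\n' \
    '20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Adding --no-expand-children' \
    '20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump process command line gp_dump -p 5432 -U gpadmin --gp-d=db_dumps/20160812 --gp-r=/gpdata/master/gpseg-1/db_dumps/20160812 --gp-s=p --gp-k=20160812012728 --no-lock --gp-c --no-expand-children "gpadmin"' \
    '20160812:01:27:31:322496 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Starting Dump process' \
    > "$demo_log"
extract_gp_dump_cmd "$demo_log"
rm -f "$demo_log"
```

The printed command can then be run interactively, as described in the steps above, to surface the error that gpcrondump does not report.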

Additional Information

Troubleshooting long running backups
