When attempting to bring segments up with gprecoverseg, the command fails with a tar error similar to the following:
/bin/tar: ./pg_log/gpdb-2017-02-17_000000.csv: file changed as we read it
End of gprecoverseg output:
20170217:10:47:14:155568 gprecoverseg:dc1prsahdmn04:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
Command was: 'ssh -o 'StrictHostKeyChecking no' dc1prsahddn03.es.ad.adp.com ". /usr/local/hawq/./greenplum_path.sh; /bin/tar cvPf /tmp/hawq_template20170217_103318 --exclude=pgsql_tmp -C /hdfs1/primary/gpseg6 ."'
./pg_log/gpdb-2017-02-10_000000.csv ... ...
', stderr='/bin/tar: ./pg_log/gpdb-2017-02-17_000000.csv: file changed as we read it
On HAWQ 1.x, gprecoverseg builds and packages a database template from another segment and uses it to recover the down segment(s). It expects that no file in the source segment directory, log files included, changes while the template is being built. If a file is updated during the build, tar reports "file changed as we read it" and returns a non-zero return code, so the template build fails.
- Go to the segment from which the template is being built (gpseg6 in the above example).
- Check the size of the segment directory - "du -sh /hdfs1/primary/gpseg6"
- If it is larger than 500MB, check the size of the log files in the pg_log subdirectory.
- If there are large log files, move them to a location outside the segment directory. DO NOT move the current day's log file.
- Once the segment directory size is decreased, try gprecoverseg again. With a smaller template to package, it might complete before the log file is updated again.
- If it fails again, open the log file named in the error and check which messages are being written to it continuously.
- Fix the issue that is generating those messages, then run gprecoverseg again.
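
The log cleanup in the steps above could be scripted roughly as follows. This is a minimal sketch, not part of HAWQ: move_old_logs is a hypothetical helper, the backup location is a placeholder, and it assumes the log files follow the gpdb-YYYY-MM-DD_HHMMSS.csv naming seen in the error output, so that the current day's file can be recognized by today's date in its name.

```shell
# Hypothetical helper (not shipped with HAWQ): move all pg_log/gpdb-*.csv
# files except the current day's out of a segment directory.
#   $1 = segment directory, e.g. /hdfs1/primary/gpseg6
#   $2 = destination directory outside the segment directory
move_old_logs() {
    seg_dir="$1"
    dest="$2"
    # Assumption: the active log file carries today's date in its name.
    today="$(date +%Y-%m-%d)"
    mkdir -p "$dest" || return 1
    for f in "$seg_dir"/pg_log/gpdb-*.csv; do
        [ -e "$f" ] || continue               # glob matched nothing
        case "$(basename "$f")" in
            "gpdb-${today}_"*) continue ;;    # never move the current day's log
        esac
        mv -- "$f" "$dest"/ || return 1
    done
}

# Example usage on the segment host (paths from the example above; the
# backup directory is an assumed placeholder):
#   du -sh /hdfs1/primary/gpseg6
#   move_old_logs /hdfs1/primary/gpseg6 /tmp/gpseg6_pg_log_backup
#   du -sh /hdfs1/primary/gpseg6
```

Run it as a user that owns the segment directory (gpadmin in the output above), and confirm with du that the directory has shrunk before retrying gprecoverseg.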