Pivotal Knowledge Base


Troubleshooting the hardware for long running backups/jobs/queries in Pivotal Greenplum

Environment 

Product                    Version
Pivotal Greenplum (GPDB)   4.3.x
OS                         RHEL 6.x

Overview

This article explains how to troubleshoot the hardware components for a long running backup or job.

Symptom 

The backup or job is long running; all other areas of the system including the database, OS, and network have been checked and the reports have come out clean.

Checklist

Check the hardware level

  • Make sure there are no hardware-related errors.
  • Review /var/log/messages and /var/log/dmesg on all segments; neither log should contain hardware errors.
  • A failing disk usually shows itself quickly: even touching a file on the bad disk throws an error.
  • Likewise, a bad network cable results in slow response times even for a simple ping.

Sample output for /var/log/messages

Feb 4 12:52:52 test kernel: Buffer I/O error on device sdc, logical block 268435440
Feb 4 12:52:52 test kernel: end_request: I/O error, dev sde, sector 0
Feb 4 12:52:52 test kernel: end_request: I/O error, dev sdc, sector 2147483640

Sample output for /var/log/dmesg

hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hde: dma_intr: error=0x40 { UncorrectableError }, LBAsect=2399063, sector=598768
end_request: I/O error, dev 21:05 (hde), sector 598768
raid1: hde5: unrecoverable I/O read error for block 598768
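
Checking both logs by eye on every segment is tedious, so a small script can do a first pass. The following is a minimal sketch in Python, not an official Pivotal tool: the function name find_hardware_errors and the pattern list are assumptions built only from the sample messages above, and real hardware faults may be logged in other forms.

```python
import re

# Patterns taken from the sample kernel messages in this article; treat the
# list as a starting point, since other hardware faults are logged differently.
HARDWARE_ERROR_PATTERNS = [
    re.compile(r"I/O error"),
    re.compile(r"UncorrectableError"),
    re.compile(r"unrecoverable I/O read error"),
]

# Extracts a device name such as "sdc" from "dev sdc," or "on device sdc,".
DEVICE_RE = re.compile(r"\b(?:dev|device) (\w+),")


def find_hardware_errors(lines):
    """Return (device, line) pairs for lines that look like hardware errors.

    device is None when no device name can be extracted from the line.
    """
    hits = []
    for line in lines:
        if any(pattern.search(line) for pattern in HARDWARE_ERROR_PATTERNS):
            match = DEVICE_RE.search(line)
            device = match.group(1) if match else None
            hits.append((device, line.rstrip("\n")))
    return hits
```

To use it, pass an open log file, for example find_hardware_errors(open("/var/log/messages", errors="replace")), on each segment host. An empty result only means that none of these particular patterns matched, so it does not replace a manual review.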

Open a ticket with Data Domain Support

If all of the above has been verified and documented in the ticket, open an SR with Data Domain Support. Gather the basic information listed below and document it in the ticket:

  1. Does the Data Domain have at least 20% free space?
  2. Are there any alerts on the DDR?
  3. Is any packet loss reported on the DD side?
  4. Is the number of links from each segment to the DD configured the same?
  5. Are all the GPDB nodes configured the same way to the DD?
  6. How many streams are in use on the DDR? Are too many in use?
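
Items 1 and 4 in the list above come down to simple comparisons once the numbers have been collected. The sketch below, in Python, assumes the values were already gathered by hand or by another tool. It does not query the Data Domain or the segments, and the function names and the host names in the usage note (sdw1, sdw2, sdw3) are hypothetical.

```python
from collections import Counter


def has_enough_free_space(used_bytes, total_bytes, min_free_fraction=0.20):
    """Return True if at least min_free_fraction of the capacity is free."""
    free_bytes = total_bytes - used_bytes
    return free_bytes >= min_free_fraction * total_bytes


def find_inconsistent_hosts(links_per_host):
    """Return the hosts whose link count differs from the most common count.

    links_per_host maps a host name to the number of links it has to the DD.
    """
    if not links_per_host:
        return {}
    expected, _ = Counter(links_per_host.values()).most_common(1)[0]
    return {
        host: count
        for host, count in links_per_host.items()
        if count != expected
    }
```

For example, find_inconsistent_hosts({"sdw1": 2, "sdw2": 2, "sdw3": 1}) reports sdw3 as the odd one out. The most common link count is treated as the expected value, which is only a heuristic: if most hosts are misconfigured in the same way, the correctly configured hosts are the ones reported.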

Related KB Articles

For troubleshooting slow running backups click here.

For troubleshooting slow running restores click here.

For troubleshooting network related issues click here.

 

 
