Pivotal Knowledge Base

Segments failing when backing up to an NFS mount

Environment

Product Version
Pivotal Greenplum (GPDB) All Versions

Symptoms

  • Using gpcrondump to back up to an NFS share
  • Multiple segments failing during backup
  • Messages similar to the one below may be found in /var/log/messages:
      Oct 28 04:16:32 sdw9 kernel: INFO: task postgres:10542 blocked for more than 120 seconds.
      Oct 28 04:16:32 sdw9 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      Oct 28 04:16:32 sdw9 kernel: postgres D ffffffff80154697 0 10542 8527 10543 10531 (NOTLB)
      Oct 28 04:16:32 sdw9 kernel: ffff81026b88bea8 0000000000000082 00000000ffffffd8 ffffffff80012671
      Oct 28 04:16:32 sdw9 kernel: 8000000ba9937067 000000000000000a ffff810509ea80c0 ffff81018398e080
      Oct 28 04:16:32 sdw9 kernel: 00071725b8ab7c04 000000000004c852 ffff810509ea82a8 000000018002239a
      Oct 28 04:16:32 sdw9 kernel: Call Trace:
      Oct 28 04:16:32 sdw9 kernel: [<ffffffff80012671>] may_open+0x65/0x233
      Oct 28 04:16:32 sdw9 kernel: [<ffffffff80067225>] do_page_fault+0x4cc/0x842
      Oct 28 04:16:32 sdw9 kernel: [<ffffffff80063c53>] __mutex_lock_slowpath+0x60/0x9b
      Oct 28 04:16:32 sdw9 kernel: [<ffffffff80063c9d>] .text.lock.mutex+0xf/0x14
      Oct 28 04:16:32 sdw9 kernel: [<ffffffff80013f1b>] generic_file_llseek+0x2a/0x8b
      Oct 28 04:16:32 sdw9 kernel: [<ffffffff80025788>] sys_lseek+0x40/0x60
      Oct 28 04:16:32 sdw9 kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
      Oct 28 04:16:32 sdw9 kernel: [<ffffffff8005d116>] system_call+0x7e/0x83
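To check a segment host for this symptom, you can pull the hung-task reports out of the syslog file. The following is a minimal bash sketch. It assumes the report wording matches the excerpt above ("INFO: task <name>:<pid> blocked for more than <N> seconds."), since other kernels may phrase it differently. The function name list_hung_tasks is hypothetical.

```shell
# Hypothetical helper: list kernel hung-task reports from a syslog file.
# Assumes the "INFO: task <name>:<pid> blocked for more than <N> seconds."
# wording shown in the excerpt above; other kernels may word it differently.
# Prints one line per report: <host> <process>:<pid> <seconds>
list_hung_tasks() {
  local log="${1:-/var/log/messages}"
  # -n with the p flag prints only lines where the substitution matched,
  # so the Call Trace and register-dump lines are skipped.
  sed -n -E 's/^.* ([^ ]+) kernel:.*INFO: task ([^ ]+) blocked for more than ([0-9]+) seconds.*$/\1 \2 \3/p' "$log"
}
```

Applied to the excerpt above, it prints sdw9 postgres:10542 120. Reading /var/log/messages usually requires root. To count reports per process, run it on each segment host and pipe the output through awk '{print $2}' | sort | uniq -c.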

Cause

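Processes can be blocked in uninterruptible I/O wait (the D state shown in the trace above) because of the way Linux buffers disk writes in memory. By default, the kernel may use up to 40% of memory (the vm.dirty_ratio setting) to cache writes that have not yet reached disk. Once that limit is reached, the processes issuing writes must flush the cached data to storage before they can continue. If that flush swamps the underlying storage layer (in this case the NFS mount), the processes stay blocked long enough to trigger the kernel's hung-task warning.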
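To gauge how much unflushed backup data a host could accumulate, you can read the host's actual dirty_ratio and apply it to total memory. The following is a rough bash sketch. It assumes a Linux host and applies the ratio to MemTotal. The kernel actually applies it to available (free plus reclaimable) memory, so the true limit is lower than this estimate. If vm.dirty_bytes is set instead, dirty_ratio reads as 0 and the sketch returns 0. The function name dirty_limit_kb is hypothetical.

```shell
# Hypothetical helper: rough upper bound (in kB) on dirty page-cache data
# before writing processes are forced to flush.
# Assumption: dirty_ratio is applied to MemTotal here; the kernel applies it
# to available memory, so the real limit is lower than this estimate.
# Both paths can be overridden (for example, to test with sample files).
dirty_limit_kb() {
  local meminfo="${1:-/proc/meminfo}"
  local ratio_file="${2:-/proc/sys/vm/dirty_ratio}"
  local mem_kb ratio
  # The MemTotal line looks like: "MemTotal:       65536000 kB"
  mem_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
  ratio=$(cat "$ratio_file")
  # When vm.dirty_bytes is in use, dirty_ratio reads 0 and this prints 0.
  echo $(( mem_kb * ratio / 100 ))
}
```

Run dirty_limit_kb with no arguments on a segment host. For example, suppose MemTotal were 67108864 kB (64 GiB) and dirty_ratio were 40. The estimate would then be 26843545 kB, about 25.6 GiB of backup data that could be waiting to cross the NFS link when the flush starts.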

Resolution

For backups, enable direct I/O. This bypasses the write cache described above and writes data synchronously to the NFS mount.

To do so, turn on the gp_backup_directIO server configuration parameter (GUC) with the following commands:

gpconfig -c gp_backup_directIO -v on
gpstop -u

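The gpconfig command must be followed by gpstop -u, which reloads the configuration files on the master and segments without shutting down the database. To confirm the new value on the master and all segments, run gpconfig -s gp_backup_directIO.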
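To observe the page-cache behavior that direct I/O avoids, you can compare a buffered write with a direct write on the NFS mount. The following bash sketch assumes GNU coreutils dd, whose oflag=direct opens the output file with O_DIRECT. It only illustrates buffered versus direct writes and is not how gpcrondump itself writes backup files. The function name write_test_file is hypothetical.

```shell
# Hypothetical helper: write <size_mb> MiB of zeros to <path>, either through
# the page cache ("buffered") or with O_DIRECT ("direct").
# Assumes GNU coreutils dd (bs=1M and oflag=direct are GNU dd options).
write_test_file() {
  local target="$1" size_mb="$2" mode="${3:-buffered}"
  case "$mode" in
    buffered)
      # dd can return while much of the data is still in the page cache;
      # the kernel writes it back to the NFS server later.
      dd if=/dev/zero of="$target" bs=1M count="$size_mb"
      ;;
    direct)
      # O_DIRECT bypasses the page cache, so each write() waits for the
      # NFS client to send the block to the server instead of returning
      # once the data is copied into local memory.
      dd if=/dev/zero of="$target" bs=1M count="$size_mb" oflag=direct
      ;;
    *)
      echo "usage: write_test_file <path> <size_mb> [buffered|direct]" >&2
      return 2
      ;;
  esac
}
```

For example, run time write_test_file /mnt/nfs_backup/dd_test 1024 buffered, then repeat it with direct, and remove the test file afterward. Here /mnt/nfs_backup is a hypothetical mount point. A fast buffered result does not mean the data has reached the NFS server. It only shows how quickly the page cache absorbed it.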
