Pivotal Knowledge Base

FAQ - Greenplum DCA PERC Battery

Environment

  • Greenplum Data Computing Appliance (DCA)
  • Greenplum DCAv1 (DELL servers)

Goal

This document answers frequently asked questions about the PERC battery in the Greenplum DCA.

Solution

1. What does PERC stand for?

Dell PowerEdge RAID Controller

2. How does the PERC controller impact the performance of the DCA?

PERC controllers can operate in Write-Through and Write-Back modes. The default mode is Write-Back. Performance is affected when the controller switches from Write-Back to Write-Through caching. This document describes why the controller makes that switch.

3. Why does the PERC controller use Write-Back caching?

Write-Back caching improves performance. In Write-Back caching mode, the controller sends a data transfer completion signal to the host when the controller cache has received all the data in a transaction. The controller then writes the cached data to the storage device in the background.

4. What are the risks of using Write-Back caching?

The risk of using Write-Back cache is that the cached data can be lost if there is a power failure before it is written to the storage device. The battery backup unit (BBU) on the PERC H700 card mitigates this risk and provides hold-up power for a minimum of 24 hours.

5. What is Write-Through caching?

In Write-Through caching, the controller sends a data-transfer completion signal to the host system when the disk subsystem has received all the data in a transaction.
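
The difference between the two caching modes can be illustrated with a toy model. This is only a sketch: the `ToyController` class, its method names, and the acknowledgment strings are hypothetical and do not reflect actual PERC firmware. It shows that in Write-Back mode the host is acknowledged before the data reaches disk, which is why a power failure without a working battery can lose cached data.

```python
class ToyController:
    """Toy model of a RAID controller cache; not real PERC firmware behaviour."""

    def __init__(self, policy: str):
        if policy not in ("write-back", "write-through"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.cache = []  # blocks acknowledged but not yet on disk
        self.disk = []   # blocks safely on the storage device

    def write(self, block: str) -> str:
        """Accept a block and report where it was when the host was acknowledged."""
        if self.policy == "write-back":
            # Completion is signalled as soon as the cache holds the data.
            self.cache.append(block)
            return "acknowledged from cache"
        # Write-through: completion is signalled only once the disk has the data.
        self.disk.append(block)
        return "acknowledged from disk"

    def flush(self) -> None:
        """Background write of cached blocks to the storage device."""
        self.disk.extend(self.cache)
        self.cache.clear()

    def power_failure(self, battery_ok: bool) -> list:
        """Return the blocks that would be lost if power failed right now."""
        if battery_ok:
            return []  # the battery preserves the cache contents
        lost = list(self.cache)
        self.cache.clear()
        return lost
```

For example, writing one block to `ToyController("write-back")` and then calling `power_failure(battery_ok=False)` before `flush()` reports that block as lost, while the same sequence on a write-through controller loses nothing.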

6. When does the controller normally use Write-Through caching?

The controller uses Write-Through caching when the:

  • Battery is missing.
  • Battery charge is low and not capable of holding a charge for 24 hours.
  • Battery charge is low due to a learn cycle.

Write-Back caching is used under all conditions in which the battery is present and in good condition. The default cache setting for virtual disks is Write-Back caching.
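
The conditions above can be summarized as a small decision function. This is a minimal sketch, not the firmware's actual logic: the `BatteryState` fields and the `cache_policy` name are hypothetical, and treating the whole learn cycle as a low-charge period is a simplification.

```python
from dataclasses import dataclass

HOLD_UP_HOURS_REQUIRED = 24  # from this FAQ: the cache must be held for 24 hours


@dataclass
class BatteryState:
    present: bool            # is a battery installed?
    hold_up_hours: float     # hypothetical estimate of how long it can hold the cache
    in_learn_cycle: bool     # simplification: charge is treated as low for the whole cycle


def cache_policy(battery: BatteryState) -> str:
    """Return the cache policy implied by the conditions listed in this FAQ."""
    if not battery.present:
        return "write-through"   # battery is missing
    if battery.in_learn_cycle:
        return "write-through"   # charge is low due to a learn cycle
    if battery.hold_up_hours < HOLD_UP_HOURS_REQUIRED:
        return "write-through"   # cannot hold a charge for 24 hours
    return "write-back"          # battery present and in good condition
```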

7. How does Write-Through caching impact performance?

Write-Through caching degrades write performance on the node, but has no effect on read performance.

8. What is the learn cycle?

The learn cycle determines the condition of the battery. It charges, fully discharges, and then recharges the battery to measure its health. The battery's full-charge capacity degrades over time. A battery is deemed completely degraded, and must be replaced, when it can no longer hold a charge for 24 hours.

9. Will the system generate dialhomes when the learn cycle starts on the servers?

Yes, EMC support will receive one or more of the following dialhomes when the learn cycle starts on the servers.

  • 11.9002 Status of battery on the IO Controller.
  • 12.10002 Virtual Disk X Status: /dev/sdX: nonCritical
  • 12.10005 Write cache policy on virtual disk. For example, expected to be write back mode.
  • 12.10007 Virtual Disk X State: /dev/sdX: Degraded

10. How often does the controller run the learn cycle?

Every 90 days, or upon inserting a new battery into the server. A learn cycle may also be initiated by the controller if the server was powered off for an extended period of time.
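
To estimate when the next regular learn cycle is due, add the 90-day interval to the date of the last one. The sketch below assumes the last learn cycle date is known; the function name is hypothetical, and it ignores the other triggers (a new battery or an extended power-off).

```python
from datetime import date, timedelta

LEARN_CYCLE_INTERVAL_DAYS = 90  # interval stated in this FAQ


def next_learn_cycle(last_cycle: date) -> date:
    """Return the date on which the next regular learn cycle is expected."""
    return last_cycle + timedelta(days=LEARN_CYCLE_INTERVAL_DAYS)


# A cycle that ran on 1 January 2013 would be followed by one on 1 April 2013.
print(next_learn_cycle(date(2013, 1, 1)))  # 2013-04-01
```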

11. How long does the learn cycle take?

In the lab, we have seen the learn cycle on a healthy system take between 4 and 8 hours.

12. Why does the length of a learn cycle vary?

If a charge-depleted battery is inserted into the server, the first charging of the battery may take significantly more time to complete. Total learn cycle times of up to 16 hours have been observed in the lab in these circumstances.

13. Can you stop the firmware from running the learn cycle?

No, you cannot stop the firmware from running the learn cycle. You can, however, force a learn cycle to start.

14. When does the firmware decide to put the controller in Write-Through mode?

The firmware puts the controller in Write-Through mode if the charge on the battery is not sufficient to maintain the cache for 24 hours. The PERC firmware considers a battery degraded when its charge capacity falls below a certain threshold. The threshold differs between releases.

15. How many PERC controllers are there in a DCA?

In a full rack running entirely Greenplum Database, there are 18 servers (master, standby master, and segment). Each server has a PERC controller.

16. What PERC problem exists in releases prior to 1.1.0.3?

In releases prior to 1.1.0.3, there was a problem in the PERC controller firmware that prevented the battery from charging properly. This in turn resulted in the controller setting the cache policy to Write-Through. In addition, the learn cycles were not synchronized across all the DCA servers in the cluster. Because of the MPP architecture, one server in Write-Through mode could impact the performance of the entire cluster.
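
The cluster-wide effect follows from how parallel work completes: a step that runs on every segment server finishes only when the slowest server finishes. The sketch below uses made-up timings purely for illustration (they are not measurements), and the function name is hypothetical.

```python
def parallel_step_time(server_times_s):
    """A step spread across all servers takes as long as its slowest server."""
    return max(server_times_s)


# Illustrative numbers only: 16 servers that each need 10 s for a write-heavy step.
all_write_back = [10.0] * 16

# Same step, but one server has dropped to Write-Through and needs 25 s.
one_write_through = [10.0] * 15 + [25.0]

print(parallel_step_time(all_write_back))     # 10.0
print(parallel_step_time(one_write_through))  # 25.0
```

This is also why unsynchronized learn cycles are costly: if each server enters Write-Through at a different time, the cluster spends far longer with at least one slow server than it would if all servers ran their learn cycles together.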

17. What does the firmware available with the DCA SW 1.1.0.3 release fix?

The DCA Software Release 1.1.0.3 enables EMC/Greenplum support and services to install new PERC firmware that fixes the problem of PERC batteries not charging properly. The new firmware makes two changes. First, it adjusts the battery charge offsets. Second, it sets the trickle-charge threshold above the controller cache Write-Through threshold, which allows the controller to run maintenance charge cycles continuously without falling into Write-Through mode. The firmware also sets an early-warning threshold that indicates when the battery should be replaced.

The call home behavior also changes in DCA Software Release 1.2.1.0. Here is an excerpt from the 1.2.1.0 release notes:

PERC call home behavior

The PERC call home logic is explained in this section. When the PERC battery is unavailable, due to a degraded state (low charge capacity) or learn cycle, there will be a performance impact from a Write-Through cache policy being enabled. Based on the state of the PERC battery and re-occurring battery learn cycle, the following will occur:

Learn cycle has NOT been scheduled through DCA Setup

  • If the PERC battery is in a degraded state, and the PERC controller is NOT in a learn cycle, a warning call home is generated.
  • If the PERC battery is in a degraded state, and the PERC controller is in a learn cycle, no call home is generated.
  • If the PERC cache policy changes to write-through mode, and the PERC is in a learn cycle, no call home is generated.
  • If the PERC cache policy changes to write-through mode, and the PERC is NOT in a learn cycle, a warning call home will be generated.

Learn cycle has been scheduled through DCA Setup

  • If the PERC battery is in a degraded state due to a scheduled learn cycle, no call home is generated.
  • If the DCA software initiates a scheduled learn cycle, battery and cache policy call homes are suppressed for 1 hour. After 1 hour, if the battery is in a degraded state, and the learn cycle is NOT running, a warning call home is generated.
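
The call home rules in the excerpt can be condensed into a single decision function. This is a sketch of the rules as written, not the DCA software's implementation: the function and parameter names are hypothetical. The excerpt does not say what happens when the cache policy changes to write-through after the suppression window of a scheduled learn cycle, so the sketch assumes no call home in that case.

```python
from typing import Optional

SUPPRESSION_MINUTES = 60  # call homes are suppressed for 1 hour after a scheduled start


def warning_call_home(
    battery_degraded: bool,
    write_through: bool,
    in_learn_cycle: bool,
    minutes_since_scheduled_start: Optional[float] = None,
) -> bool:
    """Return True if a warning call home would be generated.

    minutes_since_scheduled_start is None when the learn cycle has NOT been
    scheduled through DCA Setup.
    """
    if minutes_since_scheduled_start is None:
        # Unscheduled: warn on a degraded battery or a write-through policy,
        # unless the controller is currently in a learn cycle.
        return (battery_degraded or write_through) and not in_learn_cycle

    if minutes_since_scheduled_start < SUPPRESSION_MINUTES:
        # Scheduled: battery and cache policy call homes are suppressed.
        return False

    # After the suppression window: warn only if the battery is degraded
    # and the learn cycle is not running (write-through alone is an
    # unstated case, assumed here to generate nothing).
    return battery_degraded and not in_learn_cycle
```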

18. What new PERC battery management features are in the DCA Software 1.2 release?

The DCA Software 1.2 release enables EMC Greenplum support and services to manage and automate the scheduling of the learn cycle, using the dca_setup utility to synchronize the learn cycle date and time on all servers.

19. How long do batteries last?

According to the Dell Manual, RAID controller batteries should be replaced every three years.
