Pivotal Knowledge Base

vMotion of a VMware vFabric GemFire CacheServer fails

Symptoms

  • A vMotion operation of a GemFire CacheServer fails and requires the GemFire server to be rebooted.
  • This issue occurs with ESX 4.1 Update 1 and vCenter Server 4.1 Update 1.

Cause

Here is a high-level overview of the vMotion procedure on vSphere 4.1:
  1. A shadow virtual machine is created on the destination host.
  2. Each memory page is copied from the source to the destination via the vMotion network. This is known as preCopy.
  3. Another pass of the virtual machine memory is performed, copying any pages that changed during the last preCopy iteration.
  4. The iterative memory copying continues until no changed pages (pages still outstanding to be copied) remain.
  5. The virtual machine is stunned on the source and resumes on the destination.
As long as the host can transmit memory pages over the vMotion network faster than the virtual machine dirties new pages, the iterative copy converges and no issues occur. In rare cases, the virtual machine dirties memory pages faster than vMotion can send them, and the preCopy may not be able to converge.
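The convergence condition described above can be illustrated with a deliberately simplified model. This is only a sketch, not how vMotion is implemented: the function name, the parameters, and the assumption that pages are dirtied at a constant rate are all hypothetical and chosen for illustration.

```python
def precopy_passes(memory_pages, send_rate, dirty_rate, max_passes=20):
    """Return the number of pages left to copy after each preCopy pass.

    Hypothetical illustration values (not taken from vMotion itself):
      memory_pages: total pages in the virtual machine's memory
      send_rate:    pages per second the vMotion network can transmit
      dirty_rate:   pages per second the guest modifies (assumed constant)
    """
    remaining = float(memory_pages)
    history = []
    for _ in range(max_passes):
        seconds = remaining / send_rate  # time needed to send this pass
        # Pages dirtied while the pass was in flight, capped at total memory.
        remaining = min(float(memory_pages), seconds * dirty_rate)
        history.append(remaining)
        if remaining < 1:  # effectively nothing left to copy
            break
    return history


# Network faster than the guest dirties pages: the backlog shrinks every pass.
converging = precopy_passes(1_000_000, send_rate=100_000, dirty_rate=10_000)

# Guest dirties pages faster than the network can send: the backlog never shrinks.
diverging = precopy_passes(1_000_000, send_rate=100_000, dirty_rate=150_000)
```

In this model the backlog shrinks by the ratio of dirty rate to send rate on every pass, which is why a write-heavy workload such as a busy CacheServer on a slow vMotion link may never converge.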

When the preCopy cannot converge, vMotion decides whether to fail the vMotion or proceed with the switchover to the destination. It makes this decision by estimating the time required to transmit all the remaining outstanding pages. By default, if the estimate is less than 100 seconds, vMotion proceeds with the switchover. If the estimate is more than 100 seconds, the vMotion operation times out with no impact on the virtual machine.
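The decision rule above can be sketched as follows. The function and constant names are hypothetical, and the handling of an estimate of exactly 100 seconds is an assumption, since the text only describes the cases below and above the threshold.

```python
SWITCHOVER_LIMIT_SECONDS = 100  # default threshold quoted in the text


def switchover_decision(remaining_pages, send_rate,
                        limit_seconds=SWITCHOVER_LIMIT_SECONDS):
    """Return "switchover" or "fail" for a preCopy that did not converge.

    remaining_pages: pages still outstanding (hypothetical input)
    send_rate:       pages per second the vMotion network can transmit
    """
    estimated_seconds = remaining_pages / send_rate
    # Assumption: an estimate of exactly limit_seconds is treated as a failure.
    if estimated_seconds < limit_seconds:
        return "switchover"
    return "fail"


quick = switchover_decision(500_000, send_rate=100_000)     # 5 s estimate
slow = switchover_decision(20_000_000, send_rate=100_000)   # 200 s estimate
```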

Resolution

To certify GemFire for use with vMotion, a dedicated 10GbE vMotion network must be provisioned between ESX/ESXi 4.1 or higher hosts.
 
VMware does not recommend using vMotion to move virtual machines running GemFire CacheServers when the vMotion network is 1GbE, or when virtual machine traffic and vMotion traffic share the same 10GbE connection.
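To show why link bandwidth matters here, the following rough calculation compares how long a single full pass over a virtual machine's memory would take at line rate. It is an idealized sketch: the 32 GB memory size is a hypothetical example, and protocol overhead, competing traffic, and pages dirtied during the copy are all ignored.

```python
def transfer_seconds(memory_gigabytes, link_gigabits_per_second):
    """Idealized time to send memory once over a link at full line rate.

    Uses decimal units (1 GB = 8 gigabits) and ignores all overhead.
    """
    return memory_gigabytes * 8 / link_gigabits_per_second


memory_gb = 32  # hypothetical virtual machine memory size

on_1gbe = transfer_seconds(memory_gb, 1)    # 256.0 seconds
on_10gbe = transfer_seconds(memory_gb, 10)  # about 25.6 seconds
```

On a shared 10GbE connection the bandwidth actually available to vMotion is lower than line rate, so real transfer times fall somewhere between these two figures.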

Additionally, if you disable vMotion, you must disable it for the entire host. For more information, see Disabling VMotion on an ESX/ESXi host (1010376).

Due to a change in the vMotion process, vSphere 5.0 allows a GemFire server to be migrated successfully with vMotion. Source virtual machine updates are performed at a microsecond level, so overall performance is higher and the net amount of data copied over the network is smaller and transfers more quickly. This is an advantage for GemFire virtual machines with large heaps.

Impact/Risks

Due to the redundancy built into the GemFire application, there is no data loss experienced with this issue.

©VMware 2013
