Pivotal Knowledge Base


Distributed Deadlock when restarting the GemfireXD Servers

Applies to

GemfireXD 1.4.0 to 1.4.x


This document describes workarounds and a solution for a distributed deadlock that can occur during AsyncEventListener queue recovery while GemfireXD servers are restarting.


GemfireXD servers hang and fail to restart, with log messages like the following.

Server1 log snippet:

[info 2015/05/26 18:22:56.977 CST <CacheServerLauncher#serverConnector> tid=0xd] Region /AsyncEventQueue_Listener2_SERIAL_GATEWAY_SENDER_QUEUE has potentially stale data. It is waiting for another member to recover the latest data.
 My persistent id:
 DiskStore ID: 33fe3f15-dae9-4433-8297-b2993b492914
 Location: /
 Members with potentially new data:
 DiskStore ID: 3bfecd04-1cad-43a9-9187-68179bd6307e
 Location: /
 Use the "gfxd list-missing-disk-stores" command to see all disk stores that are being waited on by other members.

Server2 log snippet:

[info 2015/05/26 18:22:49.850 CST <CacheServerLauncher#serverConnector> tid=0xd] Region AsyncEventQueue_Listener1_SERIAL_GATEWAY_SENDER_QUEUE requesting initial image from <v2>:26755

[warning 2015/05/26 18:23:04.851 CST <CacheServerLauncher#serverConnector> tid=0xd] 15 seconds have elapsed while waiting for replies: <com.gemstone.gemfire.internal.cache.InitialImageOperation$ImageProcessor 51 waiting for 1 replies from [<v2>:26755]; waiting for 0 messages in-flight; region=/AsyncEventQueue_Listener1_SERIAL_GATEWAY_SENDER_QUEUE; abort=false> on <v3>:5021 whose current membership list is: [[pivhdsne(106355)<v1>:41338, <v2>:26755, pivhdsne(106229)<v0>:38577, <v3>:5021]]
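When servers hang in this state, the disk stores that other members are waiting on can be inspected with the `gfxd list-missing-disk-stores` command referenced in the log above. A minimal sketch, assuming a locator is running on localhost port 10334 (adjust the `-locators` value for your topology):

```shell
# List all disk stores that are being waited on by other members.
# The host and port in -locators are placeholders for your locator endpoint.
gfxd list-missing-disk-stores -locators=localhost[10334]
```

Compare the reported DiskStore IDs against the "Members with potentially new data" entries in the server logs to identify which member holds the latest data.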

Root Cause

The second node waits for asynchronous value recovery while iterating over the queue region, which causes a distributed deadlock.


  • Workaround 1: Set the system property RECOVER_VALUES_SYNC to TRUE, so that AsyncEventListener queue data recovery is synchronous and in order.

  • Workaround 2: Use a single AsyncEventListener instead of multiple AsyncEventListeners.
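For Workaround 1, the property can be supplied as a JVM system property when starting each server. A minimal sketch, assuming the property name is exactly RECOVER_VALUES_SYNC as stated above and using the standard `-J` passthrough for JVM options; the working directory and locator endpoint are placeholders:

```shell
# Start the server with synchronous AsyncEventListener queue recovery.
# -J-D... passes a JVM system property through gfxd to the server process.
gfxd server start -dir=/var/gemfirexd/server1 \
  -locators=localhost[10334] \
  -J-DRECOVER_VALUES_SYNC=true
```

The property must be set on every data node that hosts the AsyncEventListener queue, and the servers then restarted.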


Later releases of GemfireXD include the fix for this issue:

#52317 Do not wait for async recovery while iterating. It was causing a distributed deadlock.

