Pivotal Knowledge Base

CacheServer hangs in a Java-level deadlock when restarting

Environment

Product: Pivotal GemFire
Version: 8.0.x - 8.2.6
OS: All supported OS

 

Symptom

After a GemFire cacheserver node shuts down and restarts, the cacheserver may fail to rejoin the GemFire cluster and instead remain blocked in a hung state.

When this issue occurs, a thread dump of the hung cacheserver process shows a Java-level deadlock like the one below:

Found one Java-level deadlock:
=============================
"Management Task":
  waiting to lock monitor 0x00007f7960005d58 (object 0x00000000d11f7040, a java.lang.Class),
  which is held by "main"
"main":
  waiting to lock monitor 0x00007f79685ea018 (object 0x00000000d13d20d0, a java.lang.Object),
  which is held by "Management Task"

Java stack information for the threads listed above:
===================================================
"Management Task":
	at com.gemstone.gemfire.cache.CacheFactory.getAnyInstance(CacheFactory.java:293)
	- waiting to lock  (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
	at com.gemstone.gemfire.internal.cache.tier.InternalBridgeMembership.getClientQueueSizes(InternalBridgeMembership.java:358)
	at com.gemstone.gemfire.management.internal.beans.CacheServerBridge.getNumSubscriptions(CacheServerBridge.java:666)
	at com.gemstone.gemfire.management.internal.beans.CacheServerMBean.getNumSubscriptions(CacheServerMBean.java:288)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.gemstone.gemfire.management.internal.FederationComponent.refreshObjectState(FederationComponent.java:175)
	at com.gemstone.gemfire.management.internal.LocalManager$ManagementTask.run(LocalManager.java:376)
	- locked  (a java.lang.Object)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at com.gemstone.gemfire.management.internal.LocalManager$1$1.run(LocalManager.java:121)
	at java.lang.Thread.run(Thread.java:745)
"main":
	at com.gemstone.gemfire.management.internal.LocalManager.unMarkForFederation(LocalManager.java:243)
	- waiting to lock  (a java.lang.Object)
	at com.gemstone.gemfire.management.internal.LocalManager.cleanUpResources(LocalManager.java:284)
	at com.gemstone.gemfire.management.internal.LocalManager.stopManager(LocalManager.java:441)
	at com.gemstone.gemfire.management.internal.SystemManagementService.close(SystemManagementService.java:261)
	- locked  (a java.util.HashMap)
	at com.gemstone.gemfire.management.internal.beans.ManagementAdapter.handleCacheRemoval(ManagementAdapter.java:775)
	at com.gemstone.gemfire.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:115)
	at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2249)
	at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:505)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1822)
	- locked  (a java.lang.Class for com.gemstone.gemfire.internal.cache.GemFireCacheImpl)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1675)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1671)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.init(GemFireCacheImpl.java:1024)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:682)
	at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:182)
	- locked  (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
	at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:229)
	- locked  (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
	at com.gemstone.gemfire.distributed.ServerLauncher.startWithGemFireApi(ServerLauncher.java:792)
	at com.gemstone.gemfire.distributed.ServerLauncher.start(ServerLauncher.java:694)
	at com.gemstone.gemfire.distributed.ServerLauncher.run(ServerLauncher.java:624)
	at com.gemstone.gemfire.distributed.ServerLauncher.main(ServerLauncher.java:194)

Found 1 deadlock.
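
In the dump above, "main" holds the CacheFactory class lock and waits for an Object monitor held by "Management Task", while "Management Task" holds that Object monitor and waits for the CacheFactory class lock. The following is a minimal sketch of that lock-order inversion, not GemFire code: the class name, lock variables, and thread names are hypothetical stand-ins, and the detection step uses the standard ThreadMXBean API rather than anything GemFire-specific.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlockSketch {

    /**
     * Starts two daemon threads that take two locks in opposite order, then
     * polls the JVM until it reports both of them as deadlocked.
     * Returns their sorted names, or an empty list after about 5 seconds.
     */
    public static List<String> provokeAndDetect() {
        // Hypothetical stand-ins for the two monitors named in the dump.
        Object factoryLock = new Object();     // plays the CacheFactory class lock
        Object federationLock = new Object();  // plays the java.lang.Object monitor
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread a = startDaemon("main-like", factoryLock, federationLock, bothHoldFirstLock);
        Thread b = startDaemon("management-task-like", federationLock, factoryLock, bothHoldFirstLock);

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is found
            if (ids != null) {
                List<String> names = new ArrayList<>();
                for (ThreadInfo info : threads.getThreadInfo(ids)) {
                    // Keep only the two threads started by this call.
                    if (info != null
                            && (info.getThreadId() == a.getId() || info.getThreadId() == b.getId())) {
                        names.add(info.getThreadName());
                    }
                }
                if (names.size() == 2) {
                    Collections.sort(names);
                    return names;
                }
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return Collections.emptyList();
    }

    private static Thread startDaemon(String name, Object first, Object second,
                                      CountDownLatch bothHoldFirstLock) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                // Wait until the other thread also holds its first lock.
                bothHoldFirstLock.countDown();
                try {
                    bothHoldFirstLock.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached: the other thread holds this lock.
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads do not keep the JVM alive
        t.start();
        return t;
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + provokeAndDetect());
    }
}
```

Neither thread in the sketch can make progress once each holds its first lock, which is the same situation the dump reports for the real cacheserver process. The threads are daemons only so that the sketch can exit; in the real process the blocked "main" thread prevents startup from completing.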

 

Root Cause

1.  The following prerequisites must be met for this issue to occur:

  • A region that has an index defined contains data.
  • The cluster configuration service is enabled.

2.  This issue is tracked as #GEM-1327 and #GEM-1256.

Resolution

1.  To avoid this issue, disable the cluster configuration service by adding the following parameters to the start scripts of the cacheservers and locators:

--J=-Dgemfire.enable-cluster-configuration=false --J=-Dgemfire.use-cluster-configuration=false


2.  Search the release notes to determine which release fixes #GEM-1327. The fix will be included in 8.2.7, 9.1.1, and 9.2 and later releases.

 
