Pivotal Knowledge Base

CacheServer hangs and fails to rejoin the cluster when restarting

Environment

Product: Pivotal GemFire
Version: 8.0.x - 8.2.x
OS: All supported OS

 

Symptom

A GemFire cache server may hang and fail to rejoin the GemFire cluster when the node is restarted after a shutdown.

When this issue occurs, a thread dump of the hanging process (for example, one taken with jstack) may show a Java-level deadlock like the one below:

Found one Java-level deadlock:
=============================
"Management Task":
  waiting to lock monitor 0x00007f7960005d58 (object 0x00000000d11f7040, a java.lang.Class),
  which is held by "main"
"main":
  waiting to lock monitor 0x00007f79685ea018 (object 0x00000000d13d20d0, a java.lang.Object),
  which is held by "Management Task"

Java stack information for the threads listed above:
===================================================
"Management Task":
	at com.gemstone.gemfire.cache.CacheFactory.getAnyInstance(CacheFactory.java:293)
	- waiting to lock  (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
	at com.gemstone.gemfire.internal.cache.tier.InternalBridgeMembership.getClientQueueSizes(InternalBridgeMembership.java:358)
	at com.gemstone.gemfire.management.internal.beans.CacheServerBridge.getNumSubscriptions(CacheServerBridge.java:666)
	at com.gemstone.gemfire.management.internal.beans.CacheServerMBean.getNumSubscriptions(CacheServerMBean.java:288)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.gemstone.gemfire.management.internal.FederationComponent.refreshObjectState(FederationComponent.java:175)
	at com.gemstone.gemfire.management.internal.LocalManager$ManagementTask.run(LocalManager.java:376)
	- locked  (a java.lang.Object)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at com.gemstone.gemfire.management.internal.LocalManager$1$1.run(LocalManager.java:121)
	at java.lang.Thread.run(Thread.java:745)
"main":
	at com.gemstone.gemfire.management.internal.LocalManager.unMarkForFederation(LocalManager.java:243)
	- waiting to lock  (a java.lang.Object)
	at com.gemstone.gemfire.management.internal.LocalManager.cleanUpResources(LocalManager.java:284)
	at com.gemstone.gemfire.management.internal.LocalManager.stopManager(LocalManager.java:441)
	at com.gemstone.gemfire.management.internal.SystemManagementService.close(SystemManagementService.java:261)
	- locked  (a java.util.HashMap)
	at com.gemstone.gemfire.management.internal.beans.ManagementAdapter.handleCacheRemoval(ManagementAdapter.java:775)
	at com.gemstone.gemfire.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:115)
	at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2249)
	at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:505)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1822)
	- locked  (a java.lang.Class for com.gemstone.gemfire.internal.cache.GemFireCacheImpl)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1675)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1671)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.init(GemFireCacheImpl.java:1024)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:682)
	at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:182)
	- locked  (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
	at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:229)
	- locked  (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
	at com.gemstone.gemfire.distributed.ServerLauncher.startWithGemFireApi(ServerLauncher.java:792)
	at com.gemstone.gemfire.distributed.ServerLauncher.start(ServerLauncher.java:694)
	at com.gemstone.gemfire.distributed.ServerLauncher.run(ServerLauncher.java:624)
	at com.gemstone.gemfire.distributed.ServerLauncher.main(ServerLauncher.java:194)

Found 1 deadlock.
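
In the dump above, "main" holds the CacheFactory class monitor (taken in CacheFactory.create) and waits for an object monitor held by "Management Task", which in turn waits for the CacheFactory class monitor in CacheFactory.getAnyInstance. The following is a minimal sketch of that inverted lock order and of detecting it from inside a JVM with the standard ThreadMXBean API. It is not GemFire code: the class DeadlockProbe, its methods, and the thread names are hypothetical and exist only for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class, not part of GemFire.
public class DeadlockProbe {

    /** Returns the names of threads deadlocked on object monitors (empty if none). */
    public static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock exists
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // a thread may have ended since the check
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    /** Starts two daemon threads that take the same two monitors in opposite order. */
    public static void startInvertedLockOrder(Object lockA, Object lockB)
            throws InterruptedException {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread t1 = new Thread(() -> grab(lockA, lockB, bothHoldFirstLock), "demo-main");
        Thread t2 = new Thread(() -> grab(lockB, lockA, bothHoldFirstLock), "demo-management-task");
        t1.setDaemon(true); // daemon threads do not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();
        bothHoldFirstLock.await();
    }

    private static void grab(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock too
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // never reached: the other thread holds this monitor and waits for ours
            }
        }
    }

    /** Polls until a monitor deadlock is reported or the timeout elapses. */
    public static List<String> awaitDeadlock(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<String> names = deadlockedThreadNames();
        while (names.isEmpty() && System.currentTimeMillis() < deadline) {
            Thread.sleep(50);
            names = deadlockedThreadNames();
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        startInvertedLockOrder(new Object(), new Object());
        System.out.println("Deadlocked threads: " + awaitDeadlock(5000));
    }
}
```

For a running cache server, taking a thread dump of the process remains the simpler way to confirm the deadlock; the sketch only shows why neither thread can make progress.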

 

Root Cause

1.  The following conditions must be met for this issue to occur:

  • A region with an index defined contains data.
  • The cluster configuration service is enabled.

2.  This issue is tracked as #GEM-1327.

Resolution

1.  To avoid this issue, disable the cluster configuration service by adding the following parameters to the start scripts of the cache servers and locators:

--J=-Dgemfire.enable-cluster-configuration=false --J=-Dgemfire.use-cluster-configuration=false


2.  Check the release notes to find the release in which #GEM-1327 is fixed.
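
The --J= parameters in step 1 pass -D options to the JVM, so they appear there as Java system properties. The following is a minimal sketch, assuming it runs inside the same JVM as the cache server, for checking that both properties were set to false. The class ClusterConfigFlagCheck and its method are hypothetical helpers, not part of GemFire.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of GemFire.
public class ClusterConfigFlagCheck {

    // Property names taken from the --J=-D parameters in step 1.
    static final String[] KEYS = {
        "gemfire.enable-cluster-configuration",
        "gemfire.use-cluster-configuration"
    };

    /** Maps each property name to true when it is explicitly set to "false". */
    public static Map<String, Boolean> disabledFlags() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String key : KEYS) {
            // getProperty returns null when the -D option was not passed
            result.put(key, "false".equalsIgnoreCase(System.getProperty(key)));
        }
        return result;
    }

    public static void main(String[] args) {
        disabledFlags().forEach((key, disabled) ->
            System.out.println(key + " explicitly disabled: " + disabled));
    }
}
```

A value of false in the result means only that the property was not passed as "false" to this JVM; it says nothing about the product's default for that setting.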

 
