Pivotal Knowledge Base

Don't use the deprecated cacheserver script with GemFire 7 and later

Applies to

GemFire 7 and later

Purpose

Explain the issues that can arise from using the deprecated cacheserver script, which is still present in the GemFire bin directory.

Description

If the cacheserver script was used to start GemFire, the server log shows it: the sun.java.command property names the legacy CacheServerLauncher class as the main class.

sun.java.command = com.gemstone.gemfire.internal.cache.CacheServerLauncher server -disable-default-server -server-port=3106 archive-disk-space-limit=5000 archive-file-size-limit=25 statistic-sampling-enabled=true statistic-sample-rate=1000 socket-buffer-size=256144 log-level=config log-file=//v/campus/ny/appl/settlements/safepublisher/data/dev/logs/gemfire/cacheserver.iapp165.devin1.ms.com.3106.log log-file-size-limit=25 log-disk-space-limit=5000 statistic-archive-file=/var/tmp/gemfire/dev-poc/safepub2/cacheserver.iapp165.devin1.ms.com.3106/stats/cacheserver.iapp165.devin1.ms.com.3106.gfs cache-xml-file=/var/tmp/gemfire/dev-poc/safepub2/cacheserver.iapp165.devin1.ms.com.3106/etc/cache.xml conserve-sockets=false
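To check a member quickly, it is enough to look at the first token of sun.java.command, which is the JVM's main class. The following Java sketch does that; the class and method names (LauncherCheck, mainClass, startedWithCacheserverScript) are hypothetical, and sun.java.command is a HotSpot-specific property, so other JVMs may not set it.

```java
/**
 * Hypothetical helper: reports whether a GemFire member was started by the
 * deprecated cacheserver script, based on the HotSpot "sun.java.command" value.
 */
public class LauncherCheck {

    // Main class behind the deprecated cacheserver script (see the log line above).
    static final String LEGACY_LAUNCHER = "com.gemstone.gemfire.internal.cache.CacheServerLauncher";

    /** Accepts either the raw property value or the "sun.java.command = ..." log line. */
    static String mainClass(String commandOrLogLine) {
        String value = commandOrLogLine.trim();
        if (value.startsWith("sun.java.command")) {
            value = value.substring(value.indexOf('=') + 1).trim(); // drop the "key =" prefix
        }
        int space = value.indexOf(' ');
        return space < 0 ? value : value.substring(0, space);       // first token is the main class
    }

    static boolean startedWithCacheserverScript(String commandOrLogLine) {
        return LEGACY_LAUNCHER.equals(mainClass(commandOrLogLine));
    }

    public static void main(String[] args) {
        // Inside a running JVM the same value is available directly (HotSpot-specific).
        String self = System.getProperty("sun.java.command", "");
        System.out.println("this JVM's main class: " + mainClass(self));

        String logged = "sun.java.command = com.gemstone.gemfire.internal.cache.CacheServerLauncher"
                + " server -disable-default-server -server-port=3106";
        System.out.println(startedWithCacheserverScript(logged)); // true
    }
}
```

A result of true for a member's log line means the member was started through the old CacheServerLauncher path described in this article.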

Startup appears to work normally, but stopping the member can end in a Java-level deadlock, as the following stack trace of the "main" thread shows:

"main" prio=10 tid=0x00002af748019800 nid=0x137d waiting for monitor entry [0x00002af746219000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.gemstone.gemfire.management.internal.LocalManager.unMarkForFederation(LocalManager.java:238)
- waiting to lock <0x000000077fff1bc0> (a java.lang.Object)
at com.gemstone.gemfire.management.internal.LocalManager.cleanUpResources(LocalManager.java:279)
at com.gemstone.gemfire.management.internal.LocalManager.stopManager(LocalManager.java:440)
at com.gemstone.gemfire.management.internal.SystemManagementService.close(SystemManagementService.java:252)
- locked <0x000000077ffcbab0> (a java.util.HashMap)
at com.gemstone.gemfire.management.internal.beans.ManagementAdapter.handleCacheRemoval(ManagementAdapter.java:804)
at com.gemstone.gemfire.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:115)
at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2099)
at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:426)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1597)
- locked <0x0000000780003660> (a java.lang.Class for com.gemstone.gemfire.internal.cache.GemFireCacheImpl)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1452)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1448)
at org.springframework.data.gemfire.CacheFactoryBean.destroy(CacheFactoryBean.java:419)
at org.springframework.beans.factory.support.DisposableBeanAdapter.destroy(DisposableBeanAdapter.java:257)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:540)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:516)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingleton(DefaultListableBeanFactory.java:824)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingletons(DefaultSingletonBeanRegistry.java:485)
at org.springframework.context.support.AbstractApplicationContext.destroyBeans(AbstractApplicationContext.java:921)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:490)
- locked <0x0000000780003a68> (a java.lang.Object)
at org.springframework.data.gemfire.support.SpringContextBootstrappingInitializer.refreshApplicationContext(SpringContextBootstrappingInitializer.java:238)
at org.springframework.data.gemfire.support.SpringContextBootstrappingInitializer.init(SpringContextBootstrappingInitializer.java:281)
- locked <0x0000000780003ab8> (a java.lang.Class for org.springframework.data.gemfire.support.SpringContextBootstrappingInitializer)
at com.gemstone.gemfire.internal.cache.xmlcache.CacheCreation.runInitializer(CacheCreation.java:1371)
at com.gemstone.gemfire.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:564)
at com.gemstone.gemfire.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:293)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:3899)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:947)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.init(GemFireCacheImpl.java:837)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:625)
at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:182)
- locked <0x000000077fff19e0> (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:161)
- locked <0x000000077fff19e0> (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
at com.gemstone.gemfire.internal.cache.CacheServerLauncher.createCache(CacheServerLauncher.java:714)
at com.gemstone.gemfire.internal.cache.CacheServerLauncher.server(CacheServerLauncher.java:617)
at com.gemstone.gemfire.internal.cache.CacheServerLauncher.main(CacheServerLauncher.java:189)

Now look at the state of the "Management Task" thread:

"Management Task" daemon prio=10 tid=0x00002af748c16800 nid=0x13c6 waiting for monitor entry [0x00002af804976000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.gemstone.gemfire.cache.CacheFactory.getAnyInstance(CacheFactory.java:293)
- waiting to lock <0x000000077fff19e0> (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
at com.gemstone.gemfire.internal.cache.tier.InternalBridgeMembership.getClientQueueSizes(InternalBridgeMembership.java:291)
at com.gemstone.gemfire.management.internal.beans.CacheServerBridge.getNumSubscriptions(CacheServerBridge.java:606)
at com.gemstone.gemfire.management.internal.beans.CacheServerMBean.getNumSubscriptions(CacheServerMBean.java:287)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.gemstone.gemfire.management.internal.FederationComponent.refreshObjectState(FederationComponent.java:169)
at com.gemstone.gemfire.management.internal.LocalManager$ManagementTask.run(LocalManager.java:376)
- locked <0x000000077fff1bc0> (a java.lang.Object)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at com.gemstone.gemfire.management.internal.LocalManager$1$1.run(LocalManager.java:115)
at java.lang.Thread.run(Thread.java:745)


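The "main" thread is waiting to acquire a lock (0x000000077fff1bc0) held by the "Management Task" thread, which in turn is waiting to acquire a lock (0x000000077fff19e0) held by the "main" thread: a classic Java-level deadlock. The "main" thread took that second lock, the CacheFactory class monitor, in CacheFactory.create() when CacheServerLauncher created the cache, and it still holds it while closing the cache.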
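The lock-order inversion can be reproduced outside GemFire to see why neither thread can ever make progress. The Java sketch below uses only JDK classes; the two Object monitors are illustrative stand-ins for the CacheFactory class lock (0x000000077fff19e0) and the LocalManager lock (0x000000077fff1bc0), not GemFire code, and the thread names are made up. ThreadMXBean.findMonitorDeadlockedThreads() performs a monitor-cycle check similar to the one behind the "Found one Java-level deadlock" section of a jstack thread dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Sketch of a lock-order inversion: one thread holds A and wants B, the other holds B and wants A. */
public class LockOrderDeadlock {

    /** Deadlocks two daemon threads and returns the ids the JVM reports as deadlocked. */
    static long[] reproduce() throws InterruptedException {
        // Illustrative stand-ins for the CacheFactory class monitor and the LocalManager lock object.
        final Object cacheFactoryLock = new Object();
        final Object managerLock = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread closer = new Thread(() -> {
            synchronized (cacheFactoryLock) {        // like CacheFactory.create()
                arriveAndWait(bothHoldFirstLock);
                synchronized (managerLock) { }       // like LocalManager.unMarkForFederation()
            }
        }, "main-like");
        Thread task = new Thread(() -> {
            synchronized (managerLock) {             // like LocalManager$ManagementTask.run()
                arriveAndWait(bothHoldFirstLock);
                synchronized (cacheFactoryLock) { }  // like CacheFactory.getAnyInstance()
            }
        }, "management-task-like");
        closer.setDaemon(true);                      // daemon threads let the JVM exit despite the deadlock
        task.setDaemon(true);
        closer.start();
        task.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = null;
        for (int i = 0; i < 100 && ids == null; i++) {   // poll for up to about 5 seconds
            Thread.sleep(50);
            ids = threads.findMonitorDeadlockedThreads(); // null while no cycle exists
        }
        return ids;
    }

    /** Counts down, then waits until both threads hold their first lock. */
    private static void arriveAndWait(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long[] ids = reproduce();
        if (ids == null) {
            System.out.println("no deadlock detected");
            return;
        }
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            System.out.println("\"" + info.getThreadName() + "\" waiting to lock "
                    + info.getLockName() + ", held by \"" + info.getLockOwnerName() + "\"");
        }
    }
}
```

The latch guarantees that both threads hold their first monitor before either asks for the second one, so the cycle forms on every run instead of depending on timing.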

The deadlock report in the thread dump shows the same cycle:

Found one Java-level deadlock:
=============================
"Pooled High Priority Message Processor 8":
waiting to lock monitor 0x00002af890016688 (object 0x000000077ffcbab0, a java.util.HashMap),
which is held by "main"
"main":
waiting to lock monitor 0x00002af890016478 (object 0x000000077fff1bc0, a java.lang.Object),
which is held by "Management Task"
"Management Task":
waiting to lock monitor 0x00002af7bc003e58 (object 0x000000077fff19e0, a java.lang.Class),
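The report can be read as a wait-for graph: each blocked thread points to the monitor it wants, and each monitor points to the thread that owns it. The Java sketch below encodes the three entries of the report and follows the edges starting from "Pooled High Priority Message Processor 8". Because the last entry above is cut off, the owner of 0x000000077fff19e0 is taken from the "main" stack, where CacheFactory.create() locked it. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative wait-for graph built from the deadlock report above. */
public class WaitForGraph {

    /** Follows thread -> awaited monitor -> owning thread; returns the cycle, or an empty list. */
    static List<String> findCycle(Map<String, String> waitsOn, Map<String, String> heldBy, String start) {
        List<String> path = new ArrayList<>();
        String thread = start;
        while (thread != null && !path.contains(thread)) {
            path.add(thread);
            String monitor = waitsOn.get(thread);
            thread = (monitor == null) ? null : heldBy.get(monitor);
        }
        if (thread == null) {
            return Collections.emptyList();       // the chain ends at a thread that is not blocked
        }
        List<String> cycle = new ArrayList<>(path.subList(path.indexOf(thread), path.size()));
        cycle.add(thread);                        // repeat the first thread to close the loop
        return cycle;
    }

    // Thread -> monitor it is waiting to lock (from the report).
    static Map<String, String> waitsOn() {
        Map<String, String> m = new HashMap<>();
        m.put("Pooled High Priority Message Processor 8", "0x000000077ffcbab0"); // java.util.HashMap
        m.put("main", "0x000000077fff1bc0");                                     // java.lang.Object
        m.put("Management Task", "0x000000077fff19e0");                          // CacheFactory class
        return m;
    }

    // Monitor -> owning thread (from the report and the "main" stack).
    static Map<String, String> heldBy() {
        Map<String, String> m = new HashMap<>();
        m.put("0x000000077ffcbab0", "main");            // SystemManagementService.close()
        m.put("0x000000077fff1bc0", "Management Task"); // LocalManager$ManagementTask.run()
        m.put("0x000000077fff19e0", "main");            // CacheFactory.create()
        return m;
    }

    public static void main(String[] args) {
        List<String> cycle = findCycle(waitsOn(), heldBy(), "Pooled High Priority Message Processor 8");
        System.out.println(String.join(" -> ", cycle)); // main -> Management Task -> main
    }
}
```

The message processor thread is not part of the cycle itself; it is blocked behind "main", which is why it appears in the report but not in the printed loop.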

In this case the code running on the "main" thread was invoked from Spring, so it is Spring that drives the interactions with the GemFire API that shut the cache down, and that shutdown path is what ends in the deadlock. As the "main" thread stack shows, the Spring Data GemFire (SDG) CacheFactoryBean.destroy() method called GemFireCacheImpl.close() during shutdown, which is the appropriate interaction when using the SpringContextBootstrappingInitializer.

Solution

Instead of the 'cacheserver' shell script, use Gfsh, either interactively or non-interactively ("scripted"), for example from an OS shell script such as Bash using gfsh -e "..." style commands.
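When migrating, the arguments the cacheserver script passed to CacheServerLauncher (as in the sun.java.command log line above) have to be re-expressed as Gfsh "start server" options. The Java sketch below is a hypothetical helper covering only the kinds of arguments seen in that log line. The mapping it uses (--server-port, --disable-default-server, --cache-xml-file, and --J=-Dgemfire.<property> for any other GemFire property) is an assumption to verify against the Gfsh reference for your GemFire version; passing a gemfire.properties file with --properties-file is an alternative to the --J form. The member name is made up.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical migration helper: turns cacheserver-style arguments into one
 * Gfsh "start server" command. The option mapping is an assumption.
 */
public class CacheserverToGfsh {

    static String toStartServer(String memberName, String... cacheserverArgs) {
        List<String> parts = new ArrayList<>();
        parts.add("start server");
        parts.add("--name=" + memberName);
        for (String arg : cacheserverArgs) {
            if (arg.equals("server")) {
                continue;                                   // CacheServerLauncher sub-command, not an option
            } else if (arg.equals("-disable-default-server")) {
                parts.add("--disable-default-server");
            } else if (arg.startsWith("-server-port=")) {
                parts.add("--server-port=" + arg.substring("-server-port=".length()));
            } else if (arg.startsWith("cache-xml-file=")) {
                parts.add("--" + arg);                      // becomes --cache-xml-file=<path>
            } else if (!arg.startsWith("-") && arg.contains("=")) {
                parts.add("--J=-Dgemfire." + arg);          // other GemFire properties as system properties
            } else {
                throw new IllegalArgumentException("no mapping for: " + arg);
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String startCmd = toStartServer("server3106",        // hypothetical member name
                "server", "-disable-default-server", "-server-port=3106",
                "cache-xml-file=/path/to/cache.xml", "conserve-sockets=false");
        System.out.println("gfsh -e \"" + startCmd + "\"");
    }
}
```

With the arguments in main it prints gfsh -e "start server --name=server3106 --disable-default-server --server-port=3106 --cache-xml-file=/path/to/cache.xml --J=-Dgemfire.conserve-sockets=false". Stopping the member should then also go through Gfsh (its stop server command) rather than through the cacheserver script.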

The problem is that, because Gfsh (which uses the ServerLauncher class to fork the JVM process running GemFire/Spring) was not used to start and stop the Server, the Management code does not have the appropriate "hooks" into the Server to control the GemFire JVM process. The GemFire Management interface (technically, the MBeans) uses the ServerLauncher API to assess the state of the Server and to stop it.

If the Server was launched with the 'cacheserver' shell script, the ServerLauncher class is not used (the old CacheServerLauncher class is used instead), and the GemFire Server cannot be stopped reliably, whether or not Spring is involved.

In short, do not use the 'cacheserver' shell script to start/stop GemFire processes (Locators/Servers); use Gfsh.