Pivotal Knowledge Base

GemFire cluster hanging when "conserve-sockets=true"

Environment

 Product          Version
 Pivotal GemFire  All Supported Versions
 OS               All Supported OS

Symptom

A GemFire cluster may hang when conserve-sockets=true is set and the cluster is under high load. When the cluster hangs, you may see the following symptoms:

A. Thread dumps show one thread writing to a peer socket while other threads are blocked on the same lock:

"ServerConnection on port 12480 Thread 929" tid=0x8fe (in native)
    java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
	at sun.nio.ch.IOUtil.write(IOUtil.java:51)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
	-  locked java.lang.Object@241d89aa
	at com.gemstone.gemfire.internal.tcp.Connection.nioWriteFully(Connection.java:3277)
	-  locked java.lang.Object@7d36d2bc
	at com.gemstone.gemfire.internal.tcp.Connection.sendPreserialized(Connection.java:2511)
	at com.gemstone.gemfire.internal.tcp.MsgStreamer.realFlush(MsgStreamer.java:317)
	at com.gemstone.gemfire.internal.tcp.MsgStreamer.writeMessage(MsgStreamer.java:245)
	at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:458)
	at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:310)
	at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.send(DirectChannel.java:696)
	at com.gemstone.gemfire.distributed.internal.membership.jgroup.JGroupMembershipManager.directChannelSend(JGroupMembershipManager.java:2844)
	at com.gemstone.gemfire.distributed.internal.membership.jgroup.JGroupMembershipManager.send(JGroupMembershipManager.java:3078)
	at com.gemstone.gemfire.distributed.internal.DistributionChannel.send(DistributionChannel.java:79)
	at com.gemstone.gemfire.distributed.internal.DistributionManager.sendOutgoing(DistributionManager.java:3780)
	at com.gemstone.gemfire.distributed.internal.DistributionManager.sendMessage(DistributionManager.java:3821)
	at com.gemstone.gemfire.distributed.internal.DistributionManager.putOutgoing(DistributionManager.java:1957)
	at com.gemstone.gemfire.internal.cache.partitioned.DestroyMessage.send(DestroyMessage.java:213)
	at com.gemstone.gemfire.internal.cache.PartitionedRegion.destroyRemotely(PartitionedRegion.java:5734)
	at com.gemstone.gemfire.internal.cache.PartitionedRegion.destroyInBucket(PartitionedRegion.java:5552)
	at com.gemstone.gemfire.internal.cache.PartitionedRegionDataView.destroyExistingEntry(PartitionedRegionDataView.java:45)
	at com.gemstone.gemfire.internal.cache.PartitionedRegion.basicDestroy(PartitionedRegion.java:5419)
	at com.gemstone.gemfire.internal.cache.LocalRegion.validatedDestroy(LocalRegion.java:1143)
	at com.gemstone.gemfire.internal.cache.LocalRegion.destroy(LocalRegion.java:1130)
	at com.gemstone.gemfire.internal.cache.AbstractRegion.destroy(AbstractRegion.java:315)
	at com.gemstone.gemfire.internal.cache.LocalRegion.remove(LocalRegion.java:9362)
......
"ServerConnection on port 12480 Thread 875" tid=0x8c5 owned by "ServerConnection on port 12480 Thread 929" tid=0x8fe
    java.lang.Thread.State: BLOCKED
	at com.gemstone.gemfire.internal.tcp.Connection.nioWriteFully(Connection.java:3264)
	-  blocked on java.lang.Object@7d36d2bc
	at com.gemstone.gemfire.internal.tcp.Connection.sendPreserialized(Connection.java:2511)
	at com.gemstone.gemfire.internal.tcp.MsgStreamer.realFlush(MsgStreamer.java:317)
	at com.gemstone.gemfire.internal.tcp.MsgStreamer.writeMessage(MsgStreamer.java:245)
	at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:458)
	at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:310)
	at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.send(DirectChannel.java:696)
	at com.gemstone.gemfire.distributed.internal.membership.jgroup.JGroupMembershipManager.directChannelSend(JGroupMembershipManager.java:2844)
	at com.gemstone.gemfire.distributed.internal.membership.jgroup.JGroupMembershipManager.send(JGroupMembershipManager.java:3078)
	at com.gemstone.gemfire.distributed.internal.DistributionChannel.send(DistributionChannel.java:79)
	at com.gemstone.gemfire.distributed.internal.DistributionManager.sendOutgoing(DistributionManager.java:3780)
	at com.gemstone.gemfire.distributed.internal.DistributionManager.sendMessage(DistributionManager.java:3821)
	at com.gemstone.gemfire.distributed.internal.DistributionManager.putOutgoing(DistributionManager.java:1957)
	at com.gemstone.gemfire.internal.cache.partitioned.DestroyMessage.send(DestroyMessage.java:213)
	at com.gemstone.gemfire.internal.cache.PartitionedRegion.destroyRemotely(PartitionedRegion.java:5734)
	at com.gemstone.gemfire.internal.cache.PartitionedRegion.destroyInBucket(PartitionedRegion.java:5552)
	at com.gemstone.gemfire.internal.cache.PartitionedRegionDataView.destroyExistingEntry(PartitionedRegionDataView.java:45)
	at com.gemstone.gemfire.internal.cache.PartitionedRegion.basicDestroy(PartitionedRegion.java:5419)
	at com.gemstone.gemfire.internal.cache.LocalRegion.validatedDestroy(LocalRegion.java:1143)
	at com.gemstone.gemfire.internal.cache.LocalRegion.destroy(LocalRegion.java:1130)
	at com.gemstone.gemfire.internal.cache.AbstractRegion.destroy(AbstractRegion.java:315)
	at com.gemstone.gemfire.internal.cache.LocalRegion.remove(LocalRegion.java:9362)
......
"ServerConnection on port 12480 Thread 873" tid=0x8c3 owned by "ServerConnection on port 12480 Thread 929" tid=0x8fe
    java.lang.Thread.State: BLOCKED
	at com.gemstone.gemfire.internal.tcp.Connection.nioWriteFully(Connection.java:3264)
	-  blocked on java.lang.Object@7d36d2bc
	at com.gemstone.gemfire.internal.tcp.Connection.sendPreserialized(Connection.java:2511)
......
"ServerConnection on port 12480 Thread 1394" tid=0xaee owned by "ServerConnection on port 12480 Thread 929" tid=0x8fe
    java.lang.Thread.State: BLOCKED
	at com.gemstone.gemfire.internal.tcp.Connection.nioWriteFully(Connection.java:3264)
	-  blocked on java.lang.Object@7d36d2bc
	at com.gemstone.gemfire.internal.tcp.Connection.sendPreserialized(Connection.java:2511)
......
"PartitionedRegion Message Processor105" tid=0x768 owned by "ServerConnection on port 12480 Thread 929" tid=0x8fe
    java.lang.Thread.State: BLOCKED
	at com.gemstone.gemfire.internal.tcp.Connection.nioWriteFully(Connection.java:3264)
	-  blocked on java.lang.Object@7d36d2bc
	at com.gemstone.gemfire.internal.tcp.Connection.sendPreserialized(Connection.java:2511)
......	
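To check whether a thread dump shows this pattern, it can help to count how many threads are blocked on each lock. The following is a minimal sketch, not a GemFire tool: the class LockContentionSummary and its method are hypothetical names, and it assumes the dump uses the "-  blocked on <lock>" line format shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of GemFire: summarizes lock contention in a thread dump.
class LockContentionSummary {

    // Matches dump lines such as "-  blocked on java.lang.Object@7d36d2bc".
    private static final Pattern BLOCKED_ON = Pattern.compile("-\\s+blocked on\\s+(\\S+)");

    /** Returns a map from lock identity to the number of threads blocked on it. */
    static Map<String, Integer> countBlockedByLock(String dump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            Matcher m = BLOCKED_ON.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Abbreviated sample in the same format as the dump above.
        String dump = String.join("\n",
            "\"ServerConnection on port 12480 Thread 929\" tid=0x8fe (in native)",
            "\t-  locked java.lang.Object@7d36d2bc",
            "\"ServerConnection on port 12480 Thread 875\" tid=0x8c5",
            "\t-  blocked on java.lang.Object@7d36d2bc",
            "\"ServerConnection on port 12480 Thread 873\" tid=0x8c3",
            "\t-  blocked on java.lang.Object@7d36d2bc");
        System.out.println(countBlockedByLock(dump));
    }
}
```

A lock with many blocked threads, all waiting behind a single owner that is inside Connection.nioWriteFully, is consistent with the symptom described here.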

B. The cacheserver log file contains many messages like the ones below:

[warn 2017/03/15 19:38:40.072 CST  tid=0x4f4] 15 seconds have elapsed while waiting for replies: <PutMessage$PutResponse 2569 waiting for 1 replies from [......]

[warn 2017/03/15 19:38:40.072 CST  tid=0x4bc] 15 seconds have elapsed while waiting for replies: <GetMessage$GetResponse 2571 waiting for 1 replies from [......]

[warn 2017/03/15 19:38:41.564 CST  tid=0x5e8] 15 seconds have elapsed while waiting for replies: <com.gemstone.gemfire.internal.cache.PartitionedRegionQueryEvaluator$StreamingQueryPartitionResponse 2588 waiting for 1 replies from [......] 

Cause

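The thread dump and log messages above show that the GemFire cluster is stuck at a synchronization point in Connection.nioWriteFully on a peer-to-peer connection. One thread holds the connection's lock while its socket write is still in progress (RUNNABLE, in native code), and the other threads that need to send on the same connection are BLOCKED waiting for that lock. This contention arises because application threads share sockets when conserve-sockets=true.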
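The blocking pattern can be reproduced in miniature without GemFire. The sketch below is only a simulation under stated assumptions: a plain Java object stands in for the shared connection's lock, and a latch stands in for a socket write that cannot complete. The class SharedSocketContention and its method are hypothetical names and do not reflect GemFire's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Illustrative simulation only; names are hypothetical, not GemFire code.
class SharedSocketContention {

    /**
     * One "writer" thread takes the shared connection lock and stalls, standing in
     * for a socket write that cannot complete. Returns how many of the other
     * sender threads are observed in the BLOCKED state on that lock.
     */
    static int countBlockedSenders(int senderCount) {
        final Object sharedConnectionLock = new Object();
        final CountDownLatch writerHoldsLock = new CountDownLatch(1);
        final CountDownLatch releaseWriter = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            synchronized (sharedConnectionLock) {
                writerHoldsLock.countDown();
                try {
                    releaseWriter.await(); // simulated stalled write
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "writer");

        List<Thread> senders = new ArrayList<>();
        try {
            writer.start();
            writerHoldsLock.await(); // the writer now owns the lock
            for (int i = 0; i < senderCount; i++) {
                Thread sender = new Thread(() -> {
                    synchronized (sharedConnectionLock) {
                        // A real sender would write its message here.
                    }
                }, "sender-" + i);
                sender.start();
                senders.add(sender);
            }
            // Give the senders up to 5 seconds to reach the lock.
            long deadline = System.nanoTime() + 5_000_000_000L;
            int blocked = 0;
            while (System.nanoTime() < deadline) {
                blocked = 0;
                for (Thread s : senders) {
                    if (s.getState() == Thread.State.BLOCKED) {
                        blocked++;
                    }
                }
                if (blocked == senderCount) {
                    break;
                }
                Thread.sleep(5);
            }
            return blocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            releaseWriter.countDown(); // let the writer finish so all threads can exit
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked senders: " + countBlockedSenders(4));
    }
}
```

As long as the writer does not release the lock, every sender that shares it stays blocked, which mirrors the "blocked on java.lang.Object@7d36d2bc" entries in the thread dump.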

Resolution

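Changing the setting from the default conserve-sockets=true to conserve-sockets=false can prevent this hang. With conserve-sockets=false, each application thread uses its own dedicated sockets for peer-to-peer communication instead of sharing them.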
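As a minimal sketch of applying the setting, the example below builds a java.util.Properties object containing conserve-sockets=false and renders it in properties-file form. The class ConserveSocketsConfig and its methods are hypothetical names. It assumes the member reads its distributed system properties either from a gemfire.properties file or from a Properties object supplied when the cache is created, so adapt it to how your members are actually started.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; not part of GemFire.
class ConserveSocketsConfig {

    /** Builds member properties with socket sharing turned off. */
    static Properties memberProperties() {
        Properties props = new Properties();
        props.setProperty("conserve-sockets", "false");
        return props;
    }

    /** Renders the properties in the text form used by a properties file. */
    static String asPropertiesFile(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: the printed key=value line is placed in gemfire.properties,
        // or the Properties object is passed to the cache at creation time.
        System.out.print(asPropertiesFile(memberProperties()));
    }
}
```

The setting should be applied to every member of the cluster and takes effect when the member is restarted with the new configuration.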

 
