Pivotal Knowledge Base

Performance degradation with gp_interconnect_type=UDPIFC, especially on large clusters

Environment

GPDB version 4.2.4.0 and above.

Problem

Performance degradation of more than 2x is seen when gp_interconnect_type is set to UDPIFC, compared to UDP, on a 9-rack cluster.
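
One way to confirm which interconnect protocol the cluster is currently using is to inspect the parameter with gpconfig (run as the gpadmin user); this is a suggested check of the parameter named above:

$ gpconfig -s gp_interconnect_type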

In addition, the kernel parameters for the network socket buffers are set to their default values (run the following command as the root user):

# sysctl -a|grep net.core|grep mem|grep max|grep -v opt
net.core.wmem_max = 131071
net.core.rmem_max = 131071
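
The same default values can be checked across the whole cluster in one pass with gpssh, for example (using the same hosts.all host file referenced in the Solution section below):

# gpssh -f /home/gpadmin/hosts.all 'sysctl net.core.rmem_max net.core.wmem_max'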

Cause

The UDPIFC protocol introduces a new flow-control mechanism for packet retransmission: the sender waits until the receiver has free buffers before resending the lost packets.

If the network socket buffers are set too small and the cluster is large, there is a good chance that packets will be dropped due to buffer overflow. The "packet receive errors" counter in the output of "netstat -su" shows the approximate number of packets dropped due to UDP socket buffer overflow.

UDPIFC waits longer than UDP when retransmitting these lost packets, which is why UDPIFC performance degrades compared to UDP in this situation.
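
To check whether this kind of buffer overflow is occurring, the counter can be sampled on every host, for example (again using the hosts.all host file from the Solution section):

# gpssh -f /home/gpadmin/hosts.all 'netstat -su | grep "packet receive errors"'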

Solution

In our testing, a good value for the network socket buffers is 2 MB. Setting these two parameters to 2 MB on the master and segment hosts resolves the performance degradation with UDPIFC.

Set the parameters with the following commands (run all of them as the root user):

# gpssh -f /home/gpadmin/hosts.all 'echo 2097152 > /proc/sys/net/core/rmem_max'
# gpssh -f /home/gpadmin/hosts.all 'echo 2097152 > /proc/sys/net/core/wmem_max'

Verify the change with the following command:

# gpssh -f /home/gpadmin/hosts.all 'sysctl -a|grep net.core|grep mem|grep max|grep -v opt'
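
If the change has taken effect, each host should report 2097152 for both parameters. The output will look roughly like the following (the hostname is an example; the exact gpssh output format may differ slightly):

[sdw1] net.core.rmem_max = 2097152
[sdw1] net.core.wmem_max = 2097152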

Append the following lines to /etc/sysctl.conf on all master and segment hosts to make the change permanent:

net.core.wmem_max = 2097152
net.core.rmem_max = 2097152
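
These entries in /etc/sysctl.conf are read at boot time; to apply the file immediately on every host without rebooting, it can also be reloaded with sysctl (run as the root user, again using the hosts.all host file):

# gpssh -f /home/gpadmin/hosts.all 'sysctl -p'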
