A Greenplum query fails and the following error is logged in pg_log:
2014-08-04 08:32:33.385395 EDT,"rbrill","prod",p25878,th2078996240,"172.18.110.89","38974",2014-08-04 08:31:24 EDT,60891365,con3422256,cmd438,seg-1,,dx5877165,x60891365,sx1,"ERROR","58M01","Interconnect error writing an outgoing packet: Operation not permitted (seg89 slice1 sdw8:40005 pid=6858)","
Two known issues can trigger this error:
1. iptables is enabled on the segment or master nodes and the connection tracking table of the ip_conntrack module is full.
iptables tracks connections in a connection tracking table maintained by the ip_conntrack module. The module uses a portion of system memory for this table, and the table can fill up if the number of tracked connections keeps growing.
To confirm that this is the problem, check whether iptables is running:
service iptables status
When the table is full, messages like the following also appear in /var/log/messages:
Aug 3 12:53:54 sdw8 kernel: printk: 5 messages suppressed.
Aug 3 12:53:54 sdw8 kernel: ip_conntrack: table full, dropping packet.
Aug 3 12:53:59 sdw8 kernel: printk: 13 messages suppressed
Aug 3 12:53:59 sdw8 kernel: ip_conntrack: table full, dropping packet.
2. Too many connections on the server.
The number of connection tracking entries that netfilter holds simultaneously in kernel memory exceeds the limit configured in net.ipv4.netfilter.ip_conntrack_max (net.netfilter.nf_conntrack_max on RHEL 6 and later). To confirm that this is the problem, check whether /var/log/messages contains log messages like the following:
Jul 31 05:21:36 dsshen-seg03 kernel: [52117939.696465] nf_conntrack: table full, dropping packet.
Jul 31 05:21:36 dsshen-seg03 kernel: [52117939.696527] nf_conntrack: table full, dropping packet.
Jul 31 05:21:36 dsshen-seg03 kernel: [52117939.696623] nf_conntrack: table full, dropping packet.
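The two checks above can be scripted. The following is a minimal sketch, not part of the original procedure: the function names count_table_full and conntrack_usage_pct are hypothetical, and the /proc paths mentioned in the usage note are assumptions about where the running kernel exposes its conntrack counters.

```shell
# Count "table full" kernel messages in a syslog file.
# The pattern matches both the ip_conntrack and nf_conntrack variants.
count_table_full() {
    log_file="$1"
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so the exit status is ignored here.
    grep -c 'conntrack: table full, dropping packet' "$log_file" || true
}

# Print how full the conntrack table is, as an integer percentage.
# Arguments: a file holding the current entry count and a file holding
# the configured maximum (normally files under /proc/sys).
conntrack_usage_pct() {
    count_file="$1"
    max_file="$2"
    count=$(cat "$count_file") || return 1
    max=$(cat "$max_file") || return 1
    [ "$max" -gt 0 ] || return 1
    echo $(( count * 100 / max ))
}
```

On a segment host this might be called as count_table_full /var/log/messages and, assuming the kernel uses the nf_conntrack naming, conntrack_usage_pct /proc/sys/net/netfilter/nf_conntrack_count /proc/sys/net/netfilter/nf_conntrack_max. A value close to 100 suggests the table is about to overflow.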
To resolve the problem, apply the fix that matches the confirmed cause:
1. If the problem was caused by the ip_conntrack table becoming full because iptables is enabled, disable iptables on all cluster nodes. Disabling iptables is a prerequisite for Greenplum clusters. Run the following commands on each node where iptables is running:
service iptables stop
chkconfig iptables off
2. If the problem was caused by a large number of connections, try increasing the connection tracking limit (net.ipv4.netfilter.ip_conntrack_max on RHEL 5, net.netfilter.nf_conntrack_max on RHEL 6 and later):
sysctl -w net.ipv4.netfilter.ip_conntrack_max=655360   # RHEL 5
sysctl -w net.netfilter.nf_conntrack_max=655360        # RHEL 6+
To make the change permanent, add the line that matches your release to /etc/sysctl.conf:
# RHEL 5
net.ipv4.netfilter.ip_conntrack_max=655360
# RHEL 6+
net.netfilter.nf_conntrack_max=655360
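Because the key name differs between releases, it can help to detect which one the running kernel exposes before editing /etc/sysctl.conf. This is a sketch under stated assumptions: the function names conntrack_max_key and sysctl_conf_line are hypothetical, and it assumes the sysctl keys map to files under /proc/sys in the usual way (dots replaced by slashes).

```shell
# Print the conntrack limit key exposed by this kernel.
# proc_root defaults to /proc/sys; it is a parameter only so that the
# function can be exercised against a scratch directory.
conntrack_max_key() {
    proc_root="${1:-/proc/sys}"
    if [ -e "$proc_root/net/netfilter/nf_conntrack_max" ]; then
        # Newer naming (RHEL 6+ in this article); preferred when both exist.
        echo "net.netfilter.nf_conntrack_max"
    elif [ -e "$proc_root/net/ipv4/netfilter/ip_conntrack_max" ]; then
        # Older naming (RHEL 5 in this article).
        echo "net.ipv4.netfilter.ip_conntrack_max"
    else
        # Neither file exists: the conntrack module is probably not loaded.
        return 1
    fi
}

# Print the line to add to /etc/sysctl.conf for the given limit.
sysctl_conf_line() {
    limit="$1"
    proc_root="${2:-/proc/sys}"
    key=$(conntrack_max_key "$proc_root") || return 1
    echo "$key=$limit"
}
```

For example, sysctl_conf_line 655360 prints the line for the current host, which can then be appended to /etc/sysctl.conf by hand. Running sysctl -p afterwards reloads the file without a reboot.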
As a general best practice, avoid allowing too many connections on a server. Each conntrack entry consumes roughly 400 bytes of kernel memory (see /proc/slabinfo), so tracking 655360 connections would consume about 262 MB of RAM.
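The memory estimate is a simple multiplication, sketched here so it can be repeated for other limits. The function name conntrack_mem_bytes is hypothetical, and the 400-byte default is the per-entry figure quoted above; the actual object size on a given kernel should be taken from /proc/slabinfo.

```shell
# Estimate kernel memory used by a full conntrack table.
# Arguments: number of entries, and optionally bytes per entry
# (defaults to the 400 bytes quoted in this article).
conntrack_mem_bytes() {
    entries="$1"
    bytes_per_entry="${2:-400}"
    echo $(( entries * bytes_per_entry ))
}

bytes=$(conntrack_mem_bytes 655360)
echo "$bytes bytes"                  # 262144000 bytes
echo "$(( bytes / 1000000 )) MB"     # 262 MB (decimal megabytes)
echo "$(( bytes / 1048576 )) MiB"    # 250 MiB (binary mebibytes)
```

The 262 MB figure uses decimal megabytes; the same amount is 250 MiB when counted in powers of two.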