Pivotal Knowledge Base

Session failures with "Too many open files"

Symptoms

Errors in database log files similar to:

  • "could not create socket: Too many open files"
  • "could not create temporary file base/19781/pgsql_tmp/workfile_set_HashJoin_Slice-1.XXXXDtKNoF/spillfile_f341:Too many open files"
  • Errors in GPDB utility logs containing "Too many open files"

 

Analysis

"Too many open files" errors occur when a process tries to open more files than the operating system allows. The ceiling is the maximum number of file descriptors available to that process.

The file descriptor limit for the current shell can be shown with either of the following commands:

[root@mdw ~]# ulimit -a | grep open
open files (-n) 524288
[root@mdw ~]# ulimit -n
524288
[root@mdw ~]#
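Each limit has a soft value, which is the one enforced, and a hard value, which is the ceiling up to which a non-root user may raise the soft value. A minimal sketch that prints both for the current shell follows; the function name show_nofile_limits is illustrative and not part of GPDB.

```shell
#!/usr/bin/env bash
# Print the soft and hard open-file limits of the current shell.
# show_nofile_limits is a hypothetical helper name, not a GPDB utility.
show_nofile_limits() {
    echo "soft: $(ulimit -Sn)"
    echo "hard: $(ulimit -Hn)"
}

show_nofile_limits
```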

A process inherits its limits from the environment that started it, and these may differ from the limits of your current shell. In GPDB, all database processes inherit their limits from the postmaster. The limits in effect for the postmaster can be verified in the /proc file system:

[gpadmin@mdw ~]$ ps -ef | grep silent
gpadmin 50746 1 0 Jul25 ? 00:00:00 /usr/local/greenplum-db-4.3.5.2/bin/postgres -D /data/master/gp4352seg-1 -p 5432 -b 1 -z 2 --silent-mode=true -i -M master -C -1 -x 0 -E

[gpadmin@mdw ~]$ cat /proc/50746/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             131072               131072               processes
Max open files            65536                65536                files
Max locked memory         32768                32768                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       385257               385257               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
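To pull just the open-files value out of that listing, something like the following may help. It is a sketch, assuming the Linux /proc layout shown above; soft_nofile_limit is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the soft "Max open files" limit of a running process.
# soft_nofile_limit is a hypothetical helper name; pass it a PID.
soft_nofile_limit() {
    # Fields on the matching line: Max open files <soft> <hard> <units>
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Example: inspect the current shell. For GPDB, substitute the
# postmaster PID (50746 in the listing above).
soft_nofile_limit "$$"
```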

The maximum number of file descriptors is controlled in 2 ways:

1) /etc/security/limits.conf

Specifically the following lines:

* soft nofile 65536
* hard nofile 65536
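To see which nofile entries are actually active in the file (for example after an upgrade), the commented-out lines can be filtered away. A minimal sketch; nofile_entries is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# List the active (non-comment) nofile entries in a limits file.
# nofile_entries is a hypothetical helper name.
nofile_entries() {
    # Default to the file named in this article when no argument is given.
    grep -E '^[^#]*[[:space:]]nofile[[:space:]]' "${1:-/etc/security/limits.conf}"
}

nofile_entries /etc/security/limits.conf || echo "no nofile entries found"
```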

2) Setting it explicitly with the ulimit command, for example from a script that runs automatically at login:

[root@mdw ~]# ulimit -n 524288
[root@mdw ~]#
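When the command is placed in a login script for a non-root account, it can only raise the soft limit up to the hard limit. The sketch below guards against that case; raise_nofile is a hypothetical helper name, and 524288 is the 512K value recommended in this article.

```shell
#!/usr/bin/env bash
# Raise the soft open-file limit to a target, if the hard limit allows it.
# raise_nofile is a hypothetical helper name.
raise_nofile() {
    local target="$1" soft hard
    soft="$(ulimit -Sn)"
    hard="$(ulimit -Hn)"
    # Nothing to do if the soft limit is already high enough.
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$target" ]; then
        return 0
    fi
    if [ "$hard" = "unlimited" ] || [ "$hard" -ge "$target" ]; then
        ulimit -Sn "$target"
    else
        echo "hard limit $hard is below $target; raise it in limits.conf first" >&2
        return 1
    fi
}

# Intended for a login script such as ~/.bashrc; a failure is only reported.
raise_nofile 524288 || true
```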

For more information about ulimit and number of file descriptors see the ulimit man page and Linux documentation.


DCAv1 originally set the maximum number of open files per process to 64K (65536). This limit proved too low for many GPDB workloads, so we began recommending an increase to 256K or 512K.

DCAv2 standardized to 512K and this is the current recommendation.


DCA Upgrade

Unfortunately, we recently found that the DCA upgrade does not preserve this setting, because it replaces /etc/security/limits.conf with the original file from the ISO.

 

Solution

If there are problems with "too many open files", check the configured number of file descriptors per process. If the value is lower than 512K (524288), advise the customer to increase the limit. The new value can be set in /etc/security/limits.conf (requires an OS restart to take effect) or in the gpadmin account (.bashrc or another automatically executed script). Because database processes inherit their limits from the postmaster, restart the database afterwards so that the new limit applies.
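The comparison against the recommendation can be scripted. This is a sketch only; nofile_ok and RECOMMENDED_NOFILE are hypothetical names, and the value passed in can be the output of ulimit -n or the "Max open files" figure from /proc/<pid>/limits.

```shell
#!/usr/bin/env bash
# Report whether a limit meets the 512K (524288) recommendation.
# nofile_ok and RECOMMENDED_NOFILE are hypothetical names.
RECOMMENDED_NOFILE=524288

nofile_ok() {
    # "unlimited" always satisfies the recommendation.
    [ "$1" = "unlimited" ] || [ "$1" -ge "$RECOMMENDED_NOFILE" ]
}

if nofile_ok "$(ulimit -Sn)"; then
    echo "open-file limit OK"
else
    echo "open-file limit below $RECOMMENDED_NOFILE; increase it"
fi
```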

Running out of maximum File Descriptors for the system

fs.file-max is the maximum number of file descriptors (FDs) enforced at the kernel level; all processes combined cannot exceed it. The ulimit value is enforced per process and can be lower than file-max. Even when ulimit is configured correctly, the system-wide limit may be lower than the total number of files that all processes need to open. When a process tries to open a file after the system-wide maximum has been reached, the following message is returned:

Too many open files in system

To fix this, check the value of fs.file-max in /etc/sysctl.conf. If it is lower than the total number of open files for the entire system at a given point (lsof | wc -l), increase it as follows:

1. Edit the following line in the /etc/sysctl.conf file:

fs.file-max = value

value is the new file descriptor limit that you want to set.

2. Apply the change by running the following command:

# /sbin/sysctl -p
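To confirm the new value and compare it with current usage, the kernel's own counters can be read. The sketch below uses /proc/sys/fs/file-nr rather than the lsof count mentioned above; that substitution is an assumption of this example, and file_handle_usage is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Show system-wide file handle usage against fs.file-max.
# /proc/sys/fs/file-nr holds three numbers: allocated handles,
# allocated-but-unused handles, and the maximum (fs.file-max).
# file_handle_usage is a hypothetical helper name.
file_handle_usage() {
    local allocated unused max
    read -r allocated unused max < /proc/sys/fs/file-nr
    echo "allocated=$allocated max=$max"
}

file_handle_usage
```

If allocated is close to max, the system-wide limit is the likely cause of the "Too many open files in system" message.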

Notes:

  • Before running a DCA upgrade on DCAv1, save the contents of /etc/security/limits.conf so that the setting can be restored afterwards.
  • If a DCA has the maximum number of open files set to 64K (65536), recommend that the customer change it to 512K (524288).
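Saving the file before an upgrade could look like the sketch below. The helper name backup_file and the /root destination are illustrative assumptions, not part of the DCA upgrade procedure.

```shell
#!/usr/bin/env bash
# Copy a file to a timestamped backup in a chosen directory and
# print the path of the copy.
# backup_file and the /root destination are illustrative assumptions.
backup_file() {
    local src="$1" destdir="$2" dest
    dest="$destdir/$(basename "$src").$(date +%Y%m%d%H%M%S)"
    cp -p "$src" "$dest" && echo "$dest"
}

# Before the upgrade (as root):
#   backup_file /etc/security/limits.conf /root
# After the upgrade, compare the saved copy with the replaced file:
#   diff /root/limits.conf.<timestamp> /etc/security/limits.conf
```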

 
