Pivotal Knowledge Base

Follow

Authentication failure for gpmon results in connection error: "Sorry, Too Many Clients Already"

Environment

Product Version
Pivotal Greenplum All versions

Symptom

Modifications to gpmon credentials or the related .pgpass file may lead to a "database unavailable" scenario. 

Client connections to Pivotal Greenplum [GPDB] return with an error: "FATAL","53300","sorry, too many clients already"

While connections on the master instance will show the majority of the connections are in "startup", specifically from the Performance Monitor / Command Center User (gpmon):

# ps -ef | grep startup
gpadmin 40845 35215 0 Apr08 ? 00:00:00 postgres: port 5432, gpmon gpperfmon [local] [local] startup

Cause

The issue may be caused by a reset or modification of the .pgpass file or a modification of the gpmon user password without reflecting the change in the .pgpass file.

The .pgpass file provides the password authentication required for the gpmon to connect to the database.

Logging captured in the $MASTER_DATA_DIRECTORY/pg_log will show that the gpmon user is unable to access the database and connections are setup to timeout.

2015-04-08 17:10:15.673010 EDT,,,p35215,th-1296873696,,,,0,,,seg-1,,,,,"LOG","00000","perfmon process (PID 35224) exited with exit code 1",,,,,,,0,,"postmaster.c",5876,

2015-04-08 17:10:15.682346 EDT,,,p35216,th-1296873696,,,,0,,,seg-1,,,,,"LOG","00000","3rd party error log:
Performance Monitor - There was a problem accessing the gpperfmon database.",,,,,,,,"SysLoggerMain","syslogger.c",552,

2015-04-08 17:11:15.682579 EDT,"gpmon","gpperfmon",p12212,th-1296873696,"[local]",,2015-04-08 17:10:15 EDT,82728,con5441,,seg-1,,,x82728,sx1,"LOG","00000","Issuing cancel signal (SIGINT) to my self (pid = 12212) for statement timeout.",,,,,,,0,,"proc.c",1551,

However, as the gpmon user is scheduled to collect statistics on a more frequent interval than the default client timeout settings. These connections will begin to queue, waiting in startup, until they timeout. Eventually, they may consume all the available connections.

Resolution

Follow the steps:

1. Begin by stopping the Greenplum Performance Monitor:

# gpperfmon --stop

This should allow the connection in startup to terminate without queuing new connections. However in order to set the additional connections to free immediately, the remaining connections in startup can be safely terminated by issuing a kill command:

# ps -C postgres -o pid=,command= | grep startup | awk {'print $1'} | xargs kill

2. Validate that the pg_hba.conf was updated to allow the connection to the database via password and the gpadmin .pgpass file contains an entry for the gpmon user.

# cat $MASTER_DATA_DIRECTORY/pg_hba.conf | grep gpmon
local gpperfmon gpmon md5
host all gpmon 127.0.0.1/28 md5

# cat /home/gpadmin/.pgpass | grep gpmon
*:5432:gpperfmon:gpmon:changeme

If an entry does not exist or is incorrect, modify or append the file with the updated or correct entry (the last entry for a user supersedes all others).

# echo *:5432:gpperfmon:gpmon:changeme >> /home/gpadmin/.pgpass

If any modifications have been made, reload the configuration and test the authentication.

# gpstop -u
   ...

# psql -d gpperfmon -U gpmon -c 'select ctime from system_now limit 1;'
Password for user gpmon:
ctime
---------------------
2015-05-12 07:47:45
(1 row)

3. Once authentication succeeds, you can restart the Performance Monitor. Starting and restarting requires you to specify the gpmon superuser’s password:

$ PGPASSWORD='password1234' gpperfmon --start

Comments

Powered by Zendesk