Pivotal Knowledge Base

Greenplum Error: "Failed to Acquire Resources on One or More Segments"

Environment

 Product             Version
 Pivotal Greenplum   4.3.x
 OS                  RHEL 6.x

Symptom

Your query fails and returns the following error message: "failed to acquire resources on one or more segments."

Cause

This is a generic error message and can be caused by various resource constraints.

To find the root cause, check the log of the segment that failed to acquire resources.
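
For example, to scan today's segment logs for fatal or panic entries on a segment host (a sketch only; the data directory layout and log file naming below are assumptions, adjust them to your installation):

grep -iE '"FATAL"|"PANIC"' /data/primary/gpseg*/pg_log/gpdb-$(date +%Y-%m-%d)*.csv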

Below are some possible causes:

  • GPHOME is owned by root on some segment servers.
  • Database parameters "max_connections" and "max_prepared_transactions" are set improperly. This is accompanied by the following error in the primary segment logs: "FATAL","53300","sorry, too many clients already".
  • The disk is 100% full on one or more segments.
  • Any other resource constraint indicated by the segment log.

Resolution

  • Check $GPHOME on every segment server to make sure it is owned by gpadmin.

If not, run the command below to change it:

chown -R gpadmin:gpadmin $GPHOME
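
For example, to check the ownership on all segment hosts at once (a sketch assuming a host file named hostfile_segments and that GPHOME is set in the remote shell environment):

gpssh -f hostfile_segments -e 'ls -ld $GPHOME'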
  • According to the Greenplum Admin Guide:

max_connections:

The maximum number of concurrent connections to the database server. In a Greenplum system, user client connections go through the Greenplum master instance only. Segment instances should allow 5-10 times the amount as the master. When you increase this parameter, max_prepared_transactions must be increased as well. For more information about limiting concurrent connections, refer to the Greenplum Database Administrator Guide. Increasing this parameter may cause Greenplum to request more shared memory.

max_prepared_transactions:

Sets the maximum number of transactions that can be in the prepared state simultaneously. Greenplum uses prepared transactions internally to ensure data integrity across the segments. This value must be at least as large as the value of max_connections on the master. Segment instances should be set to the same value as the master.

For example (if max_connections is set to 100 on the master):

gpconfig -c max_prepared_transactions -v 100
gpconfig -c max_connections -v 500 -m 100

Then restart Greenplum.
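
For example, to restart the cluster and then confirm the new settings (a sketch; gpstop -r restarts Greenplum, and gpconfig -s reports a parameter's current value on the master and segments):

gpstop -r
gpconfig -s max_connections
gpconfig -s max_prepared_transactions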

  • Clean up disk space on the segments (a quick check is shown below).
  • Check for any other resource constraints on the segments, e.g., memory, CPU, directory permissions, etc.
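
For example, to check free disk space and memory on all segment hosts at once (a sketch assuming a host file named hostfile_segments):

gpssh -f hostfile_segments -e 'df -h'
gpssh -f hostfile_segments -e 'free -m'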

Comments

  • Sathesh Sundaram

    Hi,

    Will the master log show which segment log to check? It looks like it doesn't give any indication of which segment to check.

    con54312,cmd42,seg-1,,dx121824,x81294443,sx1,"ERROR","XX000","could not temporarily connect to one or more segments (cdbgang.c:1633)

    Thanks,
    Sathesh

  • Faisal Ali

    Hi Sathesh,

    Yes, if the segment becomes unreachable or unresponsive, the error does print out the segment's information.

    The error you pasted is generic and can be the result of many things, as pointed out in the article. If none of the points above matches the issue seen in your environment, it could also be that when you send a query to the database and the master (QD) opens a transaction on the segments (QE), the QE cannot be started, perhaps due to a library call failure, packet loss, etc.; then you would encounter the same error you pasted (which I guess is what happened in your case). In this case Greenplum retries the connection, resetting the connection identifier, and you will see a message such as the one below.

    2015-12-25 15:57:47.864364 EST,"xxxx","xxxx",p54442,th-691542240,"10.13.84.20","3598",2015-12-25 15:57:01 EST,0,con723718,,seg-1,,,,,"LOG","00000","The previous session was reset because its gang was disconnected (session id = 723695). The new session id = 723718",,,,,,,0,,"cdbgang.c",2657,
    

    This states that the query will now run under sess_id = 723718; the previous sess_id = 723695 has been reset due to a failure in setting up a gang on some segments (this doesn't mean the segments went down or became unreachable, just that a gang couldn't be set up).

    This issue is not a point of concern if it happens once in a while. If it happens quite often, set the parameter gp_log_gang = DEBUG to gather more information.
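
    For example, one way to enable that cluster-wide (a sketch; please verify the parameter name and allowed values on your Greenplum version first, and note that gpstop -u only reloads the configuration files):

    gpconfig -c gp_log_gang -v debug
    gpstop -u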

    If the server or segment is unresponsive or unreachable, you can check the master log for errors like the ones below; if found, the segment information should be printed.

    grep -i dispatcher <master-log>
    grep -i "lost connection" <master-log>
    
  • Gurupreet Singh Bhatia

    Hi Faisal,
    I got the same message in the log:

    2016-10-04 02:10:21.999925 EDT,"etluser","mydb",p757573,th1440676864,"10.99.127.1","62102",2016-10-04 02:08:20 EDT,0,con1673764,cmd36,seg-1,,,,,"LOG","00000","lost connection with segworker group member",,,,,,,0,,"cdbgang.c",2060,
    2016-10-04 02:10:22.059810 EDT,"etluser","mydb",p757573,th1440676864,"10.99.127.1","62102",2016-10-04 02:08:20 EDT,0,con1673764,cmd36,seg-1,,,,,"ERROR","XX000","could not temporarily connect to one or more segments (cdbgang.c:2488)",,,,,,,0,,"cdbgang.c",2488,"Stack trace:
    2016-10-04 02:10:22.060047 EDT,"etluser","mydb",p757573,th1440676864,"10.99.127.1","62102",2016-10-04 02:08:20 EDT,0,con1673961,,seg-1,,,,,"LOG","00000","The previous session was reset because its gang was disconnected (session id = 1673764). The new session id = 1673961",,,,,,,0,,"cdbgang.c",2712,

    I didn't find any issue in the DB, and the application team hasn't reported any errors yet. Will there be any data loss here?
    What can be the reason for these messages? What do I need to check, and what precautions are required for the future?

    Thanks

  • Faisal Ali

    Hi Gurupreet,

    Please check my earlier comment; hopefully that should help.

    No, there is no data loss here; the transaction that is in flight at the time of the segment failure will be rolled back.

    Thanks
    Faisal
