Pivotal Knowledge Base

Follow

Script - Detect Orphan Process / Locks

Goal

Orphan process is where there are process (related a gang) that is running on the segments and there is no corresponding process related to the same gang on the master or other segments.

This orphan process can arise when there is query abort , process crash , server / segment crash etc.

How to detect these orphan process on the database or server?

Solution

All the query / commands below sometime might leads to misleading message because when the query is run, this query might be in dispatch phase and not all segments could have started the transaction, so its always good to run the query again after 2 minutes and ensure if the process remains the same or changed.

-- Detect all the orphan process through the unix utility

NOTE: 1. The below query might not help if the orphan process is in master server itself ( i.e the query is still running on master , but no connection on segments ). You can ignore the process that are IDLE (check known issue section below to why)

            2. The script below uses the connections on master as a reference, in case if the connection exits on the master and few segments and not on the rest , the below script might not report it.

source /usr/local/greenplum-db/greenplum_path.sh ; ORPHAN="ps auwxx|grep postgres|grep con[0-9]|`echo "egrep -v \\\"wal|con0|gpsyncagent"`|`psql -R '|' -d template1 -Atc "select 'con'||sess_id::text from pg_stat_activity"`\""; gpssh `psql -R' ' -d template1 -Atc " select distinct '-h ' || hostname from gp_segment_configuration "` $ORPHAN

-- Check from the database end.

NOTE: This works if the process is holding any lock on the table and its detected by pg_locks.

From the below report, total segments should be equal to total sessions, if not then the session is orphan or missing on one more segments

select
   'con'|| a.mppsessionid AS "Session ID",
   b.total_seg as "Total Segments",
   count(a.*) AS "Total Sessions"
from
  (select distinct mppsessionid,gp_segment_id 
   from pg_locks
   where mppsessionid not in (select sess_id from pg_stat_activity where procpid!=pg_backend_pid() OR current_query!='<IDLE>' OR waiting='t')
   and mppsessionid != 0
  ) a,
  (select count(*) as total_seg
   from gp_segment_configuration 
   where role='p'
  ) b
group by 1,2
having count(a.*) < b.total_seg
order by 3;

-- Check if the orphan process is blocking any session / query.

After running the below query note the Blockers SessionID and match with the con<sessionID> with the above query, if its the same then this orphan process is blocking other running queries.

SELECT 
    w.relation::regclass AS "Table",
    w.mode               AS "Waiters Mode",
    w.pid                AS "Waiters PID",
    w.mppsessionid       AS "Waiters SessionID",
    b.mode               AS "Blockers Mode",
    b.pid                AS "Blockers PID",
    b.mppsessionid       AS "Blockers SessionID",
    (select 'Hostname: ' || c.hostname ||' Content: '|| c.content || ' Port: ' || port from gp_segment_configuration c where c.content=b.gp_segment_id and role='p') AS "Blocking Segment"
FROM pg_catalog.pg_locks AS w, pg_catalog.pg_locks AS b 
Where ((w."database" = b."database" AND w.relation  = b.relation)
OR w.transactionid = b.transaction)
AND w.granted='f'
AND b.granted='t'
AND w.mppsessionid <> b.mppsessionid
AND w.mppsessionid in (SELECT l.mppsessionid FROM pg_locks l WHERE l.granted = true AND relation in ( select relation from pg_locks where granted='f'))
AND w.gp_segment_id = b.gp_segment_id
ORDER BY 1;

For more on the database locks, check the scripts on the document mentioned here , you may also look at the article for few more scripts to check in the orphan process.

Known Issue

  • If there is a backup/restore (especially gp_dump/gp_restore) running, then to take of the parallel feature each segments connects independently to its segments and issue COPY FROM/TO, so they are not orphan Process.
  • If the orphan process that is detected on master is IDLE and there is no corresponding process on any segment for that con<sessionID> , its prefectly fine , the segments IDLE process is terminated by the Pivotal Greenplum based on parameter gp_vmem_idle_resource_timeout

Comments

Powered by Zendesk