Pivotal Knowledge Base

Follow

Script - Checking Workfile / Spill FIle / Processing Skew Usage

Goal

Is to provide a simple scripts to help administrator identify workfile / spill file or processing skew usage information.

The below scripts need the gp_toolkit schema, if the schema is not avilable please run gptoolkit.sql found at $GPHOME/share/postgresql after connecting to the database.

Note: Please verify the script on a test cluster , before running it on production .

Solution

-- The values of the workfile limits, that control the amount or size of the spill files.

SELECT name, setting
FROM pg_settings 
WHERE name like 'gp_workfile_limit%';

-- Check the process information with the highest spill files or processing skew.

SELECT datname "Database Name",
	procpid "Process ID",
	sess_id "Session ID",
	sum(size)/1024::float "Total Spill Size(KB)",
	sum(numfiles) "Total Spill Files"
FROM  gp_toolkit.gp_workfile_usage_per_query
GROUP BY 1,2,3
ORDER BY 4 DESC;

-- Check the query that is executed by the process (Supply the processID and sessionID that is of interest from the first query)

SELECT distinct datname "Database Name",
	procpid "Process ID",
	sess_id "Session ID",
	substring(current_query from 1 for 300) as "Current Query"
FROM  gp_toolkit.gp_workfile_usage_per_query
WHERE procpid=<Process ID>
AND sess_id=<Session ID>
ORDER BY 2;

-- Check the total workfile files generated and total size per segment by the process (Supply the processID and sessionID that is of interest from the first query)

SELECT datname "Database Name",
	procpid "Process ID",
	sess_id "Session ID",
	segid "Segment number",
	command_cnt "Command ID",
	optype "Opertor Type",
	workmem "Operator Memory(KB)",
	slice "Executing Slice",
	state "Query Status",
	size/1024 "Total Spill Size(KB)" ,
	numfiles "Total Spill Files"
FROM  gp_toolkit.gp_workfile_entries
WHERE procpid=<Process ID>
AND sess_id=<Session ID>
ORDER BY 10 DESC;

-- The location and filename of the workfile per segments for the process (Supply the processID and sessionID that is of interest from the first query)

SELECT a.datname "Database Name",
    a.procpid "Process ID",
    a.sess_id "Session ID",
    a.segid "Segment number",
    b.hostname "Host Name",
    c.fselocation||'/base/'||d.oid||'/'||directory "Workfile Directory"
FROM  gp_toolkit.gp_workfile_entries a , gp_segment_configuration b , pg_filespace_entry c , pg_database d
WHERE c.fsedbid=b.dbid
AND a.segid=b.content
AND a.datname=d.datname
AND b.role='p'
AND procpid=<Process ID>
AND sess_id=<Session ID>
ORDER BY 4;

For more information on the tables or columns used in the query please refer to the documentation

Comments

Powered by Zendesk