Pivotal Knowledge Base

Follow

HowTo - Check data skew across segments on v4.x

Problem

How to check data skew across segments on v4.x

Solution

In Greenplum v4.x the Administrative Schema gp_toolkit can be used to check for data skew. The gp_toolkit schema contains a number of views that you can access using SQL commands. For convenience, you may want to add the gp_toolkit schema to your schema search path. For example:

ALTER ROLE myrole SET search_path TO myschema,gp_toolkit;

The following views can help diagnose if a table has uneven data distribution(Data Skew): gp_skew_coefficients This view shows data distribution skew by calculating the coefficient of variation (CV) for the data stored on each segment. The lower the value, the better.

SELECT * FROM gp_toolkit.gp_skew_coefficients;

The view "gp_skew_idle_fractions" shows data distribution skew by calculating the percentage of the system that is idle during a table scan, which is an indicator of processing data skew.

SELECT * FROM gp_toolkit.gp_skew_idle_fractions

Once you have the table names with the high skccoeff values from the gp_skew_coefficients query, you can then check the data distribution across the Segment servers. Please refer to article 

Comments

Powered by Zendesk