Pivotal Knowledge Base

Follow

How is the value "skccoeff" under gp_toolkit.gp_skew_coefficients calculated.

Goal

The view gp_toolkit.gp_skew_coefficients is used to show data distribution skew and this article describes how Greenplum calculates the value of the column "skccoeff" on gp_toolkit.gp_skew_coefficients view.

Solution

According to the Greenplum Administrator Guide, the column "skccoeff" is "The coefficient of variation (CV), which is calculated as the standard deviation divided by the average." In addition, the "standard deviation" and "average" are based on (select gp_segment_id,count(*) from group by gp_segment_id).

For example, The "skccoeff" on gp_toolkit.gp_skew_coefficients for testtable is

gpadmin=# Select * from gp_toolkit.gp_skew_coefficients where skcrelname='testtable'. 

skcoid | skcnamespace | skcrelname |       skccoeff
--------+--------------+------------+-----------------------
 252064 | public       | testtable  | 624.49979983984029000

And that SQL that is used to calculate the column "skccoeff" value is

gpadmin=# Select STDDEV(cnt)/avg(cnt)*100
from
   (
     select gp_segment_id segid,count(*) cnt from testtable group by gp_segment_id
     union all
     (
       select tmp2.content segid,0 cnt from
       gp_segment_configuration tmp2
       where tmp2.content not in (select gp_segment_id segid from testtable group by gp_segment_id)
       and tmp2.role='p'
       and tmp2.content<>-1
      )
    ) tmpc;

       ?column?
----------------------
624.4997998398402900

Comments

Powered by Zendesk