Pivotal Knowledge Base

Follow

Options to gather more information from EXPLAIN ANALYZE

Purpose

EXPLAIN ANALYZE is a useful tool to diagnose potential performance bottlenecks with a given query. However there are several additional options which can provide further information to help diagnose issues.

Additional options

There are two GUCs that can add additional information to the EXPLAIN ANALYZE output. Both of these can be set at the session level and can be used to find skew, excessive processing time or memory consumption:

  • gp_enable_explain_allstat
  • explain_memory_verbosity

We will examine the impact of these two GUCs on the following plan:

mike=# EXPLAIN ANALYZE select * from test1 t1 join test2 t2 on t1.a = t2.b where (t1.a % 7) =0  or (t2.a %3) = 0;
                                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice2; segments: 2)  (cost=212.65..350.03 rows=14 width=16)
   Rows out:  2667 rows at destination with 9.924 ms to first row, 16 ms to end, start offset by 0.419 ms.
   ->  Hash Join  (cost=212.65..350.03 rows=7 width=16)
         Hash Cond: t1.a = t2.b
         Join Filter: (t1.a % 7) = 0 OR (t2.a % 3) = 0
         Rows out:  Avg 1333.5 rows x 2 workers.  Max 1334 rows (seg0) with 9.032 ms to first row, 13 ms to end, start offset by 0.872 ms.
         Executor memory:  55K bytes avg, 79K bytes max (seg1).
         Work_mem used:  55K bytes avg, 79K bytes max (seg1). Workfile: (0 spilling, 0 reused)
         (seg1)   Hash chain length 1111.0 avg, 2000 max, using 3 of 1048589 buckets.
         ->  Seq Scan on test1 t1  (cost=0.00..112.00 rows=5000 width=8)
               Rows out:  Avg 5000.0 rows x 2 workers.  Max 5001 rows (seg0) with 0.043 ms to first row, 0.514 ms to end, start offset by 9.853 ms.
         ->  Hash  (cost=212.40..212.40 rows=10 width=8)
               Rows in:  Avg 2333.5 rows x 2 workers.  Max 3333 rows (seg1) with 2.784 ms to end, start offset by 9.199 ms.
               ->  Redistribute Motion 2:2  (slice1; segments: 2)  (cost=0.00..212.40 rows=10 width=8)
                     Hash Key: t2.b
                     Rows out:  Avg 2333.5 rows x 2 workers at destination.  Max 3333 rows (seg1) with 0.177 ms to first row, 1.715 ms to end, start offset by 9.200 ms.
                     ->  Seq Scan on test2 t2  (cost=0.00..212.00 rows=10 width=8)
                           Filter: (b % 7) = 0 OR (a % 3) = 0
                           Rows out:  Avg 2333.5 rows x 2 workers.  Max 2336 rows (seg1) with 0.191 ms to first row, 1.692 ms to end, start offset by 3.165 ms.
 Slice statistics:
   (slice0)    Executor memory: 203K bytes.
   (slice1)    Executor memory: 199K bytes avg x 2 workers, 199K bytes max (seg0).
   (slice2)    Executor memory: 12675K bytes avg x 2 workers, 12771K bytes max (seg1).  Work_mem: 79K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 20.291 ms
(26 rows)

2.1 gp_enable_explain_allstat

This is a useful way to see if there is any skew during the processings of your query. This will, for each operator, list the individual segment, the number of rows it processed as well as the time it took to do so.

mike=# set gp_enable_explain_allstat=true;
SET
mike=# EXPLAIN ANALYZE select * from test1 t1 join test2 t2 on t1.a = t2.b where (t1.a % 7) =0  or (t2.a %3) = 0;
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice2; segments: 2)  (cost=212.65..350.03 rows=14 width=16)
   Rows out:  2667 rows at destination with 11 ms to first row, 15 ms to end, start offset by 11 ms.
   ->  Hash Join  (cost=212.65..350.03 rows=7 width=16)
         Hash Cond: t1.a = t2.b
         Join Filter: (t1.a % 7) = 0 OR (t2.a % 3) = 0
         Rows out:  Avg 1333.5 rows x 2 workers.  Max 1334 rows (seg0) with 8.957 ms to first row, 12 ms to end, start offset by 12 ms.
         Executor memory:  55K bytes avg, 79K bytes max (seg1).
         Work_mem used:  55K bytes avg, 79K bytes max (seg1). Workfile: (0 spilling, 0 reused)
         (seg1)   Hash chain length 1111.0 avg, 2000 max, using 3 of 1048589 buckets.
         allstat: seg_firststart_total_ntuples/seg0_12 ms_12 ms_1334/seg1_12 ms_13 ms_1333//end
         ->  Seq Scan on test1 t1  (cost=0.00..112.00 rows=5000 width=8)
               Rows out:  Avg 5000.0 rows x 2 workers.  Max 5001 rows (seg0) with 0.300 ms to first row, 0.813 ms to end, start offset by 20 ms.
               allstat: seg_firststart_total_ntuples/seg0_20 ms_0.813 ms_5001/seg1_21 ms_0.779 ms_4999//end
         ->  Hash  (cost=212.40..212.40 rows=10 width=8)
               Rows in:  Avg 2333.5 rows x 2 workers.  Max 3333 rows (seg1) with 2.283 ms to end, start offset by 19 ms.
               allstat: seg_firststart_total_ntuples/seg0_19 ms_1.028 ms_1334/seg1_19 ms_2.283 ms_3333//end
               ->  Redistribute Motion 2:2  (slice1; segments: 2)  (cost=0.00..212.40 rows=10 width=8)
                     Hash Key: t2.b
                     Rows out:  Avg 2333.5 rows x 2 workers at destination.  Max 3333 rows (seg1) with 0.132 ms to first row, 1.407 ms to end, start offset by 19 ms.
                     allstat: seg_firststart_total_ntuples/seg0_19 ms_0.625 ms_1334/seg1_19 ms_1.407 ms_3333//end
                     ->  Seq Scan on test2 t2  (cost=0.00..212.00 rows=10 width=8)
                           Filter: (b % 7) = 0 OR (a % 3) = 0
                           Rows out:  Avg 2333.5 rows x 2 workers.  Max 2336 rows (seg1) with 0.444 ms to first row, 1.928 ms to end, start offset by 13 ms.
                           allstat: seg_firststart_total_ntuples/seg0_13 ms_2.160 ms_2331/seg1_13 ms_1.928 ms_2336//end
 Slice statistics:
   (slice0)    Executor memory: 203K bytes.
   (slice1)    Executor memory: 199K bytes avg x 2 workers, 199K bytes max (seg0).
   (slice2)    Executor memory: 12675K bytes avg x 2 workers, 12771K bytes max (seg1).  Work_mem: 79K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 25.727 ms
(31 rows)

Below we have one of the lines that the gp_enable_explain_allstat option added to the

allstat: seg_firststart_total_ntuples/seg0_19 ms_1.028 ms_1334/seg1_19 ms_2.283 ms_3333//end

Let's break it down - the format of this is for each segment we have the following: segment number _ time to start ms _ time to complete ms _ number of tuples

Using this we see that:

  • seg0 waiting 19 ms before starting
  • it took 1.028 ms to complete it's work
  • and it processed 1334 tuples

2.2 explain_memory_verbosity

This option uses the new memory accounting framework introduced in GPDB 4.3.2.0 and will show the memory usage per operator node. This GUC takes three values: SUPPRESS, SUMMARY, and DETAIL with SUPPRESS being the default.

Running this with SUMMARY we have the following output:

mike=# set explain_memory_verbosity = SUMMARY;
SET
mike=# EXPLAIN ANALYZE select * from test1 t1 join test2 t2 on t1.a = t2.b where (t1.a % 7) =0  or (t2.a %3) = 0;
                                                                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice2; segments: 2)  (cost=212.65..350.03 rows=14 width=16)
   Rows out:  2667 rows at destination with 88 ms to first row, 94 ms to end, start offset by 8.076 ms.
   Memory:  28K bytes.
   ->  Hash Join  (cost=212.65..350.03 rows=7 width=16)
         Hash Cond: t1.a = t2.b
         Join Filter: (t1.a % 7) = 0 OR (t2.a % 3) = 0
         Rows out:  Avg 1333.5 rows x 2 workers.  Max 1334 rows (seg0) with 87 ms to first row, 90 ms to end, start offset by 9.117 ms.
         Executor memory:  55K bytes avg, 79K bytes max (seg1).
         Work_mem used:  55K bytes avg, 79K bytes max (seg1). Workfile: (0 spilling, 0 reused)
         Memory:  10K bytes avg, 10K bytes max (seg0).
         (seg1)   Hash chain length 1111.0 avg, 2000 max, using 3 of 1048589 buckets.
         ->  Seq Scan on test1 t1  (cost=0.00..112.00 rows=5000 width=8)
               Rows out:  Avg 5000.0 rows x 2 workers.  Max 5001 rows (seg0) with 47 ms to end, start offset by 49 ms.
               Memory:  36K bytes avg, 36K bytes max (seg0).
         ->  Hash  (cost=212.40..212.40 rows=10 width=8)
               Rows in:  Avg 2333.5 rows x 2 workers.  Max 3333 rows (seg1) with 34 ms to end, start offset by 16 ms.
               Memory:  12417K bytes avg, 12456K bytes max (seg1).
               ->  Redistribute Motion 2:2  (slice1; segments: 2)  (cost=0.00..212.40 rows=10 width=8)
                     Hash Key: t2.b
                     Rows out:  Avg 2333.5 rows x 2 workers at destination.  Max 3333 rows (seg1) with 29 ms to first row, 33 ms to end, start offset by 16 ms.
                     Memory:  47K bytes avg, 47K bytes max (seg0).
                     ->  Seq Scan on test2 t2  (cost=0.00..212.00 rows=10 width=8)
                           Filter: (b % 7) = 0 OR (a % 3) = 0
                           Rows out:  Avg 2333.5 rows x 2 workers.  Max 2336 rows (seg1) with 37 ms to first row, 38 ms to end, start offset by 9.667 ms.
                           Memory:  73K bytes avg, 73K bytes max (seg0).
 Slice statistics:
   (slice0)    Executor memory: 203K bytes.  Peak memory: 1718K bytes.  Vmem reserved: 2048K bytes.
   (slice1)    Executor memory: 199K bytes avg x 2 workers, 199K bytes max (seg0).  Peak memory: 933K bytes avg x 2 workers, 933K bytes max (seg0).
   (slice2)    Executor memory: 12675K bytes avg x 2 workers, 12771K bytes max (seg1).  Peak memory: 13333K bytes avg x 2 workers, 13372K bytes max (seg1).  Vmem reserved: 12800K bytes avg x 2 workers, 13312K bytes max (seg1).  Work_mem: 79K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 116.285 ms
(32 rows)

With the SUMMARY setting there is only one additional row being added per node, the 'Memory' node:

Memory:  47K bytes avg, 47K bytes max (seg0).

This option will add the average and max memory used for each operator.

The other option, DETAILED works as follows:

mike=# set explain_memory_verbosity = DETAIL;
SET
mike=# EXPLAIN ANALYZE select * from test1 t1 join test2 t2 on t1.a = t2.b where (t1.a % 7) =0  or (t2.a %3) = 0;
                                                                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice2; segments: 2)  (cost=212.65..350.03 rows=14 width=16)
   Rows out:  2667 rows at destination with 9.730 ms to first row, 14 ms to end, start offset by 8.050 ms.
   slice 2, seg 0
     Root: Peak 0K bytes. Quota: 0K bytes.
       Top: Peak 0K bytes. Quota: 0K bytes.
         Main: Peak 1K bytes. Quota: 0K bytes.
           Executor: Peak 138K bytes. Quota: 0K bytes.
             X_Motion: Peak 19K bytes. Quota: 100K bytes.
               X_HashJoin: Peak 9K bytes. Quota: 100K bytes.
                 X_TableScan: Peak 35K bytes. Quota: 100K bytes.
                 X_Hash: Peak 12377K bytes. Quota: 127700K bytes.
                   X_Motion: Peak 46K bytes. Quota: 100K bytes.
           Deserializer: Peak 11K bytes. Quota: 0K bytes.
           Deserializer: Peak 1K bytes. Quota: 0K bytes.
         X_Alien: Peak 72K bytes. Quota: 0K bytes.
       MemAcc: Peak 0K bytes. Quota: 0K bytes.
       Rollover: Peak 701K bytes. Quota: 0K bytes.
       SharedHeader: Peak 7K bytes. Quota: 0K bytes.
   slice 2, seg 1
     Root: Peak 0K bytes. Quota: 0K bytes.
       Top: Peak 0K bytes. Quota: 0K bytes.
         Main: Peak 1K bytes. Quota: 0K bytes.
           Executor: Peak 138K bytes. Quota: 0K bytes.
             X_Motion: Peak 19K bytes. Quota: 100K bytes.
               X_HashJoin: Peak 9K bytes. Quota: 100K bytes.
                 X_TableScan: Peak 35K bytes. Quota: 100K bytes.
                 X_Hash: Peak 12455K bytes. Quota: 127700K bytes.
                   X_Motion: Peak 46K bytes. Quota: 100K bytes.
           Deserializer: Peak 11K bytes. Quota: 0K bytes.
           Deserializer: Peak 1K bytes. Quota: 0K bytes.
         X_Alien: Peak 72K bytes. Quota: 0K bytes.
       MemAcc: Peak 0K bytes. Quota: 0K bytes.
       Rollover: Peak 701K bytes. Quota: 0K bytes.
       SharedHeader: Peak 7K bytes. Quota: 0K bytes.
   Memory:  28K bytes.
   ->  Hash Join  (cost=212.65..350.03 rows=7 width=16)
         Hash Cond: t1.a = t2.b
         Join Filter: (t1.a % 7) = 0 OR (t2.a % 3) = 0
         Rows out:  Avg 1333.5 rows x 2 workers.  Max 1334 rows (seg0) with 8.488 ms to first row, 12 ms to end, start offset by 8.977 ms.
         Executor memory:  55K bytes avg, 79K bytes max (seg1).
         Work_mem used:  55K bytes avg, 79K bytes max (seg1). Workfile: (0 spilling, 0 reused)
         Memory:  10K bytes avg, 10K bytes max (seg0).
         (seg1)   Hash chain length 1111.0 avg, 2000 max, using 3 of 1048589 buckets.
         ->  Seq Scan on test1 t1  (cost=0.00..112.00 rows=5000 width=8)
               Rows out:  Avg 5000.0 rows x 2 workers.  Max 5001 rows (seg0) with 0.400 ms to first row, 0.958 ms to end, start offset by 17 ms.
               Memory:  36K bytes avg, 36K bytes max (seg0).
         ->  Hash  (cost=212.40..212.40 rows=10 width=8)
               Rows in:  Avg 2333.5 rows x 2 workers.  Max 3333 rows (seg1) with 2.311 ms to end, start offset by 16 ms.
               Memory:  12417K bytes avg, 12456K bytes max (seg1).
               ->  Redistribute Motion 2:2  (slice1; segments: 2)  (cost=0.00..212.40 rows=10 width=8)
                     Hash Key: t2.b
                     Rows out:  Avg 2333.5 rows x 2 workers at destination.  Max 3333 rows (seg1) with 0.149 ms to first row, 1.428 ms to end, start offset by 16 ms.
                     slice 1, seg 0
                       Root: Peak 0K bytes. Quota: 0K bytes.
                         Top: Peak 0K bytes. Quota: 0K bytes.
                           Main: Peak 2K bytes. Quota: 0K bytes.
                             Executor: Peak 145K bytes. Quota: 0K bytes.
                             Deserializer: Peak 11K bytes. Quota: 0K bytes.
                             Deserializer: Peak 1K bytes. Quota: 0K bytes.
                           X_Alien: Peak 45K bytes. Quota: 0K bytes.
                             X_Motion: Peak 27K bytes. Quota: 100K bytes.
                               X_TableScan: Peak 72K bytes. Quota: 100K bytes.
                         MemAcc: Peak 0K bytes. Quota: 0K bytes.
                         Rollover: Peak 701K bytes. Quota: 0K bytes.
                         SharedHeader: Peak 5K bytes. Quota: 0K bytes.
                     slice 1, seg 1
                       Root: Peak 0K bytes. Quota: 0K bytes.
                         Top: Peak 0K bytes. Quota: 0K bytes.
                           Main: Peak 2K bytes. Quota: 0K bytes.
                             Executor: Peak 145K bytes. Quota: 0K bytes.
                             Deserializer: Peak 11K bytes. Quota: 0K bytes.
                             Deserializer: Peak 1K bytes. Quota: 0K bytes.
                           X_Alien: Peak 45K bytes. Quota: 0K bytes.
                            X_Motion: Peak 27K bytes. Quota: 100K bytes.
                               X_TableScan: Peak 72K bytes. Quota: 100K bytes.
                         MemAcc: Peak 0K bytes. Quota: 0K bytes.
                         Rollover: Peak 701K bytes. Quota: 0K bytes.
                         SharedHeader: Peak 5K bytes. Quota: 0K bytes.
                     Memory:  47K bytes avg, 47K bytes max (seg0).
                     ->  Seq Scan on test2 t2  (cost=0.00..212.00 rows=10 width=8)
                           Filter: (b % 7) = 0 OR (a % 3) = 0
                           Rows out:  Avg 2333.5 rows x 2 workers.  Max 2336 rows (seg1) with 0.384 ms to first row, 1.765 ms to end, start offset by 9.726 ms.
                           Memory:  73K bytes avg, 73K bytes max (seg0).
 Slice statistics:
   (slice0)    Executor memory: 203K bytes.  Peak memory: 1720K bytes.  Vmem reserved: 2048K bytes.
   (slice1)    Executor memory: 199K bytes avg x 2 workers, 199K bytes max (seg0).  Peak memory: 933K bytes avg x 2 workers, 933K bytes max (seg0).
   (slice2)    Executor memory: 12675K bytes avg x 2 workers, 12771K bytes max (seg1).  Peak memory: 13333K bytes avg x 2 workers, 13372K bytes max (seg1).  Vmem reserved: 12800K bytes avg x 2 workers, 13312K bytes max (seg1).  Work_mem: 79K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 22.848 ms
(90 rows)

As you an see this adds substantially more information to the query plan. In this case we're printing the peak memory usage per memory account for each operator. This will let you see the amount of memory being used to de-serialize the plans as well as memory used specifically for things such as table scans and hash joins.

Comments

Powered by Zendesk