Pivotal Knowledge Base

Follow

Determining Query Tables Requiring Analyze (gpmt gpstatscheck)

Problem

Query is running slower than expected.

One issue can be outdated or invalid statistics on the tables involved in the query. This can happen if new data has been loaded in to the table but an ANALYZE was never executed. The database will use wrong statistics when generating the query plan.

Solution

The gpstatscheck utility in GPMT can be used to verify that all tables involved in a query have optimal statistics.

The options that are available with the utility.

[gpadmin@localhost ~]$ ./gpmt gpstatscheck --help

USAGE:
gpmt gpstatscheck -f QUERYFILE
[ -p PORT ] [ -d DATABASE ]

DESCRIPTION:
gpstatscheck idendtifies tables with outdated statistics.
It performs the following actions:
* Accepts an input query file of the form: EXPLAIN VERBOSE <QUERY>.
* Execute the input file and scan EXPLAIN VERBOSE plan to identify the scanned tables.
* Execute a query to get count(*) and pg_class.reltuples for each table.
* Check if difference between count and reltuples is above the defined threshold 5.0%.
* Generate an SQL file in current directory containing the ANALYZE commands.

OPTIONS:
-f Query file
-p Port (defaults to 5432)
-d Database (defaults to gpadmin)

EXAMPLES:
Execute query1.sql and check for missing stats
gpmt gpstatscheck -f query1.sql

Example

    1. Create a file containing "EXPLAIN VERBOSE <problematic query>":
      $ cat /tmp/myquery.sql 
      EXPLAIN VERBOSE
      SELECT * FROM mytable, nums, table1;
    2. Run gpstatscheck:
      $ gpmt gpstatscheck -f /tmp/myquery.sql -d postgres
      Executing EXPLAIN VERBOSE query.
      Found 3 tables in query.

      Table Details
      -----------------------------------------
      | Table Name | Info |
      -----------------+-----------------------
      | public.nums | |
      | public.table1 | |
      | public.mytable | |
      -----------------+-----------------------

      Note: Views and External Tables will be skipped since they do not have statistics.
      Note: Top Level Partitions will be skipped since Legacy Query Optimizer does not use the statistics.

      OK to execute "SELECT count(*)" on tables listed above? Yy|Nn: y

      Executing count(*) to get actual tuple counts:
      -> public.nums ... done
      -> public.table1 ... done
      -> public.mytable ... done

      Stats Check Summary
      -------------------------------------------------------------------------------------------
      | Table Name | Actual | Estimated | Diff | Comments |
      -----------------+------------------+------------------+-----------+-----------------------
      | public.nums | 1100 | 100 | 1000 | Needs ANALYZE |
      | public.table1 | 10000 | 10000 | 0 | OK |
      | public.mytable | 0 | 0 | 0 | OK |
      -----------------+------------------+------------------+-----------+-----------------------

      Generating ANALYZE commands.

      Output file:
      gpstatscheck_20160926_134946.sql

      Execute using:
      psql -p 5432 -d postgres -f gpstatscheck_20160926_134946.sql

      Execution finished successfully!
    3. If invalid statistics are detected the tool will generate an ANALYZE script shown above:
      $ cat gpstatscheck_20160926_134946.sql
      ANALYZE public.nums;
    4. Execute the ANALYZE script:
      psql -p 5432 -d postgres -f gpstatscheck_20160926_134946.sql
    5. Now run gpstatscheck again and all the tables should show OK:
      $ gpmt gpstatscheck -f /tmp/myquery.sql -d postgres
      ------------------------------------------------------------------------------------------- | Table Name | Actual | Estimated | Diff | Comments | -----------------+------------------+------------------+-----------+----------------------- | public.nums | 1100 | 1100 | 0 | OK | | public.table1 | 10000 | 10000 | 0 | OK | | public.mytable | 0 | 0 | 0 | OK | -----------------+------------------+------------------+-----------+-----------------------

 

Comments

Powered by Zendesk