Pivotal Knowledge Base

Follow

HowTo - Determine the size of the table without bloat

Problem

How to know the actual size of data of a given table without calculating the bloat ?

Solution

In the below query "Expected" refers expected size of the table ( i.e if you dont consider bloat ) and "Reclaimable or claimable" refers to the difference between current size minus expected size.

Query

-- To obtain the expected and reclaimable size of only those tables that has high bloat ( i.e it list only those tables that has the highest bloat on the database ).

SELECT bdinspname||'.'||bdirelname "Relation name",
	bdirelpages::bigint "Curnt Pages",
        round((bdirelpages*33)/1024)::bigint "Curnt Size(MB)",
	bdiexppages::bigint "Exped Pages",
	round((bdiexppages*33)/1024)::bigint "Exped Size(MB)",
	(bdirelpages - bdiexppages)::bigint "Claimble Pages",
	round(((bdirelpages - bdiexppages)*33)/1024)::bigint "Claimble Size(MB)",
	bdidiag "Comments"
FROM gp_toolkit.gp_bloat_diag
ORDER BY 6;

-- To obtain the expected and reclaimable size of all the table in the database

SELECT n.nspname||'.'||c.relname "Relation Name",
	btdrelpages::bigint "Curnt Pages",
        round((btdrelpages*33)/1024)::bigint "Curnt Size(MB)",
	btdexppages::bigint "Exped Pages",
	round((btdexppages*33)/1024)::bigint "Exped Size(MB)",
	(btdrelpages - btdexppages)::bigint "Claimble Pages",
	round(((btdrelpages - btdexppages)*33)/1024)::bigint "Claimble Size(MB)"
FROM gp_toolkit.gp_bloat_expected_pages e, pg_class c, pg_namespace n 
WHERE e.btdrelid=c.oid 
AND n.oid=c.relnamespace 
ORDER BY 6;

NOTE:

  1. The above values are approximate and depends on the stats of the table, analyze the tables to have somewhat closer estimate.
  2. The value of 33 used above is after rounding of 32k (32*1024) and to avoid the integer buffer limit (ERROR: integer out of range) when the pages are too high.

How it works and calculated

Run the gp_toolkit.gp_bloat_diag to know the tables having bloat.

flightdata=# select * from gp_toolkit.gp_bloat_diag;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages |              bdidiag
----------+------------+------------+-------------+-------------+------------------------------------
    92783 | public     | test2      |      618870 |       71503 | moderate amount of bloat suspected
(1 row)

Checking the values in the pg_class for confirmation

flightdata=# select relname,relpages,reltuples::bigint from pg_class where relname='test2';
 relname | relpages | reltuples
---------+----------+-----------
 test2   |   618870 |  71000000
(1 row)

So based on the output from gp_bloat_diag the current allocated OS pages is 618870 and the expected pages based on the data in the table should be 71503.

Since one OS page is equal to 32k ( i.e 32 * 1024 bytes ) in GPDB

The current allocated size for the table is 618870 * 32k = 19 GB Approx

And the expected pages would be 71503 * 32k = 2G Approx

Taking it practically, the current size of the table is

flightdata=# SELECT pg_size_pretty(pg_relation_size('test2'));
 pg_size_pretty
----------------
 19 GB
(1 row)

Remove bloat completely ( either by vacuum full or other options indicated in the article ), vacuum full is chosen here

flightdata=# vacuum full test2;
VACUUM

And then analyze

flightdata=# analyze test2;
ANALYZE

The current size of the table is 2G which is approx to the calculation we did above

flightdata=# SELECT pg_size_pretty(pg_relation_size('test2'));
 pg_size_pretty
----------------
 2441 MB
(1 row)

flightdata=# select relname,relpages,reltuples::bigint from pg_class where relname in ('test2');
 relname | relpages | reltuples
---------+----------+-----------
 test2   |    78108 |  71000168
(1 row)

Please note: It is a approx value , but you will get a idea of how much you can reclaim after you redistribute / vacuum the bloated table.

Comments

  • Avatar
    Krzysztof Byszewski

    Also use

    select sum(btdrelpages) AS "ACTUAL", sum(btdexppages) AS "EXPECTED" , sum(btdrelpages) - sum(btdexppages) AS "CLAIMABLE" FROM gp_toolkit.gp_bloat_expected_pages e, pg_class c, pg_namespace n WHERE e.btdrelid=c.oid AND n.oid=c.relnamespace AND n.nspname not in ('pg_catalog', 'gp_toolkit', 'information_schema');

  • Avatar
    Brendan Stephens

    You will need to move the cast to the inside of the expression to avoid "integer out of range" errors...
    eg.
    round((bdirelpages::bigint*33)/1024) "Curnt Size(MB)",

Powered by Zendesk