Pivotal Knowledge Base

Follow

Pivotal HAWQ: find data files for specific tables

Environment
Pivotal HAWQ all releases.

Summary
HAWQ (like GPDB) uses a catalog structure that comes from PostgreSQL, therefore this applies similarly to HAWQ:

http://www.postgresql.org/docs/8.2/static/storage-file-layout.html

Except that there will be one "PGDATA" for each HAWQ segment on HDFS.

Question
I have a table named example, created like this:

gpadmin=# CREATE TABLE example (id INT) DISTRIBUTED RANDOMLY;
CREATE TABLE

How do I find its data files on the filesystem?

Solution
Check the HAWQ system catalog for information about the table. For example:

gpadmin=# SELECT oid, spcname FROM pg_tablespace;
  oid  |   spcname
-------+-------------
  1663 | pg_default
  1664 | pg_global
 16385 | dfs_default
(3 rows)

gpadmin=# SELECT oid, datname FROM pg_database;
  oid   |   datname
--------+--------------
      1 | template0
  16386 | template1
  16387 | postgres
  16554 | gpadmin

gpadmin=# SELECT c.oid, c.relfilenode, n.nspname, c.relname, c.relpages, c.reltuples, c.relkind, c.relstorage FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid WHERE c.relname LIKE 'example%'
gpadmin-#  ORDER BY n.nspname, c.relname;
  oid   | relfilenode | nspname | relname | relpages | reltuples | relkind | relstorage
--------+-------------+---------+---------+----------+-----------+---------+------------
 209443 |      209443 | public  | example |        0 |         0 | r       | a
(1 row)

Example for segment 0:

gpadmin=# SELECT * FROM pg_filespace_entry WHERE fselocation LIKE '%gpseg0';
 fsefsoid | fsedbid |                fselocation
----------+---------+-------------------------------------------
     3052 |       2 | /data1/primary/gpseg0
    16384 |       2 | hdfs://hdm1.dca:8020/hawq_data/gpseg0
(2 rows)

[gpadmin@hdm1 ~]$ sudo -u hdfs hdfs dfs -ls hdfs://hdm1.dca:8020/hawq_data/gpseg0/16385/16554/209443*
Found 1 items
-rw-------   3 gpadmin hadoop          0 2014-12-16 01:57 hdfs://hdm1.dca:8020/hawq_data/gpseg0/16385/16554/209443
[gpadmin@hdm1 ~]$

Notes
There might be other files per segment for one table, depending on the table definition and size (because of column orientation, presence of TOAST tables, etc.).

Please review Pivotal Hadoop Docs related to catalog structure for more details. (in particular pg_class.relfilenode).

Comments

Powered by Zendesk