This article will help you locate the logs required when troubleshooting HDB-related issues. Let's quickly list the different logs available that are helpful for troubleshooting.
There are primarily five logs available:
1. HDB Master Logs - Clients enter the HDB database via the HDB master. All activities related to HDB access, query processing, and related operations are logged in the HDB master logs.
2. HDB Segment Logs - The HDB master dispatches query execution tasks to its segments. Each segment is a database in itself, and logs related to any operation on a segment database are tracked in the HDB segment logs.
There is a correlation between HDB master logs and HDB segment logs. If a query is triggered on the HDB master and logging is enabled on the segments, both the master and the segments store information about query processing. Often there are events specific to the segment executing the query; logs for these are written only in the HDB segment logs, but they can easily be correlated using a unique identifier, the session id.
Every session to the HDB database is assigned a unique session id, so any query run in that session carries the session id as its identifier, and master and segment logs can be correlated using it. For example, con9866 can be used to identify related activity on the segments:
2014-07-15 09:40:22.899924 PDT,"gpadmin","gpadmin",p91220,th699058016,"[local]",,2014-07-15 09:38:35 PDT,33851,con9866,cmd16,seg-1,,,x33851,sx1,"LOG","00000","2014-07-15 09:40:22:898757 PDT,THD000,TRACE,""CAutoTrace.cpp:53: [OPT]: Search terminated at stage 1/1"",",,,,,,"select gp.hostname, pe.fselocation from pg_filespace_entry pe, gp_segment_configuration gp where pe.fsedbid = gp.dbid and fselocation !~ 'hdfs://';",0,,"COptTasks.cpp",463,
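The correlation described above can be sketched with grep. This is a minimal, self-contained sketch using fabricated log lines and /tmp paths; on a real cluster you would grep the gpdb-*.log files under each instance's pg_log directory instead.

```shell
# Sketch: correlate master and segment log entries by session id.
# The log lines and /tmp paths below are fabricated stand-ins for real
# HDB CSV log entries; only the session id field (con9866) matters here.
mkdir -p /tmp/hdb_demo
cat > /tmp/hdb_demo/master.log <<'EOF'
2014-07-15 09:40:22 PDT,"gpadmin","gpadmin",p91220,con9866,cmd16,seg-1,"LOG","query start"
2014-07-15 09:41:10 PDT,"gpadmin","gpadmin",p91344,con9901,cmd02,seg-1,"LOG","unrelated session"
EOF
cat > /tmp/hdb_demo/segment.log <<'EOF'
2014-07-15 09:40:23 PDT,"gpadmin","gpadmin",p55102,con9866,cmd16,seg2,"LOG","slice executed"
EOF
# Pull every entry for session con9866 from both master and segment logs:
grep -h 'con9866' /tmp/hdb_demo/master.log /tmp/hdb_demo/segment.log
```

Both matching lines share con9866, which is how master-side and segment-side events for one session are tied together.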
3. Namenode Logs - HDB is an MPP database engine that runs on Hadoop, so the namenode is a critical entity in the HDB ecosystem. When there is a request to execute an HDFS operation, HDB segments connect to the namenode to retrieve the locations of the files on HDFS and use the retrieved information for further processing.
4. Datanode Logs - Datanodes are the daemons that actually store the data of HDB user tables. If there is a failure during an HDFS operation, HDB queries will fail, so datanode logs help identify critical information when query execution fails.
5. HDB Administration Logs - Any maintenance operation performed on the HDB database using the HDB management utilities is tracked in these logs, e.g. start, stop, etc.
How to locate the logs:
HDB master logs:
1. Go to the HDB master (ssh <hawq_master>).
2. Verify that the HDB master process is running:

[gpadmin@hawq-master ~]$ ps -ef | egrep silent | egrep master
gpadmin 480392 1 0 Jul04 ? 00:00:05 /usr/local/hawq-220.127.116.11/bin/postgres -D /data1/master/gpseg-1 -p 5432 -b 1 -z 6 --silent-mode=true -i -M master -C -1 -x 8 -E

3. The value of the -D option in the example above is "/data1/master/gpseg-1", which is known as the master data directory for HDB. The term MASTER_DATA_DIRECTORY is often used to reference it.
4. Based on the MASTER_DATA_DIRECTORY identified above, the HDB master logs can be located under "/data1/master/gpseg-1/pg_log". Alternatively, you can log in to the database and run the command below:

[gpadmin@hdw3 pg_log]$ psql
gpadmin=# show data_directory ;
    data_directory
-----------------------
 /data1/master/gpseg-1
(1 row)

5. Based on the date of the identified issue, the HDB log files can easily be located. HDB log files are named gpdb-YYYY-MM-DD_<timestamp>.log, which readily identifies the log file date.
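Step 5 above can be sketched as a simple ls by date. The directory and file names below are fabricated stand-ins for a real MASTER_DATA_DIRECTORY.

```shell
# Sketch: locate master log files for a given date under pg_log.
# /tmp/hdb_mdd is a stand-in for a real MASTER_DATA_DIRECTORY such as
# /data1/master/gpseg-1; file names follow gpdb-YYYY-MM-DD_<timestamp>.log.
MASTER_DATA_DIRECTORY=/tmp/hdb_mdd
mkdir -p "$MASTER_DATA_DIRECTORY/pg_log"
touch "$MASTER_DATA_DIRECTORY/pg_log/gpdb-2014-07-15_000000.log" \
      "$MASTER_DATA_DIRECTORY/pg_log/gpdb-2014-07-16_000000.log"
# Pick out the log file(s) for the date the issue occurred:
ls "$MASTER_DATA_DIRECTORY"/pg_log/gpdb-2014-07-15_*.log
```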
HDB segment logs:
1. Log in to the HDB database (psql <databasename>). The catalog tables used to locate HDB segment directories are common to all databases, so you can log in to any database.
2. The SQL below can be used to identify the hostname, status of the segment, content id (an identifier for the segment), and location:

gpadmin=# select gp.hostname, gp.status, gp.content, pe.fselocation from pg_filespace_entry pe, gp_segment_configuration gp where pe.fsedbid = gp.dbid and fselocation !~ 'dfs';
      hostname      | status | content |      fselocation
--------------------+--------+---------+-----------------------
 hdw3.phd.local     | u      |      -1 | /data1/master/gpseg-1
 hdw1.phd.local     | u      |       2 | /data1/primary/gpseg2
 pccadmin.phd.local | u      |      -1 | /data1/master/gpseg-1

Translating the above into a statement: logs for content id 2 are located on host hdw1.phd.local under the /data1/primary/gpseg2/pg_log directory, and the segment is currently marked up.
Note: In the above example, content id = -1 appears twice; both rows represent the HDB master (active and standby). However, you can only log in to the database through the active master, which helps you identify which master is currently active.
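The mapping from a query-result row to a log directory can be sketched as follows; the stand-in /tmp directory replaces an ssh to the actual segment host.

```shell
# Sketch: segment logs live in <fselocation>/pg_log on the segment host.
# /tmp/hdb_seg is a stand-in for /data1/primary/gpseg2 on hdw1.phd.local;
# on a real cluster this would be:
#   ssh hdw1.phd.local ls /data1/primary/gpseg2/pg_log
FSELOCATION=/tmp/hdb_seg
mkdir -p "$FSELOCATION/pg_log"
touch "$FSELOCATION/pg_log/gpdb-2014-07-15_000000.log"
ls "$FSELOCATION/pg_log"
```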
Namenode and datanode logs:
You can use the PCC UI -> Topology tab to identify the namenode and datanode hostnames. On the namenode and datanodes, the respective files under /etc/default hold the parameters that specify the log locations for a PHD cluster. If your cluster is secured, use the location specified by HADOOP_SECURE_DN_LOG_DIR for datanodes.
[gpadmin@hdm1 ~]$ egrep LOG /etc/default/hadoop-hdfs-namenode
export HADOOP_LOG_DIR=/var/log/gphd/hadoop-hdfs

[gpadmin@hdw3 pg_log]$ egrep LOG /etc/default/hadoop-hdfs-datanode
export HADOOP_LOG_DIR=/var/log/gphd/hadoop-hdfs
# export HADOOP_SECURE_DN_LOG_DIR=$HADOOP_LOG_DIR/hdfs
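Extracting the configured log directory from such a file can be sketched as below; the file content is a fabricated copy of the datanode example above, written to /tmp rather than /etc/default.

```shell
# Sketch: read the active (non-commented) log dir from an /etc/default file.
# /tmp/hadoop-hdfs-datanode is a fabricated copy of the example above.
cat > /tmp/hadoop-hdfs-datanode <<'EOF'
export HADOOP_LOG_DIR=/var/log/gphd/hadoop-hdfs
# export HADOOP_SECURE_DN_LOG_DIR=$HADOOP_LOG_DIR/hdfs
EOF
# Lines starting with '#' are commented out, so only HADOOP_LOG_DIR is active:
grep '^export' /tmp/hadoop-hdfs-datanode | cut -d= -f2
```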
HDB administration logs:
These logs are available on the HDB master and segment hosts under /home/gpadmin/gpAdminLogs, and the name of each log depends on the type of operation performed.
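Browsing the administration logs can be sketched as below. The utility names and dates are illustrative stand-ins, since actual file names depend on the utility invoked and the HDB version.

```shell
# Sketch: administration log names encode the utility and the date.
# /tmp/gpAdminLogs and the file names below are illustrative stand-ins for
# /home/gpadmin/gpAdminLogs on a real master or segment host.
mkdir -p /tmp/gpAdminLogs
touch /tmp/gpAdminLogs/gpstart_20140715.log /tmp/gpAdminLogs/gpstop_20140715.log
# List the logs for the start operation only:
ls /tmp/gpAdminLogs/gpstart_*.log
```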