Pivotal Knowledge Base

How to Change Hadoop Daemon log4j.properties

Environment

Product Version
PHD 1.x
PHD 2.0.0.0

Purpose

By default, PHD service daemons use a variety of log4j file appenders, and some of those defaults give the user no control over how much log data the Hadoop daemons generate, leaving the sysadmin to manage and maintain it. This article explains how log4j.properties is configured for each of the PHD core components so that sysadmins can understand and control daemon log management.

Refer to the log4j Javadoc for each appender below; the sample configuration parameters serve as a quick reference.

DailyRollingFileAppender (DRFA)
#
# Daily Rolling File Appender
# Rollover at midnight
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
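DRFA rolls the live file at each DatePattern boundary, renaming it with the date as a suffix (e.g. <logfile>.2014-01-01). Note that DailyRollingFileAppender has no MaxFileSize or MaxBackupIndex options, so rolled files accumulate until removed by hand. As a hypothetical tweak, a finer DatePattern rolls more often:

```properties
# Hypothetical tweak: roll hourly instead of daily; rolled files then
# get an hourly suffix such as <logfile>.2014-01-01-13
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd-HH
```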
RollingFileAppender (RFA)
#
# Rolling File Appender - cap space usage at 256MB.
#
hadoop.log.maxfilesize=256MB
hadoop.log.maxbackupindex=20
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
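With these values, per-log disk usage is bounded by the live file plus its backups, so the 256MB/20 defaults cap each daemon log at roughly 5.25GB. A quick back-of-the-envelope check:

```shell
# Upper bound on disk used by one RFA-managed log:
# MaxFileSize * (MaxBackupIndex + 1), i.e. the live file plus its backups.
max_mb=256     # hadoop.log.maxfilesize
backups=20     # hadoop.log.maxbackupindex
echo "$(( max_mb * (backups + 1) )) MB"   # prints "5376 MB"
```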
FileAppender
#
# File Appender
#
log4j.appender.FA=org.apache.log4j.FileAppender
log4j.appender.FA.File=${hive.log.dir}/${hive.log.file}
log4j.appender.FA.layout=org.apache.log4j.PatternLayout
log4j.appender.FA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
ConsoleAppender
#
# console appender options
#
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
log4j.appender.console.encoding=UTF-8

The following table is a quick reference for the root logger environment variables.

Variable                             Services                            Where to Override
HADOOP_ROOT_LOGGER=INFO,RFA          NameNode, JournalNode, ZKFC,        /etc/gphd/hadoop/conf/hadoop-env.sh
                                     DataNode, Secondary NameNode
HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA   MapReduce History Server            /etc/gphd/hadoop/conf/mapred-env.sh
YARN_ROOT_LOGGER=INFO,RFA            ResourceManager, NodeManager        /etc/gphd/hadoop/conf/yarn-env.sh
ZOO_LOG4J_PROP=INFO,ROLLINGFILE      ZooKeeper                           /etc/gphd/zookeeper/conf/java.env
HBASE_ROOT_LOGGER=INFO,RFA           HBase Master, HBase RegionServer    /etc/gphd/hbase/conf/hbase-env.sh

Where is the log directory defined?

Every service has a $<SERVICE>_LOG_DIR variable defined in /etc/default/<service>. For example, the datanode service sets "HADOOP_LOG_DIR=/var/log/gphd/hadoop-hdfs", so all datanode logs are found under "/var/log/gphd/hadoop-hdfs".

[gpadmin@hdw1 ~]$ cat /etc/default/hadoop-hdfs-datanode  | egrep ^export
export HADOOP_PID_DIR=/var/run/gphd/hadoop-hdfs
export HADOOP_LOG_DIR=/var/log/gphd/hadoop-hdfs
export HADOOP_NAMENODE_USER=hdfs
export HADOOP_SECONDARYNAMENODE_USER=hdfs
export HADOOP_DATANODE_USER=hdfs
export HADOOP_IDENT_STRING=hdfs

Namenode, Journalnode, ZKFC, Datanode, and Secondary Namenode daemons

These daemons source their log4j settings from /etc/gphd/hadoop/conf/log4j.properties.

The HADOOP_ROOT_LOGGER environment variable controls the default logger. It is sourced in /usr/lib/gphd/hadoop/sbin/hadoop-daemon.sh, which sets the root logger to the RollingFileAppender (RFA) by default, and it can be overridden in /etc/gphd/hadoop/conf/hadoop-env.sh:

export HADOOP_ROOT_LOGGER=INFO,RFA 
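A sketch of such an override in hadoop-env.sh. The 128MB/10 values below are illustrative, not defaults, and the -D overrides rely on log4j resolving ${...} placeholders from JVM system properties before falling back to the properties file:

```shell
# Sketch: in /etc/gphd/hadoop/conf/hadoop-env.sh
export HADOOP_ROOT_LOGGER=INFO,RFA
# log4j checks JVM system properties when expanding ${hadoop.log.maxfilesize},
# so the RFA limits can also be tuned per daemon via -D options:
export HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.maxfilesize=128MB -Dhadoop.log.maxbackupindex=10"
```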

Audit Logging

Audit logging uses the DRFAS appender, selected by the "hadoop.security.logger" setting. That setting is configured in /etc/gphd/hadoop/conf/hadoop-env.sh through the HADOOP_NAMENODE_OPTS, HADOOP_DATANODE_OPTS, and HADOOP_SECONDARYNAMENODE_OPTS environment variables.
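In stock Apache Hadoop, hadoop-env.sh passes this setting to each daemon JVM as a system property; the lines below follow that pattern and should be read as a sketch rather than the exact PHD defaults:

```shell
# Sketch (stock hadoop-env.sh pattern; exact PHD values may differ):
# the audit/security logger is handed to each daemon as a -D system property.
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=INFO,DRFAS ${HADOOP_NAMENODE_OPTS}"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,DRFAS ${HADOOP_DATANODE_OPTS}"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=INFO,DRFAS ${HADOOP_SECONDARYNAMENODE_OPTS}"
```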

MapReduce History Server

The MapReduce History Server sources its log4j settings from /etc/gphd/hadoop/conf/log4j.properties.

The HADOOP_MAPRED_ROOT_LOGGER environment variable controls the default logger. It is sourced in /usr/lib/gphd/hadoop-mapreduce/sbin/mr-jobhistory-daemon.sh, which sets the history server logger to the RollingFileAppender by default, and it can be overridden in /etc/gphd/hadoop/conf/mapred-env.sh:

export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA

ResourceManager, NodeManager

These daemons source their log4j settings from /etc/gphd/hadoop/conf/log4j.properties

The YARN_ROOT_LOGGER environment variable controls the default logger. It is sourced in /usr/lib/gphd/hadoop-yarn/sbin/yarn-daemon.sh, which sets the default logger to the RollingFileAppender, and it can be overridden in /etc/gphd/hadoop/conf/yarn-env.sh:

export YARN_ROOT_LOGGER=INFO,RFA

ZooKeeper

ZooKeeper sources its log4j settings from /etc/gphd/zookeeper/conf/log4j.properties.

The ZOO_LOG4J_PROP environment variable controls the default logger. It is sourced in /usr/bin/zookeeper-server, which sets the default logger to the RollingFileAppender, and it can be overridden by exporting the value in /etc/gphd/zookeeper/conf/java.env:

export ZOO_LOG4J_PROP=INFO,ROLLINGFILE
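A sketch of the override file. Note that in ZooKeeper's shipped log4j.properties the ROLLINGFILE size and backup limits (MaxFileSize, MaxBackupIndex) may be commented out, so they are worth checking in /etc/gphd/zookeeper/conf/log4j.properties when relying on this appender to cap disk usage:

```shell
# Sketch: /etc/gphd/zookeeper/conf/java.env
# ROLLINGFILE size limits live in /etc/gphd/zookeeper/conf/log4j.properties.
export ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
```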

HBase Master, HBase RegionServer

These daemons source their log4j settings from /etc/gphd/hbase/conf/log4j.properties

The HBASE_ROOT_LOGGER environment variable controls the default logger. It is sourced in /usr/lib/gphd/hbase/bin/hbase-daemon.sh, which sets the default logger to the RollingFileAppender, and it can be overridden in /etc/gphd/hbase/conf/hbase-env.sh:

export HBASE_ROOT_LOGGER=INFO,RFA

Hive

Hive sources its log4j settings from /etc/gphd/hive/conf/hive-log4j.properties. In PHD, all Hive daemon logs source this file for hive.root.logger:

hive.root.logger=WARN,DRFA
hive.log.dir=/tmp/${user.name}
hive.log.file=hive.log
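Because hive.log.dir defaults to /tmp/${user.name}, Hive logs can disappear when /tmp is cleaned. A hypothetical override in hive-log4j.properties moves them to a persistent location; the directory below is illustrative and must exist and be writable by the Hive user:

```properties
# Sketch: overrides in /etc/gphd/hive/conf/hive-log4j.properties
# (directory is illustrative, not a PHD default)
hive.root.logger=INFO,DRFA
hive.log.dir=/var/log/gphd/hive
hive.log.file=hive.log
```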

The file /etc/init.d/hive-server starts the Hive server. It sets the Hive server log file name to "hive-server.log" and uses the default hive.root.logger defined in hive-log4j.properties. This log file is truncated each time the Hive server daemon restarts.

NAME="hive-server"
LOG_FILE="/var/log/gphd/hive/${NAME}.log"

The file /etc/init.d/hive-metastore starts the Hive metastore. It sets the Hive metastore log file name to "hive-metastore.log" and uses the default hive.root.logger defined in hive-log4j.properties. This log file is truncated each time the Hive metastore daemon restarts.

NAME="hive-metastore"
LOG_FILE="/var/log/gphd/hive/${NAME}.log"

Both the hive-server and hive-metastore daemons also log their data to "hive.log" as defined in hive-log4j.properties. This consolidated hive.log is rotated according to hive.root.logger, which is set to DRFA in hive-log4j.properties.

Hive Query History Log

The query history log location is governed by "hive.querylog.location" in hive-site.xml. By default, this parameter is set to "/<hadoop.tmp.dir>/${user.name}/".
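To pin the query history location explicitly, the parameter can be set in hive-site.xml; the value below is illustrative, not a PHD default:

```xml
<!-- Sketch: explicit query history location in hive-site.xml -->
<property>
  <name>hive.querylog.location</name>
  <value>/var/log/gphd/hive/querylog/${user.name}</value>
</property>
```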

 
