Pivotal Knowledge Base

How to Find and Review Logs for YARN MapReduce Jobs

Environment

  • Product: PHD 1.1.1
  • OS: CentOS 6.4
  • Host hdm1: Hadoop master node running all master services
  • Hosts hdw1, hdw2, hdw3: Hadoop worker nodes running the DataNode and NodeManager services

Purpose

This article covers the basic workflow of a MapReduce job, with an example showing how to troubleshoot failing MapReduce jobs. It also explains in detail how a NodeManager runs containers and maintains their logs.

Job Client

The job client is used to facilitate execution of a MapReduce job. When users write a MapReduce job, they typically invoke the job client in their main class to configure and launch the job. In this example, we use SleepJob to make the mapper tasks sleep for an extended period of time so we can watch how the NodeManager executes the job.

    // Inside SleepJob's run(): create the job, submit it, and block until it finishes
    Job job = createJob(numMapper, numReducer, mapSleepTime,
        mapSleepCount, reduceSleepTime, reduceSleepCount);
    return job.waitForCompletion(true) ? 0 : 1;

The call to job.waitForCompletion() first creates the /user/gpadmin/.staging directory if it does not exist, then writes job.xml and a job_<job ID>_conf.xml file containing all of the Hadoop parameters used to execute the job.

It also uploads the "hadoop-mapreduce-client-jobclient.jar" file used in the hadoop jar command into this directory, renaming it to job.jar. job.jar is then used by all of the containers to execute the MapReduce job.

Note: The .staging directory is created under /user/${username}. In this article, gpadmin is the user.

[gpadmin@hdw1 ~]$ hdfs dfs -ls /user/gpadmin/.staging/job_1389385968629_0025
Found 7 items
-rw-r--r--   3 gpadmin hadoop          7 2014-02-01 15:28 /user/gpadmin/.staging/job_1389385968629_0025/appTokens
-rw-r--r--  10 gpadmin hadoop    1383034 2014-02-01 15:28 /user/gpadmin/.staging/job_1389385968629_0025/job.jar
-rw-r--r--  10 gpadmin hadoop        151 2014-02-01 15:28 /user/gpadmin/.staging/job_1389385968629_0025/job.split
-rw-r--r--   3 gpadmin hadoop         19 2014-02-01 15:28 /user/gpadmin/.staging/job_1389385968629_0025/job.splitmetainfo
-rw-r--r--   3 gpadmin hadoop      64874 2014-02-01 15:28 /user/gpadmin/.staging/job_1389385968629_0025/job.xml
-rw-r--r--   3 gpadmin hadoop          0 2014-02-01 15:28 /user/gpadmin/.staging/job_1389385968629_0025/job_1389385968629_0025_1.jhist
-rw-r--r--   3 gpadmin hadoop      75335 2014-02-01 15:28 /user/gpadmin/.staging/job_1389385968629_0025/job_1389385968629_0025_1_conf.xml 
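
Because job.xml captures every parameter the job was submitted with, it is often worth a quick look when troubleshooting. For example, to check a single setting (a sketch; substitute whichever parameter name you care about):

[gpadmin@hdm1 ~]$ hdfs dfs -cat /user/gpadmin/.staging/job_1389385968629_0025/job.xml | grep mapreduce.job.queuename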

After .staging is created, the job client submits the job to the ResourceManager service (the applications manager interface on port 8032). The job client then continues to monitor the execution of the job and reports the progress of the map and reduce containers back to the console; that is why you see "map 5% reduce 0%" while the job is running. Once the job completes, the job client returns statistics about the job that it collected during execution. Note that the job client gets map and reduce container statuses directly from the application master. We will cover that in more detail later, but for now here is an example of running the sleep job so it hangs for a long time while we observe the map containers execute.

[gpadmin@hdm1 ~]$ hadoop jar /usr/lib/gphd/hadoop-mapreduce/hadoop-mapreduce-client-jobclient.jar sleep -m 3 -r 1 -mt 6000000
14/02/01 15:27:59 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/01 15:27:59 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/01 15:28:00 INFO mapreduce.JobSubmitter: number of splits:3
14/02/01 15:28:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1389385968629_0025
14/02/01 15:28:01 INFO client.YarnClientImpl: Submitted application application_1389385968629_0025 to ResourceManager at hdm1.hadoop.local/192.168.2.101:8032
14/02/01 15:28:01 INFO mapreduce.Job: The url to track the job: http://hdm1.hadoop.local:8088/proxy/application_1389385968629_0025/
14/02/01 15:28:01 INFO mapreduce.Job: Running job: job_1389385968629_0025
14/02/01 15:28:12 INFO mapreduce.Job: Job job_1389385968629_0025 running in uber mode : false
14/02/01 15:28:12 INFO mapreduce.Job:  map 0% reduce 0%
14/02/01 15:29:06 INFO mapreduce.Job:  map 1% reduce 0%
14/02/01 15:30:37 INFO mapreduce.Job:  map 2% reduce 0%
14/02/01 15:32:08 INFO mapreduce.Job:  map 3% reduce 0%
14/02/01 15:33:38 INFO mapreduce.Job:  map 4% reduce 0%

Note: You can kill the MapReduce job using the following command:

[root@hdw3 yarn]# yarn application -kill application_1389385968629_0025

Output:

14/02/01 16:53:30 INFO client.YarnClientImpl: Killing application application_1389385968629_0025
14/02/01 16:53:30 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped. 
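
If you do not have the application ID handy, the YARN CLI can list the running applications along with their IDs and tracking URLs:

[gpadmin@hdm1 ~]$ yarn application -list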

Application Master

Once the applications manager service has decided to start running the job, it chooses one of the NodeManagers to launch the MapReduce application master class, org.apache.hadoop.mapreduce.v2.app.MRAppMaster.

The application master will be launched on one of the NodeManager servers running in the environment. Which NodeManager the ResourceManager selects depends largely on the available resources within the cluster. The NodeManager service generates shell scripts in the local application cache, which are used to execute the application master container.

Here we see the application master's container directory located under the NodeManager's ${yarn.nodemanager.local-dirs}, defined in yarn-site.xml:

nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/jobSubmitDir
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/jobSubmitDir/job.split
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/jobSubmitDir/appTokens
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/jobSubmitDir/job.splitmetainfo
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/job.xml
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/.default_container_executor.sh.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/launch_container.sh
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/tmp
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/.container_tokens.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/container_tokens
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/job.jar
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/default_container_executor.sh
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/.launch_container.sh.crc

The container executor class running in the NodeManager service then uses launch_container.sh to execute the application master class. As shown below, all stdout and stderr output is redirected to ${yarn.nodemanager.log-dirs}, defined in yarn-site.xml:

[gpadmin@hdw3 yarn]$ tail -1 nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/launch_container.sh
exec /bin/bash -c "$JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/data/dn/yarn/userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA  -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/data/dn/yarn/userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/stdout 2>/data/dn/yarn/userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/stderr  "
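
The first part of launch_container.sh sets up the container's environment before this exec line runs. A quick way to review that environment (a sketch, using the same container directory as above):

[gpadmin@hdw3 yarn]$ grep export nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/launch_container.sh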

Once launched, the application master issues resource allocation requests to the ResourceManager service for the map and reduce containers in the queue. When the ResourceManager determines that there are enough resources on the cluster to grant a request, it informs the application master which NodeManager is available to execute the container, and the application master then sends a request to that NodeManager to launch it.
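
These allocation decisions can be followed in the ResourceManager log by searching for a container ID. A sketch, assuming PHD's default log location on the master node (adjust the path for your installation):

[root@hdm1 ~]# grep container_1389385968629_0025_01_000002 /var/log/gphd/hadoop-yarn/*resourcemanager*.log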

Map or Reduce Container

The container executor class in the NodeManager does the same for a map or reduce container as it did for the application master class. All files and shell scripts are placed in the container's application cache within nm-local-dir:

nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/job.xml
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/.default_container_executor.sh.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/launch_container.sh
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/tmp
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/.job.xml.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/.container_tokens.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/container_tokens
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/job.jar
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/default_container_executor.sh
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/.launch_container.sh.crc

Note: job.jar is only a soft link that points to the actual job.jar in the application's filecache directory. This is how YARN handles the distributed cache for containers:

[root@hdw1 yarn]# ls -l nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/
total 96
-rw-r--r-- 1 yarn yarn   108 Feb  1 15:25 container_tokens
-rwx------ 1 yarn yarn   450 Feb  1 15:25 default_container_executor.sh
lrwxrwxrwx 1 yarn yarn   122 Feb  1 15:25 job.jar -> /data/dn/yarn/nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/filecache/4395983903529068766/job.jar
-rw-r----- 1 yarn yarn 76430 Feb  1 15:25 job.xml
-rwx------ 1 yarn yarn  2898 Feb  1 15:25 launch_container.sh
drwx--x--- 2 yarn yarn  4096 Feb  1 15:25 tmp 
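
The symlink target lives in the application-level filecache, which is shared by all of this application's containers on the node. You can list it directly:

[root@hdw1 yarn]# ls -l nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/filecache/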

Note: By setting the following parameter, the container launch scripts and user cache shown above will remain on the system for the specified period of time; otherwise, these files are deleted once the application completes.

<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>10000000</value>
</property>
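
This property belongs in yarn-site.xml on each NodeManager, and the NodeManager service must be restarted for it to take effect. To confirm the value on a worker node (a sketch; the config path is PHD's default and may differ in your installation):

[root@hdw1 ~]# grep -A1 delete.debug-delay /etc/gphd/hadoop/conf/yarn-site.xml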

Resulting log file locations

At run time you will see all of the container logs under ${yarn.nodemanager.log-dirs}:

[root@hdw3 yarn]# find userlogs/ -print
userlogs/
userlogs/application_1389385968629_0025
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/stdout
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/stderr
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/syslog
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000002
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000002/stdout
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000002/stderr
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000002/syslog 
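
While the job is still running, you can follow a container's log live on the NodeManager that hosts it, using the paths from the listing above:

[root@hdw3 yarn]# tail -f userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/syslog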

Once the job has completed, the NodeManager keeps the log for each container for ${yarn.nodemanager.log.retain-seconds}, which is 10800 seconds (3 hours) by default, and deletes them once they have expired. If ${yarn.log-aggregation-enable} is set, however, the NodeManager immediately concatenates all of the container logs into one file, uploads it into HDFS at ${yarn.nodemanager.remote-app-log-dir}/${user.name}/logs/<application ID>, and deletes the logs from the local userlogs directory. Log aggregation is enabled by default in PHD and makes log collection convenient.
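
With aggregation enabled, the aggregated logs can also be read directly with the yarn logs command, run as the user who submitted the job:

[gpadmin@hdm1 ~]$ yarn logs -applicationId application_1389385968629_0025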

Here is an example with log aggregation enabled. We know four containers executed in this MapReduce job because "-m" specified 3 mappers and the fourth container is the application master. Each NodeManager ran at least one container, so all of them uploaded a log file.

[gpadmin@hdm1 ~]$ hdfs dfs -ls /yarn/apps/gpadmin/logs/application_1389385968629_0025/
Found 3 items
-rw-r-----   3 gpadmin hadoop       4496 2014-02-01 16:54 /yarn/apps/gpadmin/logs/application_1389385968629_0025/hdw1.hadoop.local_30825
-rw-r-----   3 gpadmin hadoop       5378 2014-02-01 16:54 /yarn/apps/gpadmin/logs/application_1389385968629_0025/hdw2.hadoop.local_36429
-rw-r-----   3 gpadmin hadoop    1877950 2014-02-01 16:54 /yarn/apps/gpadmin/logs/application_1389385968629_0025/hdw3.hadoop.local_42186

Procedure

Targeting failed map tasks

Attached to this article is a modified version of the Apache SleepJob MapReduce job. It has been modified so the mapper tasks randomly fail 25% of the time, which lets us see how to locate the logs for a failed map container. If you run this yourself, the whole job may fail if the same map task fails three times.

The following code was introduced to the SleepMapper class:

      // Introduce a 25% chance of failing the map task:
      // nextInt(4) returns 0, 1, 2, or 3, so n == 1 happens 1/4 of the time.
      LOG.info("I am checking to see if i should punch myself and fail");
      Random rand = new Random();
      int n = rand.nextInt(4);
      if (n == 1) {
          LOG.info("KAPOW!! owwwwwyy... i failed");
          System.exit(123);
      }
      LOG.info("Punching myself seems like a bad idea.. i will just go to sleep instead");
  1. We see that containers start to fail partway through the job; however, they are retried and the MapReduce job moves on.

    14/02/01 17:05:42 INFO client.YarnClientImpl: Submitted application application_1389385968629_0026 to ResourceManager at hdm1.hadoop.local/192.168.2.101:8032
    14/02/01 17:05:42 INFO mapreduce.Job: The url to track the job: http://hdm1.hadoop.local:8088/proxy/application_1389385968629_0026/
    14/02/01 17:05:42 INFO mapreduce.Job: Running job: job_1389385968629_0026
    14/02/01 17:05:52 INFO mapreduce.Job: Job job_1389385968629_0026 running in uber mode : false
    14/02/01 17:05:52 INFO mapreduce.Job:  map 0% reduce 0%
    14/02/01 17:06:28 INFO mapreduce.Job:  map 10% reduce 0%
    14/02/01 17:06:29 INFO mapreduce.Job:  map 15% reduce 0%
    14/02/01 17:06:29 INFO mapreduce.Job: Task Id : attempt_1389385968629_0026_m_000007_0, Status : FAILED
    14/02/01 17:06:30 INFO mapreduce.Job:  map 20% reduce 0%
    14/02/01 17:06:31 INFO mapreduce.Job:  map 25% reduce 0%
    14/02/01 17:06:31 INFO mapreduce.Job: Task Id : attempt_1389385968629_0026_m_000008_0, Status : FAILED
  2. Let us take a closer look at attempt 8 and find out why it failed. First we need to download the container logs from HDFS. You can also cat the logs directly from HDFS if they are too large to download; see the command at the end of this step.

    [gpadmin@hdm1 logs]$ hdfs dfs -copyToLocal /yarn/apps/gpadmin/logs/application_1389385968629_0026/* ./
    [gpadmin@hdm1 logs]$ ls
    hdw1.hadoop.local_30825  hdw2.hadoop.local_36429  hdw3.hadoop.local_42186
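    # Alternatively, stream a log straight out of HDFS without copying it
    # locally first (same file names as the listing above):
    [gpadmin@hdm1 logs]$ hdfs dfs -cat /yarn/apps/gpadmin/logs/application_1389385968629_0026/hdw1.hadoop.local_30825 | less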
  3. Search for attempt ID 8. Sometimes the container executor will put non-ASCII data into the files, so grep -a allows text search of a binary file. We first grep for "to attempt_1389385968629_0026_m_000008_0" to see which container was assigned this attempt ID. We find that hdw1 is running the application master service, so it has all of the history related to this attempt. It is important to know that the application master can run on the same node as a map container. (A one-pass alternative that searches the aggregated logs with yarn logs is shown at the end of this procedure.)

    [gpadmin@hdm1 logs]$ egrep -a "to attempt_1389385968629_0026_m_000008_0" *
    hdw1.hadoop.local_30825:2014-02-01 17:03:30,509 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1389385968629_0026_01_000010 to attempt_1389385968629_0026_m_000008_0
  4. Next, check the hdw1 application master log to find out where container 10 was launched. We learn this container ran on hdw2, so let's find the details in the hdw2 log file.

    [gpadmin@hdm1 logs]$ egrep -a "container_1389385968629_0026_01_000010 on NM" hdw1.hadoop.local_30825
    2014-02-01 17:03:30,864 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1389385968629_0026_m_000008_0] using containerId: [container_1389385968629_0026_01_000010 on NM: [hdw2.hadoop.local:36429] 
  5. The hdw2 log file shows the root cause: this map task decided to punch itself (see the KAPOW log line below). It is always good to check the container logs for failed map or reduce tasks, because sometimes the Java stack trace reported to the user by the job client is from a subsequent error that is not related to the original cause of the fault.

    container_1389385968629_0026_01_000010 (aggregated stderr, stdout, and syslog)
    2014-02-01 17:03:48,809 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
    2014-02-01 17:04:11,481 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: eng.pivotal.hadoop.SleepJobFail$EmptySplit@61ab196b
    2014-02-01 17:04:11,580 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    2014-02-01 17:04:13,883 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 134217724(536870896)
    2014-02-01 17:04:13,883 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 512
    2014-02-01 17:04:13,883 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 429496736
    2014-02-01 17:04:13,883 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 536870912
    2014-02-01 17:04:13,883 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 134217724; length = 33554432
    2014-02-01 17:04:13,953 INFO [main] eng.pivotal.hadoop.SleepJobFail$SleepMapper: I am checking to see if i should punch myself and fail
    2014-02-01 17:04:13,953 INFO [main] eng.pivotal.hadoop.SleepJobFail$SleepMapper: KAPOW!! owwwwwyy... i failed
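    # The same search also works against the aggregated logs without
    # downloading anything first, e.g.:
    [gpadmin@hdm1 ~]$ yarn logs -applicationId application_1389385968629_0026 | grep -a "attempt_1389385968629_0026_m_000008"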
    

Additional Information

Refer to the attached file: SleepJobFail.tar.gz

Comments

  • Liang Fang

    Good writing

  • Scott Gai

    Yes, this is really helpful!!

  • Michael Forrest

    This was a great read, thanks for posting!

  • Hai S. Ha

    This is a great article. I just wonder if we can convert the logs on HDFS to a normal text file (not the binary that we have to use "grep -a" to search within)?

  • Brahma Reddy Battula

    Neatly documented. Thanks, it's helpful.

  • Vijayakumar Ramdoss

    Excellent work

  • Mauricio Jost

    Excellent post! I know I can get the logs with the "yarn logs" command too, but it only shows stderr and stdout logs, no syslogs. Is that normal?

  • Taylor Vesely

    @Hai S. Ha: If you want to remove the characters that cause grep to regard the file as 'binary', you can always run the file through the strings utility to remove any data that is not valid text.

    strings <FILE TO CONVERT> > strings.out

  • Taylor Vesely

    @Mauricio Jost: When I look at my application logs using 'yarn logs', I see the syslog output unless the output was not aggregated to HDFS. Would you check the application logs within HDFS to confirm whether your syslog output is available for that application?
