Pivotal Knowledge Base

IOException "Job status not available" when MapReduce job exits successfully

Environment

PHD 1.x

Symptom
The MapReduce job completes successfully, but a Java IOException is thrown when the job client queries for the current job status.

13/12/26 13:12:58 INFO mapreduce.Job: The url to track the job: http://hdm1.hadoop.local:8088/proxy/application_1388082686190_0004/
13/12/26 13:12:58 INFO danl.WordCount: job is still cranking away...
13/12/26 13:17:58 INFO danl.WordCount: job is still cranking away...
13/12/26 13:22:59 INFO ipc.Client: Retrying connect to server: hdw1.hadoop.local/192.168.3.201:55559. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
13/12/26 13:22:59 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
Exception in thread "main" java.io.IOException: Job status not available
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:317)
at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:594)
at pivotal.eng.danl.WordCount.main(WordCount.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

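MR2 use case example (new API, org.apache.hadoop.mapreduce.Job, as in the stack trace above)

// job is an org.apache.hadoop.mapreduce.Job
job.submit();
while (! job.isComplete() ) {
   LOG.info("job is still cranking away using MR2 API...");
   Thread.sleep(interval);
}

MR1 use case example (old API, org.apache.hadoop.mapred.JobClient and RunningJob)

// job is an org.apache.hadoop.mapred.JobClient, conf is a JobConf
RunningJob rj = job.submitJob(conf);
while (! rj.isComplete() ) {
   LOG.info("job is still cranking away using MR1 API...");
   Thread.sleep(interval);
}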

In most cases this problem is observed intermittently, because the exception is only thrown if the job client requests the job status after the application master has exited successfully. While the MapReduce application is running, the job client gets all of its status updates directly from the application master. Once the application master has finished, the job client is redirected to the MapReduce history server to collect the final job status, as this log line shows:

13/12/26 13:22:59 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
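
The polling loops above call isComplete() with no error handling, so a single failed status lookup ends the client. As a rough sketch only, the loop could tolerate a bounded number of consecutive IOExceptions before giving up. TolerantJobPoller, CompletionCheck, and waitForCompletion are hypothetical names that are not part of Hadoop, and the interface is a stand-in for the real status call so the example runs without a cluster. This does not replace the fix described below: if the history server is down or misconfigured, the exception will still surface once the retries are used up.

```java
import java.io.IOException;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class TolerantJobPoller {

    /** Stand-in for a status call such as job.isComplete(), which may throw IOException. */
    interface CompletionCheck {
        boolean isComplete() throws IOException;
    }

    /**
     * Polls until the check reports completion and returns the number of
     * status calls made. Up to maxConsecutiveFailures IOExceptions in a row
     * are tolerated; the next consecutive failure is rethrown to the caller.
     */
    static int waitForCompletion(CompletionCheck check, long intervalMillis,
                                 int maxConsecutiveFailures)
            throws IOException, InterruptedException {
        int polls = 0;
        int failures = 0;
        while (true) {
            polls++;
            try {
                if (check.isComplete()) {
                    return polls;
                }
                failures = 0; // a successful status read resets the counter
            } catch (IOException e) {
                failures++;
                if (failures > maxConsecutiveFailures) {
                    throw e; // still failing: most likely a history server problem
                }
            }
            Thread.sleep(intervalMillis);
        }
    }
}
```

With a real job, the check could be passed as a lambda such as () -> job.isComplete(), assuming the status call declares IOException as in the stack trace above.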

If the MapReduce history server is not up, or if yarn-site.xml was changed without restarting the history server, then the history server will be unable to find the status of the given job ID for the job client.

Also, in PHD 1.1.0, yarn-default.xml does not define the following MapReduce history parameters. If they are not explicitly defined in yarn-site.xml, these values are not passed to the MR history server, and the job history service cannot locate the job information without them:
mapreduce.jobhistory.intermediate-done-dir
mapreduce.jobhistory.done-dir

Fix

1. Make sure yarn-site.xml and mapred-site.xml are properly configured. Start the MapReduce history service if it is not up.

service hadoop-mapreduce-historyserver start

2. Log in to the MR history server web interface and verify the values of the following parameters.

http://<MR history server IP or hostname>:19888/conf

Params to check:
mapreduce.jobhistory.intermediate-done-dir
mapreduce.jobhistory.done-dir
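
The same check can be scripted instead of read by eye. The sketch below assumes the /conf page has been saved as text and uses the usual Hadoop <configuration>/<property>/<name>/<value> XML layout. HistoryConfCheck, parse, and missing are hypothetical names, not Hadoop classes, and only JDK XML parsing is used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical checker for illustration; not part of Hadoop. */
final class HistoryConfCheck {

    /** The two parameters this article says the history server needs. */
    static final String[] REQUIRED = {
        "mapreduce.jobhistory.intermediate-done-dir",
        "mapreduce.jobhistory.done-dir"
    };

    /** Parses Hadoop-style configuration XML into a name -> value map. */
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            NodeList names = property.getElementsByTagName("name");
            NodeList values = property.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0) {
                props.put(names.item(0).getTextContent().trim(),
                          values.item(0).getTextContent().trim());
            }
        }
        return props;
    }

    /** Returns the required parameter names that are absent or have an empty value. */
    static List<String> missing(Map<String, String> props) {
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.get(key);
            if (value == null || value.isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }
}
```

Calling HistoryConfCheck.missing(HistoryConfCheck.parse(pageText)) returns an empty list when both parameters are set. Any name it returns still needs to be defined in the site configuration, followed by a history server restart.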
