Pivotal Knowledge Base

YARN Application Timeline Server in Pivotal HD is slow or down

Environment

Product Version
Pivotal HD 3.x

Symptom

The YARN Application Timeline Server may show some of the following symptoms:

  • Slow GUI
  • Unresponsive GUI, or no page returned at all
  • High CPU spikes on the host running the Application Timeline Server when the GUI is accessed
  • Out of Memory (OOM) errors in the Application Timeline Server log, /var/log/hadoop-yarn/yarn/yarn-yarn-timelineserver-<HOSTNAME>.log

Error Message

This message may be seen in /var/log/hadoop-yarn/yarn/yarn-yarn-timelineserver-<HOSTNAME>.log:

2016-05-09 15:57:57,056 ERROR mortbay.log (Slf4jLog.java:warn(87)) - Error for /applicationhistory 
java.lang.OutOfMemoryError: Java heap space
2016-05-09 16:02:29,968 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:discardOldEntities(1451)) - Discarded 0 entities for timestamp 1460106149965 and earlier in 0.003 seconds
2016-05-09 16:07:02,558 ERROR mortbay.log (Slf4jLog.java:warn(87)) - handle failed
java.lang.OutOfMemoryError: Java heap space
2016-05-09 16:07:52,511 ERROR mortbay.log (Slf4jLog.java:warn(87)) - handle failed
java.lang.OutOfMemoryError: Java heap space
2016-05-09 16:11:55,795 ERROR mortbay.log (Slf4jLog.java:warn(87)) - Error for /applicationhistory
java.lang.OutOfMemoryError: Java heap space
2016-05-09 16:09:39,106 ERROR mortbay.log (Slf4jLog.java:warn(87)) - handle failed
java.lang.OutOfMemoryError: Java heap space
2016-05-09 16:11:55,797 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:discardOldEntities(1451)) - Discarded 0 entities for timestamp 1460106450582 and earlier in 265.214 seconds
2016-05-09 16:16:56,770 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:discardOldEntities(1451)) - Discarded 0 entities for timestamp 1460107016766 and earlier in 0.004 seconds
2016-05-09 16:22:07,556 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:discardOldEntities(1451)) - Discarded 0 entities for timestamp 1460107319334 and earlier in 8.222 seconds
2016-05-09 16:29:33,313 ERROR mortbay.log (Slf4jLog.java:warn(87)) - handle failed
2016-05-09 16:34:34,911 ERROR mortbay.log (Slf4jLog.java:warn(87)) - handle failed
java.lang.OutOfMemoryError: Java heap space
2016-05-09 16:32:50,408 ERROR mortbay.log (Slf4jLog.java:warn(87)) - handle failed
java.lang.OutOfMemoryError: Java heap space
2016-05-09 16:30:04,370 ERROR mortbay.log (Slf4jLog.java:warn(87)) - handle failed
java.lang.OutOfMemoryError: Java heap space
2016-05-09 16:41:09,315 ERROR mortbay.log (Slf4jLog.java:warn(87)) - Error for /applicationhistory
java.lang.OutOfMemoryError: Java heap space
2016-05-09 16:41:09,315 ERROR mortbay.log (Slf4jLog.java:warn(87)) - Error for /applicationhistory
java.lang.OutOfMemoryError: Java heap space
2016-05-09 16:38:53,938 FATAL yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread Thread[488235337@qtp-222033400-5403,5,main] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: Java heap space
2016-05-09 16:38:46,312 ERROR mortbay.log (Slf4jLog.java:warn(87)) - Error for /ws/v1/timeline/
java.lang.OutOfMemoryError: Java heap space
2016-05-09 16:38:14,138 ERROR mortbay.log (Slf4jLog.java:warn(87)) - EXCEPTION
java.lang.OutOfMemoryError: Java heap space
2016-05-09 16:41:11,529 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:discardOldEntities(1451)) - Discarded 0 entities for timestamp 1460107631408 and earlier in 840.121 seconds
2016-05-09 16:41:11,554 INFO util.ExitUtil (ExitUtil.java:halt(147)) - Halt with status -1 Message: HaltException
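
To gauge how often the pattern above occurs, the log can be scanned for the two key messages. The following is a minimal sketch, not part of the original article: the function name ats_oom_summary is hypothetical, and the search strings are copied from the excerpt above.

```shell
# Hypothetical helper: summarize heap exhaustion in a timeline server log.
# Usage: ats_oom_summary /var/log/hadoop-yarn/yarn/yarn-yarn-timelineserver-<HOSTNAME>.log
ats_oom_summary() {
  log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "cannot read log file: $log_file" >&2
    return 1
  fi
  # Count lines reporting heap exhaustion (fixed-string match).
  # grep -c exits non-zero when the count is 0, hence the "|| true".
  oom_count=$(grep -cF 'java.lang.OutOfMemoryError: Java heap space' "$log_file" || true)
  # Count forced shutdowns logged by ExitUtil.
  halt_count=$(grep -cF 'Halt with status' "$log_file" || true)
  echo "oom_errors=${oom_count} halts=${halt_count}"
}
```

A non-zero halts value suggests the server shut itself down after running out of heap, as in the final line of the excerpt.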

Cause

The Application Timeline Server has exhausted its Java heap. A likely cause is that it has a large number of jobs to keep track of.

Resolution

1. While the Application Timeline Server is running, check how much heap memory it is using, where <timelineserver_pid> is the process ID of the Application Timeline Server:

$JAVA_HOME/bin/jmap -heap <timelineserver_pid>

2. Based on the output above, it may be necessary to increase the amount of heap memory available to the Application Timeline Server. Generally, 4 GB is a reasonable starting point for a production cluster. To change the heap size, modify the "AppTimelineServer Java heap size" setting in Ambari and restart the Application Timeline Server.

3. If the Application Timeline Server is using an excessive amount of memory, it may be necessary to reduce the timeline log retention setting, yarn.timeline-service.ttl-ms. By default, this is set to 2678400000 milliseconds (31 days). Reducing it to 604800000 milliseconds (7 days) may be necessary on a very busy cluster. After changing the setting, restart all services that Ambari flags as requiring a restart.
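
The steps above can be sketched in shell as follows. This is an illustration under stated assumptions, not part of the original procedure: the function names are hypothetical, and the process lookup assumes the Application Timeline Server runs with the main class ApplicationHistoryServer in the applicationhistoryservice package, which should be verified with ps on the actual host. The conversion helper only reproduces the millisecond values quoted in step 3.

```shell
# Hypothetical helper: print the PID of the Application Timeline Server.
# Assumption: its JVM command line contains the class name below.
# "[.]" keeps the pattern from matching the command line of this lookup itself.
ats_pid() {
  pgrep -f 'applicationhistoryservice[.]ApplicationHistoryServer' | head -n 1
}

# Hypothetical helper: run the jmap heap report from step 1 against that PID.
# jmap usually has to run as the user that owns the process.
ats_heap_report() {
  pid=$(ats_pid || true)
  if [ -z "$pid" ]; then
    echo "Application Timeline Server process not found" >&2
    return 1
  fi
  "$JAVA_HOME/bin/jmap" -heap "$pid"
}

# Convert a retention period in days to milliseconds for
# yarn.timeline-service.ttl-ms (24 h * 60 min * 60 s * 1000 ms per day).
days_to_ms() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

days_to_ms 31   # prints 2678400000
days_to_ms 7    # prints 604800000
```

On the timeline server host, ats_heap_report would be called with JAVA_HOME set. The days_to_ms helper can be used to derive other retention values before entering them in Ambari.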

 
