Pivotal Knowledge Base


NodeManager fails to start in a secured cluster

Environment:

PHD 1.x

 

Symptom:

NodeManager logs may indicate a failure like the one below:

2014-02-26 15:31:55,178 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: setsid exited with exit code 0
2014-02-26 15:31:55,182 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container is : 24
2014-02-26 15:31:55,183 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: configuration tokenization failed
2014-02-26 15:31:55,183 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.YarnException: Failed to initialize container executor
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:144)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:321)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:359)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:135)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:142)
        ... 2 more
Caused by: org.apache.hadoop.util.Shell$ExitCodeException: Can't get configured value for yarn.nodemanager.linux-container-executor.group.

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:202)
        at org.apache.hadoop.util.Shell.run(Shell.java:129)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:130)
        ... 3 more

 

Submitted jobs may also fail with:


2014/02/28 02:37:22 INFO mapreduce.Job: Job job_1393582635312_0006 failed with state FAILED due to: Application application_1393582635312_0006 failed 1 times due to AM Container for appattempt_1393582635312_0006_000001 exited with exitCode: -1000 due to: java.io.IOException: App initialization failed (139) with output:
 at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:191)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:860)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:202)
 at org.apache.hadoop.util.Shell.run(Shell.java:129)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322)
 at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:183)
 ... 1 more

.Failing this attempt.. Failing the application.

Testing the container-executor.cfg file with the --checksetup option may return the following:

[root@bocdhdw1 ~]# cd /usr/lib/gphd/hadoop-yarn/bin
[root@bocdhdw1 bin]# ./container-executor --checksetup
configuration tokenization failed
Can't get configured value for yarn.nodemanager.linux-container-executor.group. 

Or the command may return no output at all.
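Before rerunning --checksetup, the configuration file itself can be inspected for a missing or empty banned.users entry. The following is a minimal sketch (the check_banned_users helper name is invented here; adjust the path for your layout):

```shell
# check_banned_users: report whether a container-executor.cfg-style file
# has a non-empty banned.users entry.
# (Hypothetical helper; the file path is passed as the first argument.)
check_banned_users() {
    if grep -q '^banned\.users=..*' "$1" 2>/dev/null; then
        echo "banned.users is set"
    else
        echo "banned.users is missing or empty"
    fi
}

# Example: check the PHD 1.x default config location
check_banned_users /etc/gphd/hadoop/conf/container-executor.cfg
```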

 

Background:

If the NodeManager fails to start after configuring a secure cluster, check for the symptoms above in the node's /var/log/gphd/hadoop-yarn/yarn-yarn-nodemanager-*.log
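To locate affected nodes quickly, the NodeManager logs can be scanned for the fatal message. The sketch below is hypothetical (the find_ce_errors helper name is invented here); the log path is the PHD 1.x default cited above:

```shell
# find_ce_errors: list log files that contain the container-executor
# fatal message shown in the Symptom section above.
# (Hypothetical helper; pass one or more log file paths.)
find_ce_errors() {
    grep -l "Linux container executor not configured properly" "$@"
}

# Usage (on a PHD 1.x node):
#   find_ce_errors /var/log/gphd/hadoop-yarn/yarn-yarn-nodemanager-*.log
```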


This error can occur if the container-executor.cfg has no banned.users entry:

[root@bocdhdw1 hadoop-yarn]# cd /etc/gphd/hadoop/conf
[root@bocdhdw1 conf]# cat container-executor.cfg
#configured value of yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.linux-container-executor.group=yarn
#comma separated list of users who can not run applications
#Prevent other super-users
min.user.id=400

or an empty "banned.users=" entry:

[root@bocdhdw1 hadoop-yarn]# cd /etc/gphd/hadoop/conf
[root@bocdhdw1 conf]# cat container-executor.cfg
#configured value of yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.linux-container-executor.group=yarn
#comma separated list of users who can not run applications
banned.users=
#Prevent other super-users
min.user.id=400 

 

Workaround:

Review your security policy and populate the banned.users entry in container-executor.cfg with the list of accounts that are not allowed to run jobs in YARN.
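As a hedged example, a corrected container-executor.cfg might look like the following; the exact account list depends on your site's security policy (hdfs, yarn, mapred, and bin are only illustrative choices, not values mandated by this article):

```
#configured value of yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.linux-container-executor.group=yarn
#comma separated list of users who can not run applications
banned.users=hdfs,yarn,mapred,bin
#Prevent other super-users
min.user.id=400
```

After editing the file, rerun ./container-executor --checksetup as shown above to confirm the configuration parses cleanly.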
