Pivotal Knowledge Base

Massive number of “cannot find name for group ID” errors in the NameNode log; MapReduce cluster performance impacted

Environment

PHD 1.1.1

Problem

A customer had multiple users who continuously submitted MapReduce jobs to the cluster. The customer reported that cluster performance had become slow, although all jobs completed successfully.

Cause

A massive number of errors like the following were found in the NameNode log:

org.apache.hadoop.security.UserGroupInformation: No groups available for user test_user

2014-06-05 15:07:38,527 WARN org.apache.hadoop.security.ShellBasedUnixGroupsMapping: got exception trying to get groups for user vsetbct1
org.apache.hadoop.util.Shell$ExitCodeException: id: cannot find name for group ID 100
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:202)
    at org.apache.hadoop.util.Shell.run(Shell.java:129)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:411)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:394)
    at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83)
    at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52)
    at org.apache.hadoop.security.Groups.getGroups(Groups.java:89)
    at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1287)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.<init>(FSPermissionChecker.java:51)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4696)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:4663)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:3512)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:3491)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:671)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:45867)

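This problem occurs because the user who submitted the jobs has a group ID that the Hadoop cluster cannot resolve to a group name. Since the MapReduce jobs keep triggering this group lookup, which fails and logs a warning each time the user's permissions are checked (the stack trace shows it being reached from a NameNode getListing call), cluster performance slowed down.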
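
The check that fails here can be reproduced outside Hadoop. The following is a minimal sketch, not part of the original article, that lists the group IDs of a local user and reports those with no name in the group database. The function names are illustrative, and it assumes it is run on a Unix host that uses the same user and group database as the Hadoop node performing the lookup.

```python
import grp
import os
import pwd


def unresolved_gids(gids, lookup=grp.getgrgid):
    """Return the GIDs in `gids` that have no name in the group database.

    `lookup` defaults to grp.getgrgid, which raises KeyError for an
    unknown GID; it is a parameter only so the check can be exercised
    with a stand-in database.
    """
    missing = []
    for gid in gids:
        try:
            lookup(gid)
        except KeyError:
            missing.append(gid)
    return missing


def gids_for_user(user):
    """Return the primary and supplementary GIDs of a local user.

    Raises KeyError if the user itself is unknown on this host.
    """
    primary = pwd.getpwnam(user).pw_gid
    return os.getgrouplist(user, primary)
```

For example, `unresolved_gids(gids_for_user("vsetbct1"))` returns a list of the GIDs for which the `id` command would report "cannot find name for group ID". An empty list means every group of that user resolves on the host where the check is run.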

Fix

Add an entry for that user's group, containing a group name and the group ID, to /etc/group on the host where the job was submitted.
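
An /etc/group entry has four colon-separated fields: group name, password placeholder, numeric group ID, and a comma-separated member list. As a rough illustration only, the sketch below builds such a line. The group name "hadoop_users" is hypothetical; group ID 100 and user vsetbct1 are taken from the log excerpt above.

```python
def group_entry(name, gid, members):
    """Build one /etc/group line: name:password:GID:member_list."""
    if ":" in name or any(":" in m or "," in m for m in members):
        raise ValueError("names must not contain ':' or ','")
    # "x" is the conventional placeholder in the password field.
    return f"{name}:x:{int(gid)}:{','.join(members)}"


# "hadoop_users" is a hypothetical name chosen for this example.
print(group_entry("hadoop_users", 100, ["vsetbct1"]))
# prints: hadoop_users:x:100:vsetbct1
```

On most Linux systems, `groupadd -g 100 <name>` writes an equivalent entry without editing the file by hand. Afterwards, running `id` for the affected user should show a name for the group ID instead of the error.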
