Pivotal Knowledge Base


MapReduce job fails with PriviledgedActionException

Environment

Product              Version
Pivotal HD (PHD)     1.x

Symptom

A MapReduce job fails with the missing-block error shown below:

13/11/12 16:28:15 INFO mapreduce.Job: Task Id : attempt_1384293372109_0014_m_000000_0, Status : FAILED
RemoteTrace:
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-2050079688-192.168.2.101-1383348089401:blk_8897164249780275876_2486 file=/user/gpadmin/.staging/job_1384293372109_0014/job.xml
	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:734)
	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:448)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:645)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:689)
	at java.io.DataInputStream.read(DataInputStream.java:83)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:325)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:297)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:248)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2034)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2003)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1979)
	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:190)
	at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:51)
	at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:284)
	at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:282)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:280)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:51)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
 at LocalTrace:

 

NameNode log

2013-11-12 16:29:20,830 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:gpadmin (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /user/gpadmin/.staging/job_1384293372109_0014/job.xml
2013-11-12 16:29:20,830 INFO org.apache.hadoop.ipc.Server: IPC Server handler 91 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.2.102:52348: error: java.io.FileNotFoundException: File does not exist: /user/gpadmin/.staging/job_1384293372109_0014/job.xml
java.io.FileNotFoundException: File does not exist: /user/gpadmin/.staging/job_1384293372109_0014/job.xml
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:39)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1317)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1276)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1252)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1225)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:403)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:245)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:43433)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

 

DataNode log

2013-11-12 16:29:20,836 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:gpadmin (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Can't continue with getBlockLocalPathInfo() authorization. The user gpadmin is not allowed to call getBlockLocalPathInfo
2013-11-12 16:29:20,836 INFO org.apache.hadoop.ipc.Server: IPC Server handler 17 on 50020, call org.apache.hadoop.hdfs.protocol.ClientDatanodeProtocol.getBlockLocalPathInfo from 192.168.2.102:45947: error: org.apache.hadoop.security.AccessControlException: Can't continue with getBlockLocalPathInfo() authorization. The user gpadmin is not allowed to call getBlockLocalPathInfo
org.apache.hadoop.security.AccessControlException: Can't continue with getBlockLocalPathInfo() authorization. The user gpadmin is not allowed to call getBlockLocalPathInfo
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkBlockLocalPathAccess(DataNode.java:1011)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockLocalPathInfo(DataNode.java:1021)
        at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolServerSideTranslatorPB.getBlockLocalPathInfo(ClientDatanodeProtocolServerSideTranslatorPB.java:112)
        at org.apache.hadoop.hdfs.protocol.proto.ClientDatanodeProtocolProtos$ClientDatanodeProtocolService$2.callBlockingMethod(ClientDatanodeProtocolProtos.java:5104)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)


Cause

In PHD, Short Circuit Read is enabled by default, and the users below are included in the default configuration. If a MapReduce or HBase job is run as a user who is not a member of dfs.block.local-path-access.user, the job may fail.

<property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
</property>
 
<property>
    <name>dfs.block.local-path-access.user</name>
    <value>gpadmin,hdfs,mapred,yarn,hbase,hive</value>
</property>
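
The values actually in effect on a node can be confirmed with the hdfs getconf utility; a minimal check, assuming the hdfs client is on the PATH of the node in question:

hdfs getconf -confKey dfs.client.read.shortcircuit
hdfs getconf -confKey dfs.block.local-path-access.user

If the user that submitted the failing job does not appear in the second list, the symptom above applies.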


Resolution

Perform one of the following three procedures (sketches of each follow this list):

  • Add the affected user to dfs.block.local-path-access.user.
  • Disable Short Circuit Read cluster-wide (not recommended if HAWQ is enabled on the cluster).
  • Disable Short Circuit Read on a per-job basis by specifying the argument "-D dfs.client.read.shortcircuit=false".
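
For the first option, dfs.block.local-path-access.user would be extended in hdfs-site.xml on every node and the HDFS services restarted. A sketch, where "appuser" is a hypothetical user name standing in for the actual job submitter:

<property>
    <name>dfs.block.local-path-access.user</name>
    <value>gpadmin,hdfs,mapred,yarn,hbase,hive,appuser</value>
</property>

For the second option, the same file would set the property below to false cluster-wide (again, not recommended when HAWQ is in use, since HAWQ relies on short-circuit reads for performance):

<property>
    <name>dfs.client.read.shortcircuit</name>
    <value>false</value>
</property>

For the third option, the property is passed on the command line per job; a sketch against the stock WordCount example, where the jar name may vary by distribution and the input and output paths are placeholders:

hadoop jar hadoop-mapreduce-examples.jar wordcount \
    -D dfs.client.read.shortcircuit=false \
    /user/gpadmin/input /user/gpadmin/output

Note that "-D" is honored only by drivers that parse generic options via ToolRunner/GenericOptionsParser; a job that constructs its own Configuration would need the property set programmatically instead.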
