Pivotal Knowledge Base

Hive query with TEZ engine failed with exception "java.lang.OutOfMemoryError: Java heap space"

Environment

Product            Version
Pivotal HD (PHD)   3.x
OS                 RHEL 6.x

Symptom

A Hive query run with the Tez execution engine fails with the exception "java.lang.OutOfMemoryError: Java heap space".

Error Message:

2016-08-05 08:36:24,835 ERROR [main]: SessionState (SessionState.java:printError(833)) - Status: Failed
2016-08-05 08:36:24,836 ERROR [main]: SessionState (SessionState.java:printError(833)) - Vertex failed, vertexName=Map 1, vertexId=vertex_1470379789916_0174_1_01, diagnostics=[Task failed, taskId=task_1470379789916_0174_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.<init>(BytesBytesMultiHashMap.java:162)
at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.<init>(MapJoinBytesTableContainer.java:91)
at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.<init>(MapJoinBytesTableContainer.java:82)
at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:108)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:190)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:216)
at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
... 13 more
 

Cause 

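Large queries can need more Java heap than the Tez task is configured with. In the stack trace above, the error is raised while a map join loads its hash table into memory (HashTableLoader.load, BytesBytesMultiHashMap), which indicates that the task heap is too small for that table. Increasing the Tez Java heap size may therefore be necessary.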

Resolution

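As a rule of thumb, the Tez Java heap size (the -Xmx value in hive.tez.java.opts) is set to about 80% of the Tez container size (hive.tez.container.size); the remainder is left for non-heap JVM memory. When you raise the heap size in hive.tez.java.opts, raise hive.tez.container.size to match.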

For example (hive.tez.container.size is an integer number of megabytes, without a unit suffix):

SET hive.tez.container.size=10240;
SET hive.tez.java.opts=-Xmx8192m;
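
The 80% rule of thumb can be sketched as a small calculation. This is only an illustration: the helper names tez_java_opts and tez_settings are hypothetical and not part of Hive or Tez, and the 80% share is the guideline from this article, not a value enforced by the engine.

```python
def tez_java_opts(container_mb, heap_percent=80):
    """Return a -Xmx option sized as a percentage of the Tez container.

    heap_percent defaults to 80, the rule of thumb used in this article.
    Integer arithmetic is used so the result is a whole number of MB.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 < heap_percent < 100:
        raise ValueError("heap_percent must be between 1 and 99")
    heap_mb = container_mb * heap_percent // 100
    return "-Xmx{}m".format(heap_mb)


def tez_settings(container_mb):
    """Build the two Hive SET statements for a given container size in MB."""
    return "\n".join([
        "SET hive.tez.container.size={};".format(container_mb),
        "SET hive.tez.java.opts={};".format(tez_java_opts(container_mb)),
    ])


if __name__ == "__main__":
    # A 10240 MB container gives an 8192 MB heap.
    print(tez_settings(10240))
```

Running the script prints the pair of SET statements for a 10240 MB container; pass a different container size to see the matching heap value.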

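Note: The Tez container size should be a multiple of the YARN container size (the minimum allocation, yarn.scheduler.minimum-allocation-mb) and must not exceed yarn.scheduler.maximum-allocation-mb. For example, if the YARN container size is 2 GB, a Tez container size of 4 GB is a valid choice.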
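
To pick a container size that satisfies the note above, a requested size can be rounded up to the next multiple of the YARN container size. The sketch below assumes both values are given in megabytes; round_up_container_mb is a hypothetical helper name, not a Hive or YARN API.

```python
def round_up_container_mb(requested_mb, yarn_container_mb):
    """Round requested_mb up to the nearest multiple of yarn_container_mb."""
    if requested_mb <= 0 or yarn_container_mb <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division without floating point.
    multiples = -(-requested_mb // yarn_container_mb)
    return multiples * yarn_container_mb


if __name__ == "__main__":
    # With a 2048 MB YARN container, a 5000 MB request becomes 6144 MB.
    print(round_up_container_mb(5000, 2048))
```

A request that is already an exact multiple, such as 4096 MB with a 2048 MB YARN container, is returned unchanged.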
