Pivotal Knowledge Base


Unable to increase hive child process max heap when attempting hash join

Environment

  • PHD 3.0
  • HIVE 0.14.0

Symptom

A user attempts to execute a Hive query containing a large hash (map) join, and the query fails with the following error:

org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2015-08-24 09:22:56        Processing rows:       1300000 Hashtable size: 1299999 Memory usage:  1844338856        percentage:    0.966
        at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:91)
        at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:251)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
        at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:404)
        at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:375)
        at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:341)
        at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:744)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

A web search for this error suggests increasing hive.mapred.local.mem. However, the container logs show that the maximum heap size of the local JVM process launched by the map task is still only about 2 GB, regardless of the value set for hive.mapred.local.mem:

2015-08-24 07:01:50     Starting to launch local task to process map join;      maximum memory = 1908932608
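For context, hive.mapred.local.mem takes a value in megabytes and defaults to 0 (unset). A typical attempt looks like the following; the value 4096 and the file name big_mapjoin_query.sql are illustrative only:

```
# Attempted fix: raise the local-task heap for the hash join.
# As the log above shows, the setting does not take effect on this setup.
hive --hiveconf hive.mapred.local.mem=4096 -f big_mapjoin_query.sql
```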

Cause

When executing a hash join, the Hive map task launches a new JVM via the "hadoop jar" command to run the "ExecDriver" main class:

2015-08-25 21:31:54,416 INFO  [main] mr.MapredLocalTask (MapredLocalTask.java:executeInChildVM(286)) - Executing: /usr/bin/hadoop jar /u/applic/data/hdfs1/yarn/nm-local-dir/filecache/119/hive-exec-0.14.0.3.0.0.0-249.jar org.apache.hadoop.hive.ql.exec.mr.ExecDriver -localtask -plan file:/u/applic/data/hdfs9/yarn/nm-local-dir/usercache/mucha1/appcache/application_1440185851071_29920/container_1440185851071_29920_01_000002/tmp/mucha1/67d4be51-0b6f-486c-bf83-38fe8ecd4356/hive_2015-08-25_21-31-50_516_3577978971253935151-1/-local-10005/plan.xml   -jobconffile file:/u/applic/data/hdfs9/yarn/nm-local-dir/usercache/mucha1/appcache/application_1440185851071_29920/container_1440185851071_29920_01_000002/tmp/mucha1/67d4be51-0b6f-486c-bf83-38fe8ecd4356/hive_2015-08-25_21-31-50_516_3577978971253935151-1/-local-10006/jobconf.xml

Hive does set the environment variable HADOOP_HEAPSIZE to the value defined in hive.mapred.local.mem before launching the child JVM, but the "/usr/bin/hadoop" command then overrides that HADOOP_HEAPSIZE setting.

The following excerpt from ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java shows that Hive does set HADOOP_HEAPSIZE when hive.mapred.local.mem is set:

    int hadoopMem = conf.getIntVar(HiveConf.ConfVars.HIVEHADOOPMAXMEM);
    if (hadoopMem == 0) {
      // remove env var that would default child jvm to use parent's memory
      // as default. child jvm would use default memory for a hadoop client
      variables.remove(HADOOP_MEM_KEY);
    } else {
      // user specified the memory for local mode hadoop run
      console.printInfo(" set heap size\t" + hadoopMem + "MB");
      variables.put(HADOOP_MEM_KEY, String.valueOf(hadoopMem));
    }
The property itself is defined in HiveConf.java, line 482:

    HIVEHADOOPMAXMEM("hive.mapred.local.mem", 0),

The "/usr/bin/hadoop" command sources /etc/hadoop/conf/hadoop-env.sh, which overwrites any HADOOP_HEAPSIZE value already present in the environment. As a result, the hive.mapred.local.mem parameter has no effect:

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE="2048"
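The clobbering behavior can be reproduced in isolation with a minimal sketch; the file /tmp/fake-hadoop-env.sh below is a stand-in for /etc/hadoop/conf/hadoop-env.sh, used purely for illustration:

```shell
#!/bin/sh
# Stand-in for hadoop-env.sh: an unconditional export, as shipped by default.
cat > /tmp/fake-hadoop-env.sh <<'EOF'
export HADOOP_HEAPSIZE="2048"
EOF

# What Hive does: export the value derived from hive.mapred.local.mem ...
export HADOOP_HEAPSIZE=4096
# ... and what /usr/bin/hadoop then does: source hadoop-env.sh on startup.
. /tmp/fake-hadoop-env.sh
echo "$HADOOP_HEAPSIZE"   # prints 2048: the value Hive exported is gone
```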

Workaround

Increase HADOOP_HEAPSIZE in /etc/hadoop/conf/hadoop-env.sh via the Ambari parameter "Hadoop Maximum Heap Size", then restart the NodeManager services.
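If editing hadoop-env.sh directly is acceptable, another option (an untested sketch, not the documented Ambari route) is to guard the export so that a HADOOP_HEAPSIZE already present in the environment, such as the one Hive derives from hive.mapred.local.mem, survives:

```shell
#!/bin/sh
# Guarded form for hadoop-env.sh: ${VAR:-default} keeps an existing
# HADOOP_HEAPSIZE if set, and only falls back to 2048 MB otherwise.
export HADOOP_HEAPSIZE=4096                        # simulate the value Hive exports
export HADOOP_HEAPSIZE="${HADOOP_HEAPSIZE:-2048}"  # the guarded line itself
echo "$HADOOP_HEAPSIZE"                            # prints 4096: Hive's value survives
```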