Pivotal Knowledge Base

GPHDFS returns NoClassDefFoundError for TaskAttemptContext

Environment

  • GPDB 4.3
  • PHD 3.0
  • HDB 2.x

Symptom

Reading data from HDFS through an external table that uses the GPHDFS protocol fails with a NoClassDefFoundError for TaskAttemptContext:

gpadmin=# select * from foo;
ERROR:  external table gphdfs protocol command ended with error. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/TaskAttemptContext  (seg1 slice1 etl2.gphd.local:1026 pid=239593)
DETAIL:

	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2570)
	at java.lang.Class.getMethod0(Class.java:2813)
	at java.lang.Class.getMethod(Class.java:1663)
	at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundExcept
Command: 'gphdfs://nest/danl/data'
External table foo, file gphdfs://nest/danl/data

Cause

The user set gp_hadoop_home to the soft link "/usr/phd/current":

gpadmin=# show gp_hadoop_target_version;
 gp_hadoop_target_version
--------------------------
 gphd-2.0
(1 row)

gpadmin=# show gp_hadoop_home;
   gp_hadoop_home
--------------------
 /usr/phd/current
(1 row)

Searching the jar files for TaskAttemptContext.class shows that the class is in the jar file hadoop-mapreduce-client-core-2.6.0.3.0.1.0-1.jar:

find /usr/phd/3.0.1.0-1 -type f -name '*.jar' | while read -r line; do echo "$line"; jar tvf "$line" | egrep TaskAttemptContext; done
./hadoop-mapreduce/hadoop-mapreduce-client-core-2.6.0.3.0.1.0-1.jar
  3358 Sat Jun 20 01:31:50 EDT 2015 org/apache/hadoop/mapred/TaskAttemptContextImpl.class
   862 Sat Jun 20 01:31:50 EDT 2015 org/apache/hadoop/mapred/TaskAttemptContext.class
  3298 Sat Jun 20 01:31:50 EDT 2015 org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.class
  1370 Sat Jun 20 01:31:50 EDT 2015 org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl$DummyReporter.class
  1142 Sat Jun 20 01:31:50 EDT 2015 org/apache/hadoop/mapreduce/TaskAttemptContext.class

This jar file is reachable through the following paths:

[gpadmin@etl1 current]$ ls /usr/phd/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core.jar
/usr/phd/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core.jar
[gpadmin@etl1 current]$ ls /usr/phd/3.0.1.0-1/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
/usr/phd/3.0.1.0-1/hadoop-mapreduce/hadoop-mapreduce-client-core.jar

GPDB uses /usr/local/greenplum-db/lib/hadoop/hadoop-env.sh to source the jar files required for GPHDFS. In this case GPDB looks in "$GP_HADOOP_HOME/hadoop-mapreduce/" (/usr/phd/current/hadoop-mapreduce). Because the soft link "/usr/phd/current/hadoop-mapreduce" does not exist, GPDB cannot find the mapreduce client core jar file and returns the java.lang.NoClassDefFoundError error.
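
The lookup described above can be checked by hand before changing any setting. The following is a minimal sketch, not part of GPDB: check_mapreduce_jar is a hypothetical helper name, and it assumes only what this article states, namely that the mapreduce client core jar is expected in a hadoop-mapreduce/ subdirectory of the configured Hadoop home.

```shell
#!/usr/bin/env bash
# Sketch only: check_mapreduce_jar is a hypothetical helper, not a GPDB tool.
# It reports whether a candidate gp_hadoop_home exposes the mapreduce client
# core jar under <hadoop_home>/hadoop-mapreduce/.
check_mapreduce_jar() {
  local hadoop_home="$1"
  local dir="$hadoop_home/hadoop-mapreduce"
  local jar

  # -d follows soft links, so a dangling or absent link fails here.
  if [ ! -d "$dir" ]; then
    echo "MISSING: $dir does not exist"
    return 1
  fi

  for jar in "$dir"/hadoop-mapreduce-client-core*.jar; do
    # -e is false when the glob matched nothing and was left as a literal.
    if [ -e "$jar" ]; then
      echo "OK: $jar"
      return 0
    fi
  done

  echo "MISSING: no hadoop-mapreduce-client-core jar in $dir"
  return 1
}
```

On the system described here, calling check_mapreduce_jar /usr/phd/current would be expected to report the directory as missing, while check_mapreduce_jar /usr/phd/3.0.1.0-1 should find the jar. Run the check on every segment host, since the error above was raised by a segment.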

Fix

Change the GUC gp_hadoop_home to point at the versioned directory "/usr/phd/3.0.1.0-1".

Set at session level

gpadmin=# set gp_hadoop_home = '/usr/phd/3.0.1.0-1';
SET

Set globally

[gpadmin@etl1 ~]$ gpconfig -c gp_hadoop_home -v "'/usr/phd/3.0.1.0-1'"

A value set with gpconfig is written to the configuration files. Reload the configuration afterwards (for example with gpstop -u) so that new sessions pick it up.
