Pivotal Knowledge Base

GPHDFS NoClassDefFoundError when reading Avro and Parquet tables

Environment

Product Version
GPDB 4.3.6.1
Hadoop HDP 2.x / PHD 3.x

Symptom

Avro Error Message:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/mapred/FsInput
at com.emc.greenplum.gpdb.hadoop.formathandler.AvroFileReader.readAvroSchema(AvroFileReader.java:517)
Command: 'gphdfs://hadoopcluster:8020/project/avro_table/000000_0'
External table avro_table_ext

Parquet Error Message:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/io/api/RecordMaterializer
Command: 'gphdfs://hadoopcluster:8020/project/parquet_avro_table/000000_0'
External table parquet_table_ext

Cause 

By default, Hadoop does not ship with the jar files required for the Avro and Parquet data formats. These dependencies must be downloaded and installed manually so that GPDB can find the Java classes and read the data from HDFS.
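A quick way to confirm which jar, if any, provides a missing class is to search the jar files for the class entry name. The sketch below is only an illustration: `find_class_jar` is a hypothetical helper, not part of GPDB or Hadoop, and it relies on the fact that zip archives store entry names as plain text. A substring match like this can produce false positives, so treat it as a first check rather than proof.

```shell
# Hypothetical helper: list the jars in a directory that contain a given class.
# Usage: find_class_jar <jar-directory> <class/path/Name>
find_class_jar() {
    jar_dir=$1
    class_entry="$2.class"
    # Zip archives store entry names as plain text, so a fixed-string grep
    # over each jar file is enough to spot the class entry.
    for jar in "$jar_dir"/*.jar; do
        [ -f "$jar" ] || continue   # skip the unexpanded glob when no jars exist
        if grep -q -F -- "$class_entry" "$jar"; then
            echo "$jar"
        fi
    done
}
```

For example, `find_class_jar /usr/hdp/current/custom_jars org/apache/avro/mapred/FsInput` prints the jars that contain the class named in the Avro error above, and prints nothing if no jar in that directory provides it.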

Resolution

Follow these steps to resolve the issue:

NOTE: GPDB is compiled and tested with Parquet jar 1.7.0. We recommend using 1.7.0 with GPDB, as there may be compatibility issues with 1.8.0.

  1. Acquire the dependent jar files from Maven
  2. Create a directory called "custom_jars" on all segment servers and copy the Parquet and Avro jar files into it. The directory path must match the one that hadoop_env.sh checks in step 3
    • mkdir /usr/phd/current/custom_jars
    • cp *.jar /usr/phd/current/custom_jars
  3. Edit $GPHOME/lib/hadoop/hadoop_env.sh to include the following changes
    • A working example is attached here
    • ###### CHANGES FOR PHD/HDP support and AVRO/PARQUET SUPPORT ######
      if [ -d "/usr/hdp/current/custom_jars" ]
      then
              for f in /usr/hdp/current/custom_jars/*.jar
              do
                      CLASSPATH=${CLASSPATH}:$f
              done
      fi
      ###### CHANGES FOR PHD/HDP support and AVRO/PARQUET SUPPORT ######
  4. Distribute the updated hadoop_env.sh to /usr/local/greenplum-db/lib/hadoop on all nodes
  5. Re-run the queries and confirm that there are no further errors. If you find further classpath issues, please refer to the following KB article
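Before re-running the queries, it can help to check what the hadoop_env.sh fragment will actually add to the classpath, and to script the distribution step. The following is a minimal sketch under stated assumptions: `custom_classpath` and `distribute_hadoop_env` are hypothetical helper names, the hostfile argument is assumed to list one hostname per line, and `$GPHOME` is assumed to be the same path on every host. The `gpscp` call uses that utility's `-f hostfile` option and `=:` remote-path syntax.

```shell
# Hypothetical helper: print the classpath entries that the hadoop_env.sh
# fragment would append for a given directory. It mirrors that loop but
# has no side effects on CLASSPATH.
custom_classpath() {
    jar_dir=$1
    result=""
    if [ -d "$jar_dir" ]; then
        for f in "$jar_dir"/*.jar; do
            [ -f "$f" ] || continue   # skip the unexpanded glob when no jars exist
            result="${result}:$f"
        done
    fi
    echo "$result"
}

# Assumed distribution step: copy hadoop_env.sh to the same location on
# every host listed in the hostfile, using the Greenplum gpscp utility.
distribute_hadoop_env() {
    hostfile=$1
    gpscp -f "$hostfile" "$GPHOME/lib/hadoop/hadoop_env.sh" "=:$GPHOME/lib/hadoop/"
}
```

Running `custom_classpath /usr/hdp/current/custom_jars` on a segment host should list the Avro and Parquet jars. Empty output means the directory is missing or contains no jar files, which would leave the NoClassDefFoundError in place.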

Internal Comments

Please see MPP-25970 for internal reference
