Pivotal Knowledge Base

ClassNotFound Exception when Loading PARQUET External Table with GPHDFS

Environment

 Product             Version
 Pivotal Greenplum   4.3.x
 Hadoop              2.6, 2.7
 GPHDFS

Symptom

When trying to insert rows into an external writable table in PARQUET format, the following error may be seen:

gpadmin=# CREATE WRITABLE EXTERNAL TABLE test_hdfs_parquet (id int)
LOCATION ('gphdfs://hawq20/gphdfs2/parquet_table')
FORMAT 'PARQUET';
CREATE EXTERNAL TABLE
gpadmin=# insert into test_hdfs_parquet values (1);
ERROR: external table gphdfs protocol command ended with error. 16/10/10 20:45:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable (seg1 gpdb-sandbox.localdomain:40001 pid=5206)
DETAIL:
16/10/10 20:45:31 INFO Configuration.deprecation: mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
16/10/10 20:45:31 INFO Configuration.deprecation: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
Exception in thread "main" java.lang.NoC
Command: 'gphdfs://hawq20/gphdfs2/parquet_table'
gpadmin=#
 

Cause

This error occurs because the PARQUET JAR files are either missing from the master and segment hosts of the Greenplum cluster or are not being loaded correctly. These JAR files should have been installed as part of the PARQUET configuration.

Resolution 

1. Confirm the location of "gp_hadoop_home" for the cluster:

[gpadmin@gpdb-sandbox ~]$ psql -c 'show gp_hadoop_home;'
    gp_hadoop_home
----------------------
 /usr/hdp/2.4.2.0-258
(1 row)

[gpadmin@gpdb-sandbox ~]$ 
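The value returned above is the base path under which the Hadoop library directory sits (hadoop/lib in this example layout). As a minimal sketch, the helper below builds that directory path from a gp_hadoop_home value. The function name hadoop_lib_dir is hypothetical, and the hadoop/lib suffix is an assumption taken from the layout used in this article; other Hadoop distributions may differ.

```shell
# Hypothetical helper: build the Hadoop lib directory path from a
# gp_hadoop_home value. The "/hadoop/lib" suffix is an assumption
# based on the HDP layout shown in this article.
hadoop_lib_dir() {
    # Trim leading and trailing whitespace from the value
    gphome="$(printf '%s' "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
    # Drop a single trailing slash, if present
    gphome="${gphome%/}"
    if [ -z "$gphome" ]; then
        echo "gp_hadoop_home is empty" >&2
        return 1
    fi
    printf '%s/hadoop/lib\n' "$gphome"
}

# Possible usage on a host where psql can reach the cluster
# (-A and -t print the bare value without headers or alignment):
#   hadoop_lib_dir "$(psql -At -c 'show gp_hadoop_home;')"
```

If the function reports an empty value, gp_hadoop_home is probably not set for the cluster and should be configured before continuing.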

2. Download the following PARQUET JAR files as per this documentation:

parquet-hadoop-1.7.0.jar
parquet-common-1.7.0.jar
parquet-encoding-1.7.0.jar
parquet-column-1.7.0.jar
parquet-generator-1.7.0.jar
parquet-format-2.3.0-incubating.jar 

3. Place the files in $gp_hadoop_home/hadoop/lib/ on the master and on every segment host. In the example above, this is /usr/hdp/2.4.2.0-258/hadoop/lib/:

[root@gpdb-sandbox lib]# ls -ltr /usr/hdp/2.4.2.0-258/hadoop/lib/parquet*
-rw-r--r-- 1 root root  21243 Oct 10 22:25 /usr/hdp/2.4.2.0-258/hadoop/lib/parquet-generator-1.7.0.jar
-rw-r--r-- 1 root root 387188 Oct 10 22:25 /usr/hdp/2.4.2.0-258/hadoop/lib/parquet-format-2.3.0-incubating.jar
-rw-r--r-- 1 root root 285447 Oct 10 22:25 /usr/hdp/2.4.2.0-258/hadoop/lib/parquet-encoding-1.7.0.jar
-rw-r--r-- 1 root root  21575 Oct 10 22:25 /usr/hdp/2.4.2.0-258/hadoop/lib/parquet-common-1.7.0.jar
-rw-r--r-- 1 root root 917052 Oct 10 22:25 /usr/hdp/2.4.2.0-258/hadoop/lib/parquet-column-1.7.0.jar
-rw-r--r-- 1 root root 209622 Oct 10 22:25 /usr/hdp/2.4.2.0-258/hadoop/lib/parquet-hadoop-1.7.0.jar
[root@gpdb-sandbox lib]#
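Rather than comparing the listing by eye, the check can be scripted. The sketch below prints the name of each expected JAR that is not present in a given directory. The function name missing_parquet_jars is hypothetical, and the file list is copied from step 2, so it assumes those exact versions.

```shell
# Hypothetical helper: print every expected PARQUET JAR that is
# missing from the directory passed as the first argument.
# The file names assume the versions listed in step 2.
missing_parquet_jars() {
    lib_dir="$1"
    for jar in \
        parquet-hadoop-1.7.0.jar \
        parquet-common-1.7.0.jar \
        parquet-encoding-1.7.0.jar \
        parquet-column-1.7.0.jar \
        parquet-generator-1.7.0.jar \
        parquet-format-2.3.0-incubating.jar
    do
        # Report the JAR only when no regular file with that name exists
        [ -f "$lib_dir/$jar" ] || echo "$jar"
    done
}

# Possible usage; no output means all six files are present:
#   missing_parquet_jars /usr/hdp/2.4.2.0-258/hadoop/lib
```

Because the JAR files must be available on the master and the segments, the check is worth repeating on each host.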

4. Run the insert again to confirm that the issue is resolved.

 
