One time HDFS Protocol Installation for GPHDFS Access to the HDP 2.x Cluster

Environment

  • Pivotal Greenplum 4.3.x
  • Operating System: Red Hat Enterprise Linux 6.x
  • Hadoop (HDP) 2.x

Purpose

This article describes the standard settings for GPHDFS access to an HDP 2.x Hadoop cluster.

Cause 

Incorrect gp_hadoop_home and Hadoop environment settings cause an error when executing a query against a GPHDFS external table, for example:

gpadmin=# select count(*) from tmp_parq;
ERROR:  external table gphdfs protocol command ended with error. /usr/local/greenplum-db/./lib//hadoop/hadoop_env.sh: line 125: /bin/java: No such file or directory  (seg0 slice1 admin.hadoop.local:50000 pid=2818)
DETAIL:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/TaskAttemptContext
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2570)
at java.lang.Class.getMethod0(Class.java:2813)
at java.lang.Class.getMethod(Class.java:1663)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launch
Command: 'gphdfs://hdm1/tmp/sample/*.parquet' External table tmp_parq
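
To see which values are currently in effect before making any changes, check the settings from the master with gpconfig. The "/bin/java: No such file or directory" message above typically means JAVA_HOME is empty on the segment hosts, so $JAVA_HOME/bin/java resolves to /bin/java.

$ gpconfig -s gp_hadoop_home
$ gpconfig -s gp_hadoop_target_version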

Procedure

This procedure assumes the HDP 2.x client packages have already been installed on all hosts using the standard HDP installation. A verification sketch follows the two steps below.

1. On each segment node, add the following two entries to /home/gpadmin/.bashrc:

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre
export HADOOP_HOME=/usr/hdp/current/hadoop-client/client

2. From the Greenplum master node:

$ gpconfig -c gp_hadoop_home -v "'/usr/hdp/current/hadoop-client/client'"
$ gpconfig -c gp_hadoop_target_version -v "'hdp2'"
$ gpstop -u
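
After both steps, the following minimal sketch can be used to verify the changes. It assumes a file /home/gpadmin/hostfile_segments listing one segment host per line (a hypothetical path; substitute your own host list), and it reuses the tmp_parq external table and the gpadmin database shown in the error above.

# Confirm the new environment on every segment host
$ gpssh -f /home/gpadmin/hostfile_segments 'source ~/.bashrc; echo $JAVA_HOME; $JAVA_HOME/bin/java -version'

# Confirm the reloaded server settings on the master and all segments
$ gpconfig -s gp_hadoop_home
$ gpconfig -s gp_hadoop_target_version

# Re-run the failing query against the GPHDFS external table
$ psql -d gpadmin -c "SELECT count(*) FROM tmp_parq;"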

Additional Information 

For general information on the HDFS protocol installation, please review the GPDB documentation.

For installation with a PHD 3.x cluster, the following settings can be used:

$ gpconfig -c gp_hadoop_home -v "'/usr/phd/current/hadoop-client/client'"
export HADOOP_HOME=/usr/phd/current/hadoop-client/client
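
As a sketch, the full PHD 3.x sequence mirrors the HDP 2.x procedure above: add the HADOOP_HOME entry to /home/gpadmin/.bashrc on each segment node, set gp_hadoop_home on the master, and reload the configuration. The appropriate gp_hadoop_target_version value for the PHD release should be taken from the GPDB documentation.

# On each segment node (in /home/gpadmin/.bashrc):
export HADOOP_HOME=/usr/phd/current/hadoop-client/client

# On the Greenplum master:
$ gpconfig -c gp_hadoop_home -v "'/usr/phd/current/hadoop-client/client'"
$ gpstop -u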

 
