Pivotal Knowledge Base

Configure GPHDFS with HDFS High Availability

Environment

  • GPDB
  • GPHDFS

Purpose

This article describes how to configure GPDB to communicate with a Hadoop cluster that has the HDFS High Availability (HA) feature enabled.

Resources

Refer to the following resources for how to use GPHDFS in GPDB to access HDFS tables.

Understanding GPHDFS Configuration settings

How to access data via GPDB external tables with GPHDFS

Using Hadoop Distributed File System (HDFS) Tables

Required parameters

Make sure the following parameters are present in the corresponding configuration files on all hosts (master and segments) of the GPDB cluster.

* The examples below assume the HDFS nameservice ID is phd30c.

core-site.xml

<property>
<name>fs.defaultFS</name>
<value>hdfs://phd30c</value>
</property>

hdfs-site.xml

<property>
<name>dfs.nameservices</name>
<value>phd30c</value>
</property>
<property>
<name>dfs.ha.namenodes.phd30c</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.phd30c.nn1</name>
<value>hdm1.hadoop.local:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.phd30c.nn2</name>
<value>hdm2.hadoop.local:8020</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.phd30c</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
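
Because these properties must be present on every GPDB host, it can help to check each host's copy of hdfs-site.xml programmatically. The following is a minimal sketch and not part of GPDB or Hadoop: the helper names load_properties and missing_ha_keys are hypothetical, and it assumes the file is a standard Hadoop configuration file whose <property> elements sit under a single root element such as <configuration>.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_ha_keys(props, nameservice):
    """Return the HA client keys listed in this article that are absent or empty."""
    required = [
        "dfs.nameservices",
        f"dfs.ha.namenodes.{nameservice}",
        f"dfs.client.failover.proxy.provider.{nameservice}",
    ]
    # One rpc-address entry is expected per NameNode ID listed for the nameservice.
    raw_ids = props.get(f"dfs.ha.namenodes.{nameservice}", "")
    nn_ids = [nn.strip() for nn in raw_ids.split(",") if nn.strip()]
    required += [f"dfs.namenode.rpc-address.{nameservice}.{nn}" for nn in nn_ids]
    return [key for key in required if not props.get(key)]
```

For example, missing_ha_keys(load_properties(open(path).read()), "phd30c") returns an empty list when all of the properties above are set, where path is wherever hdfs-site.xml lives on that host.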

Define external table

Use the nameservice ID, rather than a specific NameNode host and port, in the location when defining a GPHDFS external table, as shown below.

gpadmin=# create external table mytest(id int, name text) location ('gphdfs://phd30c/tmp/test.1') format 'TEXT' (DELIMITER ',');
CREATE EXTERNAL TABLE

gpadmin=# \d+ mytest
External table "public.mytest"
 Column |  Type   | Modifiers | Storage  | Description
--------+---------+-----------+----------+-------------
 id     | integer |           | plain    |
 name   | text    |           | extended |
Type: readable
Encoding: UTF8
Format type: text
Format options: delimiter ',' null '\N' escape '\'
External location: gphdfs://phd30c/tmp/test.1

gpadmin=# select * from mytest;
 id | name
----+-------
 35 | scott
 40 | john
(2 rows)

[gpadmin@admin2 ~]$ hdfs dfs -cat hdfs://phd30c/tmp/test.1
35,scott
40,john
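
A location that names a single NameNode host and port would not follow a failover, so it may be worth checking existing external table definitions for that pattern. The sketch below is illustrative only: uses_nameservice is a hypothetical helper, not a GPDB function, and it assumes the location string has already been read from the table definition.

```python
from urllib.parse import urlparse


def uses_nameservice(location, nameservice):
    """True if a gphdfs:// location addresses the HA nameservice, not a host:port."""
    parsed = urlparse(location)
    # netloc holds everything between "//" and the path, including any ":port",
    # so an exact match means the nameservice ID is used with no port attached.
    return parsed.scheme == "gphdfs" and parsed.netloc == nameservice
```

With the table above, uses_nameservice("gphdfs://phd30c/tmp/test.1", "phd30c") is true, while a location such as gphdfs://hdm1.hadoop.local:8020/tmp/test.1 is reported as not using the nameservice.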
 
