Pivotal Knowledge Base

Pivotal HDB initialization fails with: [FATAL]: create dfs filespace failed; ERROR: could not create filespace directory hdfs://.... Input/output error

Environment

  • PHD 2.x
  • PHD 1.x

Problem

Pivotal HDB initialization fails, and gpinitsystem reports the following errors during initialization:

[gpadmin@hawq-mdw utils]$ gpinitsystem  -c gpinitsystem_config -h hostfile
...
...
20131010:16:44:49:021342 gpinitsystem:hawq-mdw:gpadmin-[INFO]:-Create filespace dfs_system
20131010:16:44:57:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:
20131010:16:44:57:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:-Failed to create dfs filespace; review gpinitsystem output to
20131010:16:44:57:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:- determine why this step failed and reinitialize cluster after resolving
20131010:16:44:57:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:- issues.  Not all initialization tasks have completed so the cluster
20131010:16:44:57:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:- should not be used.
20131010:16:44:57:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:-gpinitsystem will now try to stop the cluster
20131010:16:44:57:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:
20131010:16:44:58:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Starting gpstop with args: -a -i -d /data/master/gpseg-1
..
..
20131010:16:45:00:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Greenplum Version: 'postgres (HAWQ) 4.2.0 build 1'
20131010:16:45:00:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-There are 0 connections to the database
20131010:16:45:00:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='immediate'
20131010:16:45:00:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Master host=hawq-mdw
20131010:16:45:00:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=immediate
20131010:16:45:00:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Master segment instance directory=/data/master/gpseg-1
20131010:16:45:01:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-No standby master host configured
20131010:16:45:01:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Commencing parallel segment instance shutdown, please wait...
...
20131010:16:45:04:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-----------------------------------------------------
20131010:16:45:04:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-   Segments stopped successfully      = 2
20131010:16:45:04:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-   Segments with errors during stop   = 0
20131010:16:45:04:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-----------------------------------------------------
20131010:16:45:04:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Successfully shutdown 2 of 2 segment instances
20131010:16:45:04:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Database successfully shutdown with no errors reported
20131010:16:45:04:021342 gpinitsystem:hawq-mdw:gpadmin-[INFO]:-Successfully shutdown the Greenplum instance
20131010:16:45:04:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:
20131010:16:45:04:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:-Failed to create dfs filespace; review gpinitsystem output to
20131010:16:45:04:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:- determine why this step failed and reinitialize cluster after resolving
20131010:16:45:04:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:- issues.  Not all initialization tasks have completed so the cluster
20131010:16:45:04:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:- should not be used.
20131010:16:45:04:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:
20131010:16:45:04:gpinitsystem:hawq-mdw:gpadmin-[FATAL]: create dfs filespace failed; Script Exiting!  

The gpinitsystem log in /home/gpadmin/gpAdminLogs/ shows the following:

20131010:16:44:49:021342 gpinitsystem:hawq-mdw:gpadmin-[INFO]:-DFS_PATH_LIST: 1:'/data/master/dfs/gpseg-1',2:'hawq-mdw:9000/hawq/gpseg0',3:'hawq-mdw:9000/hawq/gpseg1'
20131010:16:44:49:021342 gpinitsystem:hawq-mdw:gpadmin-[INFO]:-Create filespace dfs_system
WARNING:  function 1 returned error: -1
WARNING:  fail to connect hdfs at hawq-mdw:9000, errno = 5
WARNING:  function 1 returned error: -1
WARNING:  fail to connect hdfs at hawq-mdw:9000, errno = 5
WARNING:  function 1 returned error: -1
WARNING:  fail to connect hdfs at hawq-mdw:9000, errno = 5
WARNING:  function 1 returned error: -1
CONTEXT:  Dropping file-system object -- Filespace Directory: '16384'
WARNING:  fail to connect hdfs at hawq-mdw:9000, errno = 5
CONTEXT:  Dropping file-system object -- Filespace Directory: '16384'
WARNING:  could not remove filespace directory 16384: Input/output error
CONTEXT:  Dropping file-system object -- Filespace Directory: '16384'
ERROR:  could not create filespace directory hdfs://hawq-mdw:9000/hawq/gpseg0: Input/output error

Cause

During initialization, HDB was unable to create the directory structure in HDFS using the URI hdfs://hawq-mdw:9000/. In other words, initialization failed while accessing the HDFS filesystem at the given URI.

In this case, the port number 9000 configured for the DFS_URL parameter in /etc/gphd/hawq/conf/gpinitsystem_config is incorrect. Listing that URI directly fails with a connection-refused error:

[gpadmin@hawq-mdw hadoop]$ hadoop fs -ls hdfs://hawq-mdw:9000/
ls: Call From hawq-mdw.saturn.local/192.165.100.31 to hawq-mdw:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
[gpadmin@hawq-mdw hadoop]$

DFS_URL parameter in the gpinitsystem_config file:

[gpadmin@hawq-mdw utils]$ egrep DFS_URL gpinitsystem_config
DFS_URL=hawq-mdw:9000/hawq 

Solution

Identify the correct host and port from the cluster's /etc/gphd/hadoop/conf/core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hawq-mdw:8020</value>
</property>
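
The lookup above can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name default_fs_from_core_site is hypothetical, and it assumes the fs.defaultFS name and value elements appear in the simple layout shown above. On Hadoop releases that ship it, running `hdfs getconf -confKey fs.defaultFS` should report the same value.

```shell
#!/bin/sh
# Hypothetical helper: print the fs.defaultFS value found in a core-site.xml file.
# Assumption: the <value> element follows the <name> element, either on the
# same line or on a later line, as in the snippet above.
default_fs_from_core_site() {
    # $1: path to core-site.xml
    awk '
        /<name>fs\.defaultFS<\/name>/ { found = 1 }
        found && /<value>/ {
            sub(/.*<value>/, "")      # drop everything up to the opening tag
            sub(/<\/value>.*/, "")    # drop the closing tag and anything after it
            print
            exit
        }
    ' "$1"
}

# Example call (path taken from this article):
#   default_fs_from_core_site /etc/gphd/hadoop/conf/core-site.xml
```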

Verify that the URI from core-site.xml is reachable:

[gpadmin@hawq-mdw conf]$ hadoop fs -ls hdfs://hawq-mdw:8020/
Found 3 items
drwxr--r--   - hdfs   supergroup          0 2013-10-10 17:43 hdfs://hawq-mdw:8020/hawq
drwxr-xr-x   - mapred hadoop              0 2013-10-10 17:31 hdfs://hawq-mdw:8020/mapred
drwxr-xr-x   - hdfs   supergroup          0 2013-10-10 16:24 hdfs://hawq-mdw:8020/user

Change the value in gpinitsystem_config to the following and run gpinitsystem again:

DFS_URL=hawq-mdw:8020/hawq
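
If you prefer to make the change from the command line rather than in an editor, something like the sketch below may help. The function name set_dfs_url is hypothetical, and the sketch assumes DFS_URL appears at the start of a line in gpinitsystem_config, as in the egrep output earlier. It keeps a .bak copy of the original file.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the DFS_URL line of a gpinitsystem_config file.
# Assumption: the parameter is written as DFS_URL=<host>:<port>/<path> at the
# start of a line, and the new value contains no "|" or "&" characters.
set_dfs_url() {
    config=$1     # path to gpinitsystem_config
    new_url=$2    # for example hawq-mdw:8020/hawq

    # Keep the original so the change can be reverted.
    cp "$config" "$config.bak" || return 1

    # "|" is used as the sed delimiter because the URL itself contains "/".
    sed "s|^DFS_URL=.*|DFS_URL=$new_url|" "$config.bak" > "$config"
}

# Example call:
#   set_dfs_url gpinitsystem_config hawq-mdw:8020/hawq
```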

 

NOTE:

  • The master and segment directories must be deleted before running gpinitsystem again
  • Port 8020 is not mandatory for DFS_URL. Always use the port configured in core-site.xml
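
To catch this kind of mismatch before running gpinitsystem, the two settings can be compared directly. This is a rough sketch under stated assumptions: dfs_url_matches is a hypothetical name, DFS_URL is assumed to have the form host:port/path, and fs.defaultFS is assumed to use the hdfs:// scheme as in this article.

```shell
#!/bin/sh
# Hypothetical check: succeed only if DFS_URL and fs.defaultFS name the same
# host:port pair.
#   $1: DFS_URL value,      for example hawq-mdw:9000/hawq
#   $2: fs.defaultFS value, for example hdfs://hawq-mdw:8020
dfs_url_matches() {
    dfs_hostport=${1%%/*}            # strip the path:   hawq-mdw:9000
    fs_hostport=${2#hdfs://}         # strip the scheme: hawq-mdw:8020
    fs_hostport=${fs_hostport%%/*}   # strip any trailing path
    [ "$dfs_hostport" = "$fs_hostport" ]
}

# Example call with the values from this article:
#   dfs_url_matches "hawq-mdw:9000/hawq" "hdfs://hawq-mdw:8020" \
#       || echo "DFS_URL does not match fs.defaultFS"
```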
