HDFS reports Configured Capacity: 0 (0 B) for datanode

Environment

Product: Pivotal HD (PHD)
Version: All versions

Symptom

When hdfs dfsadmin -report is run, the output shows the configured capacity as 0 B for a datanode.

Error Message:

Name: 192.165.100.56:50010 (phd11-dn-2.saturn.local)
Hostname: phd11-dn-2.saturn.local
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Sun Jan 12 00:51:21 PST 2014

In such cases, users may still see the datanode services running on the server. However, attempting to load any data onto HDFS raises an exception if the dfs.replication.min threshold is not met. For example:

[gpadmin@phd11-nn conf.gphd-2.0.1]$ hdfs dfs -copyFromLocal /etc/passwd /user/

14/01/12 01:14:04 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/passwd._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1339)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2350)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:491)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:357)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:43449)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

Cause

Before we go further, let's understand what this message signifies:

"could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation." 

The above message indicates that no datanodes are available to the namenode: even if the datanode processes are running, they are not connected to the namenode. In addition, the dfs.replication.min parameter is set to 1, and because the data could not be replicated to any datanode, this mandatory threshold could not be met, resulting in a failed write operation.

The output of the -report command reflects the current state of the cluster as seen by the namenode, which likewise shows no capacity on the datanode.

Note: The value of dfs.replication.min is 1 by default.
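For reference, this threshold is set in hdfs-site.xml. Below is a minimal sketch of the property (in Hadoop 2.x the canonical key is dfs.namenode.replication.min, with dfs.replication.min retained as a deprecated alias):

<property>
  <!-- Minimum number of replicas a block write must reach to succeed -->
  <name>dfs.replication.min</name>
  <value>1</value>
</property>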

Resolution

There are multiple situations in which HDFS can get into this state:

- Only the namenode is running, and it is not in safemode.
- The namenode and datanodes are both running, but the datanodes are unable to send their heartbeats and block reports to the namenode.
- A datanode is dead.

Note: These symptoms typically indicate one of the following underlying problems:

- The configuration files are not set up properly, including correct permissions on the directories.
- There is a connectivity issue between the datanodes and the namenode.

To troubleshoot further, you can do the following (example commands are sketched after this list):

- Verify the status of the namenode and datanode services.
- Verify the logs of the namenode and datanode services.
- Verify that core-site.xml has the fs.defaultFS value specified correctly.
- Verify that dfs.namenode.http-address in hdfs-site.xml is specified correctly.
- In a PHD HA configuration, verify that hdfs-site.xml has dfs.namenode.http-address.<nameservice>.<namenodeid> specified correctly for both namenodes.
- Verify that the permissions on the directories are correct.
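For example, the checks above can be run as follows. This is a sketch only; the service names, log location (/var/log/gphd/...), and configuration directory (/etc/gphd/hadoop/conf) are assumptions based on a typical PHD layout, and the hostname is taken from the example above:

# On the namenode host: check the service and review its log
service hadoop-hdfs-namenode status
tail -100 /var/log/gphd/hadoop-hdfs/hadoop-hdfs-namenode-*.log

# On each datanode host: check the service and review its log
service hadoop-hdfs-datanode status
tail -100 /var/log/gphd/hadoop-hdfs/hadoop-hdfs-datanode-*.log

# Confirm which namenode address the cluster is configured to use
grep -A1 fs.defaultFS /etc/gphd/hadoop/conf/core-site.xml

# From a datanode, confirm the namenode RPC port (8020 by default) is reachable
nc -vz phd11-nn.saturn.local 8020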

Some examples of configuration issues:

Case 1: The values below, which identify the namenode addresses in a PHD HA configuration, were missing. After correcting the values on all the nodes, the services restarted successfully.

<property>
  <name>dfs.namenode.http-address.phdha.nn2</name>
  <value>phd11-standby.saturn.local:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.phdha.nn1</name>
  <value>phd11-nn.saturn.local:50070</value>
</property>
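After correcting the configuration on every node, restart the HDFS services so the change takes effect. A sketch, assuming Bigtop-style init scripts (service names may differ by release):

service hadoop-hdfs-namenode restart    # on each namenode host
service hadoop-hdfs-datanode restart    # on each datanode host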

Case 2: The datanode logs report the error shown below. It indicates that the datanode data directories either do not exist or do not have the proper permissions.

2014-03-20 03:55:59,665 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-559675079-192.166.10.1-1395309820009 (storage id ) service to faihdm1/192.166.10.1:8020
java.io.IOException: All specified directories are not accessible or do not exist.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:183)
