Pivotal Knowledge Base

Pivotal HD: replication to a DataNode fails when that DataNode holds a block replica with an old generation stamp

Environment
Pivotal HD all releases.

Problem
If the number of DataNodes in your cluster is less than or equal to the replication factor of a file on HDFS, a corrupt or outdated replica of a block in that file is not fixed automatically and requires manual intervention.
Until the manual fix is applied, the block is marked as under-replicated and the NameNode logs show messages similar to the following (normally the entry is repeated every few seconds until the problem is resolved):

2014-04-30 10:57:24,858 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not able to place enough replicas, still in need of 1 to reach 3

In the example below, we have 3 DataNodes and a replication factor of 3 (the default).
Block blk_5582039430147844965 has a replica with generation stamp 6872 on host hdw1.dca, which is older than the current replica (generation stamp 6880) on the other two hosts, hdw2.dca and hdw3.dca. This happened because the DataNode service on hdw1.dca was down while the block was updated.

[gpadmin@hdm1 ~]$ gpssh -f ~/hostfile_seg "ls -l /data/*/dfs/data/current/BP-2083006907-192.165.10.1-1392999006690/current/finalized/subdir*/blk_5582039430147844965*"
[hdw3.dca] -rw-r--r-- 1 hdfs hadoop 21253104 Apr 30 15:24 /data/1/dfs/data/current/BP-2083006907-192.165.10.1-1392999006690/current/finalized/subdir63/blk_5582039430147844965
[hdw3.dca] -rw-r--r-- 1 hdfs hadoop   166047 Apr 30 15:24 /data/1/dfs/data/current/BP-2083006907-192.165.10.1-1392999006690/current/finalized/subdir63/blk_5582039430147844965_6880.meta
[hdw2.dca] -rw-r--r-- 1 hdfs hadoop 21253104 Apr 30 15:24 /data/1/dfs/data/current/BP-2083006907-192.165.10.1-1392999006690/current/finalized/subdir29/blk_5582039430147844965
[hdw2.dca] -rw-r--r-- 1 hdfs hadoop   166047 Apr 30 15:24 /data/1/dfs/data/current/BP-2083006907-192.165.10.1-1392999006690/current/finalized/subdir29/blk_5582039430147844965_6880.meta
[hdw1.dca] -rw-r--r-- 1 hdfs hadoop 21253072 Apr 30 13:59 /data/3/dfs/data/current/BP-2083006907-192.165.10.1-1392999006690/current/finalized/subdir41/blk_5582039430147844965
[hdw1.dca] -rw-r--r-- 1 hdfs hadoop   166047 Apr 30 13:59 /data/3/dfs/data/current/BP-2083006907-192.165.10.1-1392999006690/current/finalized/subdir41/blk_5582039430147844965_6872.meta
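With many blocks or hosts, comparing the generation stamps in the .meta file names by eye is error-prone. The following is a minimal sketch, not part of the original procedure, that reads a listing like the one above on standard input and prints the host, generation stamp, and path of every .meta file whose stamp is lower than the highest one seen. The function name stale_replicas is hypothetical, and the sketch assumes the listing covers a single block and that each line starts with the host in square brackets, as gpssh prints it.

```shell
# stale_replicas: hypothetical helper, assumes "ls -l" lines prefixed with "[host]".
# Reads the listing on stdin and prints "host genstamp path" for each .meta file
# whose generation stamp is lower than the highest stamp in the listing.
stale_replicas() {
  awk '
    /\.meta$/ {
      host = $1
      gsub(/^\[|\]$/, "", host)      # strip the surrounding [ ]
      path = $NF                      # last field is the file path
      stamp = path
      sub(/\.meta$/, "", stamp)       # drop the .meta suffix
      sub(/^.*_/, "", stamp)          # keep what follows the last underscore
      stamp += 0                      # force numeric comparison
      hosts[NR] = host; stamps[NR] = stamp; paths[NR] = path
      if (stamp > max) max = stamp
    }
    END {
      for (i in stamps)
        if (stamps[i] < max) print hosts[i], stamps[i], paths[i]
    }
  '
}

# Usage (pipe the gpssh listing for one block into the function):
#   gpssh -f ~/hostfile_seg "ls -l <block file pattern>" | stale_replicas
```

Only the .meta lines are inspected, because the generation stamp is encoded in the .meta file name and not in the name of the block file itself.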

The following events are found in the NameNode logs:

2014-04-30 10:41:20,941 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(newblock=BP-2083006907-192.165.10.1-1392999006690:blk_5582039430147844965_6877, file=/hawq_data/gpseg1/16385/16522/28311.1, newgenerationstamp=6880, newlength=21253104, newtargets=[192.165.10.3:50010, 192.165.10.4:50010]) successful
[...]
2014-04-30 10:57:21,111 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_5582039430147844965 added as corrupt on 192.165.10.2:50010 by hdw1.dca/192.165.10.2 because block is COMPLETE and reported genstamp 6872 does not match genstamp in block map 6880
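To find such reports without scrolling through the log, a simple filter on the message text is usually enough. The sketch below is an illustration only: the function name corrupt_replica_reports is hypothetical, and the NameNode log path is passed in as an argument because its location depends on the installation.

```shell
# corrupt_replica_reports: hypothetical helper.
# Prints NameNode log lines that report a replica as corrupt because of a
# generation stamp mismatch. $1 is the path to a NameNode log file.
corrupt_replica_reports() {
  grep -E 'addToCorruptReplicasMap.*does not match genstamp' "$1"
}

# Usage:
#   corrupt_replica_reports /path/to/namenode.log
```

Each matching line names the block, the DataNode address, the reported generation stamp, and the generation stamp the NameNode expects.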


Cause
This is a known issue, tracked at https://issues.apache.org/jira/browse/HDFS-3493, which has no fix yet.

Solution
The corrupt block file and its related .meta file need to be manually deleted or moved to another location on the affected DataNode. A good copy of the block will eventually be replicated back to that DataNode. To accelerate the recovery, the DataNode service can be restarted.
You will then see something like this in the NameNode logs:

2014-04-30 22:29:49,005 INFO BlockStateChange: BLOCK* ask 192.165.10.3:50010 to replicate blk_5582039430147844965_6880 to datanode(s) 192.165.10.2:50010
2014-04-30 22:29:49,788 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.165.10.2:50010 is added to blk_5582039430147844965_6880 size 21253104
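The manual step described in this section can be sketched as a small shell function that moves the outdated block file together with its .meta file out of the DataNode's data directory instead of deleting them. This is an illustration under stated assumptions, not an official tool: the function name quarantine_replica and the destination directory are hypothetical, the function must run on the affected DataNode host with permission to modify the data directory, and the destination should be outside any DataNode data directory.

```shell
# quarantine_replica: hypothetical helper.
# Moves a block file and its matching .meta file(s) to a holding directory.
#   $1  full path of the block file (without the _<genstamp>.meta suffix)
#   $2  destination directory outside the DataNode data directories
# The body runs in a subshell, so "set --" and "exit" do not affect the caller.
quarantine_replica() (
  block_file=$1
  dest_dir=$2
  [ -f "$block_file" ] || { echo "no such block file: $block_file" >&2; exit 1; }
  set -- "$block_file"_*.meta        # expand to the matching .meta file(s)
  [ -e "$1" ] || { echo "no .meta file found for: $block_file" >&2; exit 1; }
  mkdir -p "$dest_dir" || exit 1
  mv -- "$block_file" "$@" "$dest_dir"/
)

# Usage (paths are placeholders):
#   quarantine_replica /path/to/finalized/subdirNN/blk_<id> /path/to/quarantine
# Afterwards the DataNode service can be restarted to speed up re-replication;
# the restart command depends on how the service is installed.
```

Moving the files, as opposed to deleting them, keeps the old replica available until the new copy has been confirmed.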


Notes
Before deleting or moving anything, run "hdfs fsck" on the file to confirm which blocks are under-replicated and that at least one good copy exists for each of those blocks.
As an extra precautionary step, the good copies can be backed up to another location until the required replication factor is satisfied again (this works only if you can make sure the blocks are not being updated while you copy them).
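The fsck check mentioned above can be wrapped as follows. This is a minimal sketch: the function name check_file_blocks is hypothetical, while -files, -blocks and -locations are standard "hdfs fsck" options that list the file, its blocks, and the DataNodes holding each replica. It assumes the hdfs command is on the PATH and is run as a user allowed to read the file.

```shell
# check_file_blocks: hypothetical wrapper around "hdfs fsck".
# $1 is the HDFS path of the file to inspect.
check_file_blocks() {
  hdfs fsck "$1" -files -blocks -locations
}

# Usage, with the file from the example in this article:
#   check_file_blocks /hawq_data/gpseg1/16385/16522/28311.1
```

Review the report for each block of the file and note how many replicas are listed and on which DataNodes before touching any block files.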
