How long does a datanode have to be offline before the data starts replicating?


Product       Version
Pivotal HD    3.x
HDFS          2.6.0


This article explains how long a datanode has to be offline before its data blocks are re-replicated to other datanodes in the cluster. This is useful to know when replication needs to be avoided because of cluster load, capacity, or locality (for example, in the case of Pivotal HDB 1.x).


By default, a datanode has to be unavailable (not sending a heartbeat to the namenode) for 10.5 minutes before the namenode marks it as dead and the blocks it held are re-replicated. The time to mark a datanode as dead is:

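(dfs.namenode.heartbeat.recheck-interval * 2) + (10 * 1000 * dfs.heartbeat.interval)

Here dfs.namenode.heartbeat.recheck-interval is given in milliseconds (default 300000) and dfs.heartbeat.interval is given in seconds (default 3), which is why the second term is multiplied by 1000. If the default values are applied to the above formula:

(300000 * 2) + (10 * 1000 * 3) = 630000 milliseconds = 10.5 minutes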
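
The calculation above can be reproduced with a short script, which may be handy for checking the effect of non-default values. This is a minimal sketch, not part of HDFS: the function name `dead_node_timeout_ms` is made up for illustration, and the defaults are the HDFS 2.6.0 values quoted in this article.

```python
# Defaults quoted in this article for HDFS 2.6.0.
DEFAULT_RECHECK_INTERVAL_MS = 300_000  # dfs.namenode.heartbeat.recheck-interval (milliseconds)
DEFAULT_HEARTBEAT_INTERVAL_S = 3       # dfs.heartbeat.interval (seconds)


def dead_node_timeout_ms(recheck_interval_ms=DEFAULT_RECHECK_INTERVAL_MS,
                         heartbeat_interval_s=DEFAULT_HEARTBEAT_INTERVAL_S):
    """Hypothetical helper: time in milliseconds before a silent datanode is marked dead."""
    # Same formula as in the article: (recheck * 2) + (10 * 1000 * heartbeat).
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


if __name__ == "__main__":
    timeout_ms = dead_node_timeout_ms()
    # Convert milliseconds to minutes for readability.
    print(f"{timeout_ms} ms = {timeout_ms / 60_000} minutes")  # 630000 ms = 10.5 minutes
```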

To increase the time before replication starts, set dfs.namenode.heartbeat.recheck-interval to a higher value by adding the property and its value in Ambari / HDFS / Configurations / Custom hdfs-site.
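
To choose a value for dfs.namenode.heartbeat.recheck-interval, the formula can be solved for a desired delay. The sketch below assumes the heartbeat interval stays at its default of 3 seconds; the function name `recheck_interval_for_timeout_ms` and the 30-minute target are illustrative only and do not come from HDFS or Ambari.

```python
def recheck_interval_for_timeout_ms(target_timeout_ms, heartbeat_interval_s=3):
    """Hypothetical helper: recheck interval (ms) that yields at least the target timeout."""
    # Portion of the timeout contributed by the heartbeat term: 10 * 1000 * heartbeat.
    heartbeat_part_ms = 10 * 1000 * heartbeat_interval_s
    if target_timeout_ms <= heartbeat_part_ms:
        raise ValueError("target timeout must exceed the heartbeat term of the formula")
    # Ceiling division by 2, so the resulting timeout is never shorter than requested.
    return -(-(target_timeout_ms - heartbeat_part_ms) // 2)


if __name__ == "__main__":
    # Illustrative target: wait 30 minutes before a datanode is marked dead.
    target_ms = 30 * 60 * 1000
    value = recheck_interval_for_timeout_ms(target_ms)
    # Name and value to enter under Custom hdfs-site.
    print(f"dfs.namenode.heartbeat.recheck-interval={value}")
```

For a 30-minute target this gives 885000, since (885000 * 2) + (10 * 1000 * 3) = 1800000 milliseconds.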


