Pivotal Knowledge Base

How to change the replication of all blocks in HDFS

Environment

Product                 Version
Pivotal Hadoop (PHD)    3.x
OS                      RHEL 6.x
Others

Purpose

This article explains how to change the replication factor of all existing blocks in HDFS.

Procedure

To change the replication factor of files that already exist (for example, to reduce it), use the following command. It recursively sets the replication factor of every file in the file system and, because of the -w option, waits until the change has completed.

hdfs dfs -setrep -R -w <REPLICATION_FACTOR> /
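
The command can be wrapped in a small guard so that a mistyped replication factor is rejected before anything is sent to the cluster. The following is only a sketch: the function name set_replication is hypothetical, and it assumes the hdfs client is on the PATH. It uses the "hdfs dfs" form with the options placed before the arguments, which matches the documented usage "-setrep [-R] [-w] <numReplicas> <path>".

```shell
# set_replication is a hypothetical helper name, not part of Hadoop.
# Usage: set_replication <REPLICATION_FACTOR> [PATH]   (PATH defaults to /)
set_replication() {
    rep="$1"
    path="${2:-/}"

    # Reject empty or non-numeric values before calling the cluster.
    case "$rep" in
        ''|*[!0-9]*)
            echo "replication factor must be a positive integer: '$rep'" >&2
            return 1
            ;;
    esac
    if [ "$rep" -lt 1 ]; then
        echo "replication factor must be at least 1" >&2
        return 1
    fi

    # -w waits for the replication change to complete; this can take a
    # long time on a large file system.
    hdfs dfs -setrep -R -w "$rep" "$path"
}
```

For example, "set_replication 2 /" runs the same command as the example in this article.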

Example:

The fsck summary below shows an HDFS with a replication factor of 3:

Status: HEALTHY
 Total size:    839729774 B
 Total dirs:    139
 Total files:   600
 Total symlinks:                0
 Total blocks (validated):      596 (avg. block size 1408942 B)
 Minimally replicated blocks:   596 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          3
 Number of racks:               1

The command is then run with a replication factor of 2:

hdfs dfs -setrep -R -w 2 /

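This changes the replication of all existing blocks from 3 to 2, as indicated by "Average block replication." The "Default replication factor" still shows 3, because setrep only changes existing files and does not alter the configured default that applies to new files: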

Status: HEALTHY
 Total size:    839729774 B
 Total dirs:    139
 Total files:   600
 Total symlinks:                0
 Total blocks (validated):      596 (avg. block size 1408942 B)
 Minimally replicated blocks:   596 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          3
 Number of racks:               1
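
To compare the two values without reading the whole summary, a single field can be pulled out of the fsck output. The following is a minimal sketch: the helper name fsck_field is hypothetical, and it assumes the summary has the "label: value" layout shown above.

```shell
# fsck_field is a hypothetical helper name, not part of Hadoop.
# It reads an fsck summary on stdin and prints the value of one field.
fsck_field() {
    # Split each line at the first colon and compare the trimmed label.
    awk -F':' -v want="$1" '
        {
            label = $1
            gsub(/^[ \t]+|[ \t]+$/, "", label)
            if (label == want) {
                value = $2
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                print value
            }
        }'
}

# Usage against a live cluster (requires the hdfs client on the PATH):
#   hdfs fsck / | fsck_field "Average block replication"
#   hdfs fsck / | fsck_field "Default replication factor"
```

The replication factor of a single file can also be checked with "hdfs dfs -stat %r <path>".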

For more information on "Default replication factor" and "Average block replication," refer to this article.

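To change the default replication factor across the cluster permanently, use the following steps. The new default applies to files created afterwards; existing files keep their current replication until setrep is run on them.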

  • Open the Ambari web UI
  • Click the HDFS service on the left
  • Click the Configs tab
  • Under "General," change the value of "Block Replication"
  • Restart the HDFS services
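
After the restart, the new default can be confirmed from a cluster host. The following is a sketch under two assumptions: that the Ambari "Block Replication" setting corresponds to the dfs.replication property, and that the hdfs client is on the PATH. The helper name check_default_replication is hypothetical.

```shell
# check_default_replication is a hypothetical helper name, not part of Hadoop.
# Usage: check_default_replication <EXPECTED_VALUE>
check_default_replication() {
    expected="$1"

    # getconf reads the configuration available on this host, so run it on
    # a node whose configuration is managed by Ambari.
    actual=$(hdfs getconf -confKey dfs.replication) || return 2

    if [ "$actual" = "$expected" ]; then
        echo "dfs.replication is $actual"
    else
        echo "dfs.replication is $actual, expected $expected" >&2
        return 1
    fi
}
```

For example, "check_default_replication 2" returns success only if the configured default is 2.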

 
