Pivotal Knowledge Base


HDFS - How to check the Default Replication for a file

Environment

Product       Version
Pivotal HD    All versions

Purpose

This article shows how to find the replication factor of a file in HDFS, both at the time the file is created and after the replication factor has been changed.

Procedure

The following section describes how to check the replication factor of a file.

Checking the replication factor with "ls"

Running "hdfs dfs -ls" shows the replication factor of each file in the second column of the output, as in the example below.

[root@kcadmin]# hdfs dfs -ls
Found 3 items
drwx------   - root hadoop          0 2014-01-29 06:14 .staging
-rw-r--r--   3 root hadoop       1943 2014-01-24 01:01 passwd
drwxr-xr-x   - root hadoop          0 2014-04-22 12:45 test

Where:

  • For the file "passwd", the replication factor is 3.
  • A '-' in the second column indicates a directory, which has no replication factor of its own.
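Because the replication factor is always the second column for regular files, it can be extracted from a listing with standard text tools. The following is a minimal shell sketch, not part of the original procedure: the helper name `list_replication` is hypothetical, it assumes paths contain no spaces, and it replays the sample listing above instead of calling `hdfs`, so it runs anywhere.

```shell
#!/bin/sh
# Minimal sketch: print "<replication> <path>" for every regular file in
# `hdfs dfs -ls` output. Assumption: paths contain no spaces.
# list_replication is a hypothetical helper name used only for illustration.
list_replication() {
  # Regular-file lines start with "-" in the permissions column (field 1);
  # this skips directories ("d...") and the "Found N items" header.
  # Field 2 is the replication factor, field 8 is the path.
  awk '$1 ~ /^-/ { print $2, $8 }'
}

# Sample input copied from the listing above. On a cluster you would run:
#   hdfs dfs -ls | list_replication
sample_ls() {
  cat <<'EOF'
Found 3 items
drwx------   - root hadoop          0 2014-01-29 06:14 .staging
-rw-r--r--   3 root hadoop       1943 2014-01-24 01:01 passwd
drwxr-xr-x   - root hadoop          0 2014-04-22 12:45 test
EOF
}

sample_ls | list_replication
# prints: 3 passwd
```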

To change the replication factor of the file "passwd", use the following command:

[root@kcadmin]# hdfs dfs -setrep 2 passwd 
Replication 2 set: passwd

The output now looks similar to the following:

[root@kcadmin]# hdfs dfs -ls
Found 1 items
-rw-r--r--   2 root hadoop       1943 2014-01-24 01:01 passwd
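To confirm the new value for one specific file, the listing can be filtered by path. This is a minimal shell sketch, not part of the original procedure: the helper name `replication_of` is hypothetical, paths are assumed to contain no spaces, and the output above is replayed instead of calling `hdfs`. On Hadoop releases whose `-stat` command supports the `%r` format specifier, `hdfs dfs -stat %r passwd` prints the same value directly.

```shell
#!/bin/sh
# Minimal sketch: print the replication factor of one file from
# `hdfs dfs -ls` output. Assumption: paths contain no spaces.
# replication_of is a hypothetical helper name used only for illustration.
replication_of() {
  # $1 is the path exactly as it appears in the last column (field 8).
  # Only regular-file lines (field 1 starting with "-") are considered.
  awk -v target="$1" '$1 ~ /^-/ && $8 == target { print $2 }'
}

# Sample input copied from the output above. On a cluster you would run:
#   hdfs dfs -ls | replication_of passwd
sample_after_setrep() {
  cat <<'EOF'
Found 1 items
-rw-r--r--   2 root hadoop       1943 2014-01-24 01:01 passwd
EOF
}

sample_after_setrep | replication_of passwd
# prints: 2
```

The listing reports the target replication factor recorded for the file; "hdfs dfs -setrep -w" additionally waits until the blocks actually reach that number of replicas.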

Changing the replication factor of a directory affects only the files that already exist in it. Files created in the directory afterwards get the cluster's default replication factor (dfs.replication in hdfs-site.xml), as shown in the example below.

[root@kcadmin]# hdfs dfs -ls test/
Found 1 items
-rw-r--r--   4 root hadoop        316 2014-04-29 01:57 test/host1
[root@kcadmin]# hdfs dfs -setrep -R 2 test
Replication 2 set: test/host1
[root@kcadmin]# hdfs dfs -ls test/
Found 1 items
-rw-r--r--   2 root hadoop        316 2014-04-29 01:57 test/host1
[root@kcadmin]# hdfs dfs -copyFromLocal /etc/passwd test/
[root@kcadmin]# hdfs dfs -ls test/
Found 2 items
-rw-r--r--   2 root hadoop        316 2014-04-29 01:57 test/host1
-rw-r--r--   4 root hadoop       1943 2014-04-29 02:11 test/passwd
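In a mixed directory like this one, it can help to list only the files whose replication differs from the value you expect. The shell sketch below is an illustration, not part of the original procedure: the helper name `files_not_at` is hypothetical, paths are assumed to contain no spaces, and the listing above is replayed instead of calling `hdfs`. On a live cluster, the expected value could come from `hdfs getconf -confKey dfs.replication` and the input from `hdfs dfs -ls -R test`.

```shell
#!/bin/sh
# Minimal sketch: print the paths of regular files in `hdfs dfs -ls` output
# whose replication factor differs from an expected value.
# Assumption: paths contain no spaces. files_not_at is a hypothetical name.
files_not_at() {
  # $1 is the expected replication factor. Field 1 starting with "-" marks a
  # regular file, field 2 is its replication factor, field 8 is its path.
  awk -v want="$1" '$1 ~ /^-/ && $2 != want { print $8 }'
}

# Sample input copied from the listing above. On a cluster you might run:
#   hdfs dfs -ls -R test | files_not_at "$(hdfs getconf -confKey dfs.replication)"
sample_test_dir() {
  cat <<'EOF'
Found 2 items
-rw-r--r--   2 root hadoop        316 2014-04-29 01:57 test/host1
-rw-r--r--   4 root hadoop       1943 2014-04-29 02:11 test/passwd
EOF
}

sample_test_dir | files_not_at 2
# prints: test/passwd
```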

Comments

  • narendra

    I was searching for a way to list all files in a cluster that have a replication factor of 1, and this article showed up in the results.

    Nice article, but it is missing information on its own subject. There are two ways to find a file's replication factor:

    1. In the NameNode web UI, browse to the file; the file properties show its replication factor.

    2. Run the "fsck" command on the file; it reports the block replication factor and much other useful information about the file.
      $ hdfs fsck /tmp/file_replication.txt
      FSCK started by apps (auth:SIMPLE) from xx.xx.xx.xx for path /tmp/file_replication.txt at Mon Sep 14 21:52:17 PDT 2015
      .Status: HEALTHY
      Total size: 37142673 B
      Total dirs: 0
      Total files: 1
      Total blocks (validated): 1 (avg. block size 37142673 B)
      Minimally replicated blocks: 1 (100.0 %)
      Over-replicated blocks: 0 (0.0 %)
      Under-replicated blocks: 0 (0.0 %)
      Mis-replicated blocks: 0 (0.0 %)
      Default replication factor: 3
      Average block replication: 3.0 ---> Replication factor of your file.
      Corrupt blocks: 0
      Missing replicas: 0 (0.0 %)
      Number of data-nodes: 343
      Number of racks: 18
      FSCK ended at Mon Sep 14 21:52:17 PDT 2015 in 0 milliseconds

      The filesystem under path '/tmp/file_replication.txt' is HEALTHY

    Thanks,
    Narendra Jonna.
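The fsck check described in the comment above can be reduced to its two replication lines with a short filter. This is a minimal shell sketch, not part of the original article: the helper name `fsck_replication` is hypothetical, the field positions assume the fsck summary wording shown in the comment, and the relevant lines of that sample are replayed instead of calling `hdfs fsck`.

```shell
#!/bin/sh
# Minimal sketch: extract the default and average replication values from
# the summary printed by `hdfs fsck <path>`.
# Assumption: the summary lines are worded as in the sample below.
# fsck_replication is a hypothetical helper name used only for illustration.
fsck_replication() {
  awk '
    # "Default replication factor: 3"  -> the value is field 4
    /Default replication factor:/ { def_rf = $4 }
    # "Average block replication: 3.0" -> the value is field 4
    /Average block replication:/  { avg_rf = $4 }
    END { print "default=" def_rf, "average=" avg_rf }
  '
}

# Summary lines copied from the fsck output in the comment (the "--->"
# annotation removed). On a cluster you would run:
#   hdfs fsck /tmp/file_replication.txt | fsck_replication
sample_fsck() {
  cat <<'EOF'
 Total blocks (validated): 1 (avg. block size 37142673 B)
 Default replication factor: 3
 Average block replication: 3.0
 Corrupt blocks: 0
EOF
}

sample_fsck | fsck_replication
# prints: default=3 average=3.0
```

For a single file whose blocks all share one replication factor, the average equals that factor; a non-integer average suggests the blocks are not all at the same replication.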
