Pivotal Knowledge Base


Understanding format options for the hdfs dfs -stat command

Environment 

PHD 1.x

Purpose:

The hdfs "stat" command is useful when you need to write a quick script that collects specific information about files in HDFS.

Use case: "hdfs dfs -ls /filename" always returns the full path of the file, but you only need the basename.

After reading this article you will know how to print only the file or directory name, along with selected details about that file.

Formatting options:

%b  Size of file in bytes
%F  Returns "regular file", "directory", or "symlink" depending on the type of inode
%g  Group name
%n  Filename
%o  HDFS block size in bytes (128 MB by default; 0 for a directory)
%r  Replication factor
%u  Username of owner
%y  Modification time (mtime) of inode in UTC, formatted as yyyy-MM-dd HH:mm:ss
%Y  Modification time (mtime) of inode in milliseconds since the UNIX epoch
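
The options above can be combined in a single format string. Because %F can expand to a two-word value ("regular file"), splitting the output on whitespace is fragile. The sketch below assumes that literal characters in the format string are passed through unchanged, so a "|" can serve as a field separator, and that no file name contains "|". The helper name parse_stat_line is hypothetical, and the sample line is built from values shown in this article rather than from a live cluster.

```shell
# Hypothetical helper: split one line produced by
#   hdfs dfs -stat "%n|%F|%b|%u|%g|%r" <path>
# into named fields. Assumes no field contains a '|' character.
parse_stat_line() {
    printf '%s\n' "$1" | {
        IFS='|' read -r name type size owner group repl
        printf 'name=%s type=%s size=%s owner=%s group=%s replication=%s\n' \
            "$name" "$type" "$size" "$owner" "$group" "$repl"
    }
}

# On a cluster the line would come from something like:
#   line=$(hdfs dfs -stat "%n|%F|%b|%u|%g|%r" /tmp/messages)
line='messages|regular file|143|root|hadoop|3'
parse_stat_line "$line"
```

Keeping the separator in the format string means the two-word file type stays in one field, which a plain space-separated format would not guarantee.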

Example: Use stat to return only the basename, confirming that the file or directory exists in HDFS

[root@hdm1 ~]# hdfs dfs -stat "%n" /tmp/messages
messages

Example: Compare all stat attributes with "ls"

[root@hdm1 ~]# hdfs dfs -stat "%b %F %g %n %o %r %u %y %Y" /tmp/messages
143 regular file hadoop messages 134217728 3 root 2014-02-07 21:17:22 1391807842674 

Compared with the output of "hdfs dfs -ls /tmp/messages":

Found 1 items
-rw-r--r--   3 root hadoop        143 2014-02-07 13:17 /tmp/messages
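
Note that the two timestamps differ: %y reports 21:17:22 while "-ls" shows 13:17. The %y value is in UTC, and "-ls" appears to print the time in the client's local time zone (8 hours behind UTC on this host). The %Y value has 13 digits because it is in milliseconds. A minimal sketch for converting it back to a readable UTC timestamp follows; the helper name epoch_ms_to_utc is hypothetical, and the code assumes GNU date, which accepts "-d @SECONDS".

```shell
# Hypothetical helper: convert a %Y value (milliseconds since the epoch)
# to a UTC timestamp in the same layout that %y uses.
# Assumes GNU date; BSD/macOS date uses different options.
epoch_ms_to_utc() {
    secs=${1%???}    # drop the last three digits (milliseconds)
    date -u -d "@$secs" '+%Y-%m-%d %H:%M:%S'
}

epoch_ms_to_utc 1391807842674    # prints 2014-02-07 21:17:22
```

The result matches the %y column of the stat output, which confirms that %y and %Y describe the same instant.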

Example: Use stat with a directory

[root@hdm1 ~]# hdfs dfs -stat "%b %F %g %n %o %r %u %y %Y" /tmp/gphdtmp
0 directory hadoop gphdtmp 0 0 hdfs 2013-12-26 07:08:06 1388041686026

Example: Perform a stat on all files and directories under /tmp

[root@hdm1 ~]# hdfs dfs -stat "%b %F %g %n %o %r %u %y %Y" "/tmp/*"
0 directory hadoop gphdtmp 0 0 hdfs 2013-12-26 07:08:06 1388041686026
143 regular file hadoop messages 134217728 3 root 2014-02-07 21:17:22 1391807842674
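
Output like this, one line per path, is convenient to post-process in a script. As a rough sketch, the helper below totals the size of the regular files in such a listing. It assumes the listing was produced with a "|"-separated format ("%b|%F|%n") so that "regular file" stays in one field; the helper name sum_regular_file_bytes is hypothetical, and the sample input reuses the values shown above instead of calling a cluster.

```shell
# Hypothetical helper: read "%b|%F|%n" lines on stdin and print the total
# number of bytes held by regular files (directories are skipped).
sum_regular_file_bytes() {
    awk -F'|' '$2 == "regular file" { total += $1 } END { print total + 0 }'
}

# On a cluster the input would come from something like:
#   hdfs dfs -stat "%b|%F|%n" "/tmp/*" | sum_regular_file_bytes
printf '%s\n' '0|directory|gphdtmp' '143|regular file|messages' | sum_regular_file_bytes
```

With the two sample lines the function prints 143, the size of /tmp/messages.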
