Pivotal Knowledge Base

How to enable RHDFS on PHD client node

Environment

PHD 1.1.1

R 3.0.2

rhdfs 1.0.8

PREREQUISITES

  • Client node was deployed with ICM or by manual RPM install.  If a binary install was used, then some directory paths may need to be modified in the procedure below.
  • In a kerberized HDFS environment, the user running the RHDFS client needs a Kerberos TGT already granted.  This can be verified by running "klist" on the RHDFS client node.
  • Java 1.6 or 1.7 is installed.
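
The prerequisites above can be checked from a shell before starting the installation. The following is a minimal sketch and not part of the original procedure: the helper name require_cmd is hypothetical, and it assumes that java and klist are on the PATH when they are installed. With MIT Kerberos, "klist -s" prints nothing and exits non-zero when there is no valid ticket.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
        return 0
    fi
    echo "missing: $1"
    return 1
}

# Java prerequisite: print the first line of the version banner if java exists.
if require_cmd java; then
    java -version 2>&1 | head -n 1
fi

# Kerberos prerequisite: only relevant when HDFS is kerberized.
# "klist -s" is silent and exits non-zero if no valid ticket is cached.
if require_cmd klist; then
    if klist -s; then
        echo "ok: Kerberos ticket present"
    else
        echo "no valid Kerberos ticket; run kinit if HDFS is kerberized"
    fi
fi
```

A "missing: klist" line is harmless on a cluster that does not use Kerberos.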

INSTALLATION

  1. Download the R 3.0.2 source from a CRAN mirror: http://cran.r-project.org/mirrors.html
  2. Download the rhdfs 1.0.8 package: https://github.com/RevolutionAnalytics/rhdfs/blob/master/build/rhdfs_1.0.8.tar.gz?raw=true
  3. Install required headers for R compilation
    yum install libX11-devel libXt-devel readline-devel
  4. Extract the R source archive and cd into the R-3.0.2 directory
    tar -xzvf R-3.0.2.tar.gz
    cd R-3.0.2
  5. Build R
    ./configure
    make
    make install
  6. Configure JAVA for R
    [root@rhdfs R]# export JAVA_HOME=/usr/java/default
    [root@rhdfs R]# R CMD javareconf
  7. Start the R prompt with the command "R" and install the needed packages
    [root@rhdfs R]# R
    > install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2"))
    
  8. Install the RHDFS package. Make sure to change the path of the rhdfs tar archive
    [root@rhdfs R]# R
    > Sys.setenv(HADOOP_CMD="/usr/bin/hadoop")
    > install.packages("/root/R/rhdfs_1.0.8.tar.gz", repos = NULL, type="source")
    
  9. Quit R with "q()"  after installation is complete
  10. Then test an HDFS directory listing from R.  First set the environment so that R can find the hadoop binary, and load the rhdfs library.  Once that is complete, run hdfs.init() to initialize the HDFS client.
    [root@rhdfs R]# R
    > Sys.setenv(HADOOP_CMD="/usr/bin/hadoop")
    > library(rhdfs)
    Loading required package: rJava
    HADOOP_CMD=/usr/bin/hadoop
    
    > hdfs.init()
    14/02/25 09:07:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    
  11. Now that your RHDFS environment is initialized, you can run hdfs.ls to test connectivity
    > hdfs.ls("/tmp")
      permission owner  group size          modtime         file
    1 drwxrwxrwx  hdfs hadoop    0 2013-11-06 14:37 /tmp/gphdtmp
    > hdfs.ls("/")
      permission    owner   group size          modtime       file
    1 drwxr-xr-x     hdfs  hadoop    0 2013-11-06 14:38      /apps
    2 drwxr-xr-x postgres gpadmin    0 2014-02-21 06:50 /hawq_data
    3 drwxr-xr-x     hdfs  hadoop    0 2013-11-06 14:40      /hive
    4 drwxr-xr-x   mapred  hadoop    0 2013-11-06 14:37    /mapred
    5 drwxrwxrwx     hdfs  hadoop    0 2013-11-06 14:37       /tmp
    6 drwxrwxrwx     hdfs  hadoop    0 2014-02-22 02:05      /user
    7 drwxr-xr-x     hdfs  hadoop    0 2013-11-06 14:38      /yarn
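
Steps 10 and 11 can also be run without an interactive R session. The following is a minimal sketch under stated assumptions: Rscript was installed alongside R by "make install", hadoop is at /usr/bin/hadoop as in the steps above, and the function name write_rhdfs_smoke_test and the output file name are hypothetical. The block only writes the R script; it does not contact HDFS.

```shell
#!/bin/sh
# Hypothetical helper: write an R script that repeats steps 10 and 11.
# $1 = output file, $2 = path to the hadoop binary
write_rhdfs_smoke_test() {
    cat > "$1" <<EOF
Sys.setenv(HADOOP_CMD="$2")
library(rhdfs)
hdfs.init()
print(hdfs.ls("/tmp"))
EOF
}

# Assumed location for the generated script; adjust as needed.
SMOKE_TEST="${TMPDIR:-/tmp}/rhdfs_smoke_test.R"
write_rhdfs_smoke_test "$SMOKE_TEST" /usr/bin/hadoop
echo "wrote $SMOKE_TEST"
```

Run the generated file with Rscript "$SMOKE_TEST" on the client node. A listing of /tmp like the one in step 11 indicates that rhdfs can reach HDFS.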
    
