Pivotal Knowledge Base

How to create a Hadoop user on a PHD cluster?

While starting up with PHD, administrators often create users to allow them to access HDFS and execute applications.

Below are some handy steps for user creation. You can perform these steps on the client machine/nodes.

1) Create an operating system group. You may skip this step if you already have a group you wish to use; it is always recommended to define a dedicated group for Hadoop users.

[root@phd11-nn ~]# groupadd hadoopusers
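
To confirm the group was created, you can query the group database (the GID shown is illustrative):

[root@phd11-nn ~]# getent group hadoopusers
hadoopusers:x:502: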

2) Create an operating system user and associate it with the desired group.

[root@phd11-nn ~]# useradd -g hadoopusers app_user
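
You can verify the new user and its primary group with id (the UID/GID values below are illustrative):

[root@phd11-nn ~]# id app_user
uid=502(app_user) gid=502(hadoopusers) groups=502(hadoopusers)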

3) Set a password for the user.

[root@phd11-nn ~]# passwd app_user
Changing password for user app_user.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
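
If you are scripting user creation, passwd on RHEL-based systems (which PHD typically runs on) also accepts the password on standard input via the --stdin option; for example (the password value is illustrative):

[root@phd11-nn ~]# echo 'S3cretPassw0rd' | passwd --stdin app_user
Changing password for user app_user.
passwd: all authentication tokens updated successfully.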

4) Identify the value of hadoop.tmp.dir in core-site.xml.

[root@phd11-nn ~]# egrep -C2 tmp.dir /etc/gphd/hadoop/conf/core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/gphdtmp</value>
</property>
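
Alternatively, the hdfs getconf utility resolves the effective value directly from the client configuration, which avoids grepping the XML by hand:

[root@phd11-nn ~]# hdfs getconf -confKey hadoop.tmp.dir
/tmp/gphdtmp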

5) Ensure that the permissions on the Hadoop temp directory are 777 so that every user can write to it. In PHD, its permission mode is 777 by default. You can verify this on a client node as shown below; if the directory does not exist on a client, you can create it yourself (see the sketch after the listing).

[root@phd11-nn ~]# ls -ltrd /tmp/gphdtmp/
drwxrwxrwx 4 hdfs hadoop 4096 Jan 16 22:32 /tmp/gphdtmp/
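
If the directory does not yet exist on a client node, a minimal sketch of creating it with open permissions (adjust the path to whatever your core-site.xml specifies):

[root@phd11-nn ~]# mkdir -p /tmp/gphdtmp
[root@phd11-nn ~]# chmod 777 /tmp/gphdtmp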

Note: It is advisable to set the temporary directory to something like /tmp/${user.name}_gphdtmp/ so that each user gets a separate scratch directory and there are no permission issues in a multi-user environment.
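
For example, a per-user scratch directory can be configured in core-site.xml using the ${user.name} variable, which Hadoop expands to the submitting user's name at runtime (a sketch; adjust the base path to your environment):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/${user.name}_gphdtmp</value>
</property>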

6) Create a directory structure in HDFS for the new user.

[root@phd11-nn ~]# sudo -u hdfs hdfs dfs -mkdir /user/app_user/
[root@phd11-nn ~]# sudo -u hdfs hdfs dfs -chown -R app_user:hadoopusers /user/app_user
[root@phd11-nn ~]# sudo -u hdfs hdfs dfs -ls /user/
Found 5 items
drwxr-xr-x - app_user hadoopusers 0 2014-01-16 22:47 /user/app_user
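
Optionally, run a quick smoke test to confirm the new user can write to their home directory (the file name here is arbitrary):

[root@phd11-nn ~]# sudo -u app_user hdfs dfs -touchz /user/app_user/write_test
[root@phd11-nn ~]# sudo -u app_user hdfs dfs -rm /user/app_user/write_test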

7) Make sure you refresh the user and group mappings so the NameNode knows about the new user.

[root@phd11-nn ~]# sudo -u hdfs hdfs dfsadmin -refreshUserToGroupsMappings
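
You can then confirm that the NameNode resolves the user's groups as expected with the hdfs groups command:

[root@phd11-nn ~]# hdfs groups app_user
app_user : hadoopusers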

8) All set. Execute a job as app_user and direct the output under the /user/app_user directory created for the user.

[app_user@phd11-nn ~]$ hadoop jar /usr/lib/gphd/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.5-alpha-gphd-2.1.1.0.jar wordcount /tmp/test_input /user/app_user/test_output
14/01/16 22:52:43 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/01/16 22:52:43 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/01/16 22:52:44 INFO input.FileInputFormat: Total input paths to process : 1
14/01/16 22:52:44 INFO mapreduce.JobSubmitter: number of splits:1
In DefaultPathResolver.java. Path = hdfs://phda2/user/app_user/test_output
14/01/16 22:52:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1389933869290_0003
14/01/16 22:52:45 INFO client.YarnClientImpl: Submitted application application_1389933869290_0003 to ResourceManager at phd11-nn.saturn.local/10.110.127.195:8032
14/01/16 22:52:45 INFO mapreduce.Job: The url to track the job: http://phd11-nn.saturn.local:8088/proxy/application_1389933869290_0003/
14/01/16 22:52:45 INFO mapreduce.Job: Running job: job_1389933869290_0003
14/01/16 22:52:52 INFO mapreduce.Job: Job job_1389933869290_0003 running in uber mode : false
14/01/16 22:52:52 INFO mapreduce.Job: map 0% reduce 0%
14/01/16 22:53:00 INFO mapreduce.Job: map 100% reduce 0%
14/01/16 22:53:07 INFO mapreduce.Job: map 100% reduce 100%
14/01/16 22:53:07 INFO mapreduce.Job: Job job_1389933869290_0003 completed successfully

9) Verify the output. You should see output similar to the following.

[app_user@phd11-nn ~]$ hdfs dfs -cat /user/app_user/test_output/*
{{MissingFormatWidthException}}    1
}".    1
İrken    1
...
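
You can also list the output directory; a _SUCCESS marker file indicates the job completed cleanly (timestamps and sizes below are illustrative):

[app_user@phd11-nn ~]$ hdfs dfs -ls /user/app_user/test_output
Found 2 items
-rw-r--r--   3 app_user hadoopusers          0 2014-01-16 22:53 /user/app_user/test_output/_SUCCESS
-rw-r--r--   3 app_user hadoopusers       1306 2014-01-16 22:53 /user/app_user/test_output/part-r-00000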

All set!

Miscellaneous:

  • If you get an error message like the one below while executing jobs, it indicates insufficient privileges on the temporary directory. Set its permissions to 777 on the affected nodes, as sketched after the message.

"Error creating temp dir in hadoop.tmp.dir /tmp/gphdtmp due to Permission denied"

  • If security is enabled (Kerberos), it is important to note that YARN applications (MapReduce included) will execute in the cluster as the user who submitted the job instead of the default "yarn" user. So you will need to add this user to each node in the cluster and ensure that /etc/security/limits.conf is properly defined to meet your workload requirements. For example, if user "joe" needs to run more than 1024 processes on a single node, update limits.conf on all nodes accordingly (see the sketch below). The same applies to the "yarn" user if security is disabled.
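
For example, entries like the following in /etc/security/limits.conf on each node would raise the process limits for "joe" and "yarn" (the value 4096 is illustrative; size it to your workload):

joe     soft    nproc   4096
joe     hard    nproc   4096
yarn    soft    nproc   4096
yarn    hard    nproc   4096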
