Pivotal Knowledge Base

PHD - MapReduce job fails with "cannot open shared object file"

Environment

Product       Version
Pivotal HD    2.x, 3.x
MapReduce
YARN

Symptom

A MapReduce job fails because shared libraries are not correctly installed on the nodes where the mappers or reducers run.

MapReduce jobs fail after a few seconds with a message indicating that a map task failed:

16/02/02 16:36:29 INFO impl.YarnClientImpl: Submitted application application_1453825164633_0017 to ResourceManager at hdm3.gphd.local/172.28.9.252:8032
16/02/02 16:36:29 INFO mapreduce.Job: The url to track the job: http://hdm3.gphd.local:8088/proxy/application_1453825164633_0017/
16/02/02 16:36:29 INFO mapreduce.Job: Running job: job_1453825164633_0017
16/02/02 16:36:34 INFO mapreduce.Job: Job job_1453825164633_0017 running in uber mode : false
16/02/02 16:36:34 INFO mapreduce.Job: map 0% reduce 0%
16/02/02 16:36:39 INFO mapreduce.Job: map 100% reduce 100%
16/02/02 16:36:39 INFO mapreduce.Job: Job job_1453825164633_0017 failed with state FAILED due to: Task failed task_1453825164633_0017_m_000003
Job failed as tasks failed. failedMaps:1 failedReduces:0

The job status indicates the job failed and has been retired:

[hdfs@hdm1]$ hadoop job -status job_1453825164633_0019
DEPRECATED: Use of this script to execute mapred command is deprecated.
Instead use the mapred command for it.

16/02/02 17:29:06 INFO client.RMProxy: Connecting to ResourceManager at hdm3.gphd.local/172.28.9.252:8032
16/02/02 17:29:06 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server

Job: job_1453825164633_0019
Job File: hdfs://dcans:8020/user/history/done/2016/02/02/000000/job_1453825164633_0019_conf.xml
Job Tracking URL : http://hdm3.gphd.local:19888/jobhistory/job/job_1453825164633_0019
Uber job : false
Number of maps: 30
Number of reduces: 20
map() completion: 0.0
reduce() completion: 0.0
Job state: FAILED
retired: false
reason for failure: task 1453825164633_0019_m_000018 failed 2 times For details check tasktracker at: hdw2.gphd.local:19874
Counters not available. Job is retired.
[hdfs@hdm1]$ 
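
As the deprecation warning above notes, the same status can be retrieved with the mapred command instead:

[hdfs@hdm1]$ mapred job -status job_1453825164633_0019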

The job logs indicate a shared library cannot be found (either review the logs via the Job History Server, or see How to Find and Review Logs for Yarn MapReduce Jobs):

stderr: /data6/hadoop/data/yarn/nm-local-dir/usercache/hdfs/appcache/application_1454415084759_0006/container_1454415084759_0006_01_000003/././Mapper_analog_info_m: error while loading shared libraries: libboost_regex.so.1.41.0: cannot open shared object file: No such file or directory
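
If log aggregation is enabled on the cluster, the same container logs can also be pulled from the command line with the yarn CLI (the application ID below is the one from the log excerpt above; substitute the ID of the job being investigated):

# Retrieve the aggregated container logs for the failed application and
# search them for the shared-library error
yarn logs -applicationId application_1454415084759_0006 | grep "error while loading shared libraries"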

Cause

One or more shared libraries required by the mapper or reducer binary are not installed on at least one of the nodes where those tasks run.

Resolution

1. Determine which shared libraries the mapper and reducer binaries require by running the "ldd" command against them:

[hdm1:bin]$ ldd mapper_binary 
linux-vdso.so.1 => (0x00007fff92d5c000)
librt.so.1 => /lib64/librt.so.1 (0x00000034bda00000)
/lib64/ld-linux-x86-64.so.2 (0x00000034bc600000)
libuuid.so.1 => /lib64/libuuid.so.1 (0x00000034c7a00000)
libaudit.so.1 => /lib64/libaudit.so.1 (0x00000034c8600000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00000034c7e00000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00000034cc600000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00000034bea00000)
libidn.so.11 => /lib64/libidn.so.11 (0x00000034bee00000)
libfreebl3.so => /lib64/libfreebl3.so (0x00000034c8200000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x00000034cd600000)
[hdm1:bin]$
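
Note that the output above is from a node where every library resolves. On a node where a library is missing, ldd marks it as "not found", which makes the gap easy to spot (hypothetical output shown below):

[hdw2:bin]$ ldd mapper_binary | grep "not found"
        libboost_regex.so.1.41.0 => not found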

2. On each DataNode, confirm that each shared library is available and in the location shown in the output above; a scripted check is sketched below.
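
A quick way to perform this check across the cluster is a small loop over the worker hosts. The host names and library path below are placeholders; replace them with the cluster's actual nodes and the paths reported by ldd:

# Check every worker node for the library flagged in the container log
for host in hdw1.gphd.local hdw2.gphd.local hdw3.gphd.local; do
    echo "== $host =="
    ssh "$host" 'ls -l /usr/lib64/libboost_regex.so.1.41.0 2>/dev/null || echo MISSING'
done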

3. On nodes where the shared libraries are missing, ask the customer to install them (an example is shown below).
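
How the missing library is installed depends on the distribution and on how the mapper/reducer binary was built. As one example (assuming a RHEL/CentOS node and that the library is available from a configured repository), the owning package can be located and installed with yum, and the linker cache refreshed afterwards:

# Find which package, if any, ships the missing library
yum whatprovides "*/libboost_regex.so.1.41.0"

# Install the package reported above (the package name here is only an example)
yum install boost-regex

# Refresh the dynamic linker cache and confirm the library now resolves
ldconfig
ldconfig -p | grep libboost_regex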

4. Re-run the job. 
