Pivotal Knowledge Base


Namenode failed while loading fsimage with GC overhead limit exceeded

Problem

During startup, the NameNode failed to load the fsimage into memory:

2014-05-14 17:36:56,806 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loading image file /data/hadoop/nn/dfs/name/current/fsimage_0000000000252211550 using no compression
2014-05-14 17:36:56,806 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 29486731
2014-05-14 17:54:40,401 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.util.zip.ZipCoder.getBytes(ZipCoder.java:89)

Cause

Every file, directory and block in HDFS is stored as an object in the NameNode and occupies around 150 bytes of memory. 150 bytes is not a fixed number, but it is the generally used rule of thumb. In this particular instance:

  • There were 29+ million files, of which around 25 million had been generated during a NameNode stress benchmark (nnbench).
  • The fsimage file was 2.7 GB, which averages out to roughly 90-100 bytes per file in the on-disk image; the in-memory footprint per object is larger.
  • The heap size was the default 1 GB.

With such a small heap and such a large fsimage to load into memory, the NameNode's garbage collector spent most of its time trying to reclaim memory, triggering the GC overhead limit error.
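A back-of-the-envelope check makes the mismatch obvious. Using the file count from the log above and the ~150 bytes-per-object rule of thumb (a sketch, not an exact measurement), the files alone need roughly 4.1 GB of live heap, before counting directories and blocks:

```shell
# Rough check: live object memory at ~150 bytes per namespace object,
# versus the 1 GB default heap.
OBJECTS=29486731   # "Number of files" reported in the NameNode log above
echo "$OBJECTS" | awk '{printf "~%.1f GB needed for %d objects at 150 bytes each\n", $1 * 150 / (1024^3), $1}'
```

Since each file also has at least one block object, the real requirement is higher still, which is why the default 1 GB heap could not hold the image.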

Fix

Step 1: Estimate appropriate values for the -Xmx / -Xms parameters using the formula below. Example:

[root@hdm1 current]# sudo -u hdfs hdfs oiv -p XML -printToScreen -i fsimage_0000000000252211550 -o /tmp/a | egrep "BLOCK_ID|INODE_ID" | wc -l | awk '{printf "Objects=%d : Suggested Xms=%0dm Xmx=%0dm\n", $1, (($1 / 1000000 )*1024), (($1 / 1000000 )*1024)}'
Example output with ~29 million objects:
Objects=29000000 : Suggested Xms=29696m Xmx=29696m
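The formula buried in that one-liner is simply (objects / 1,000,000) × 1024 MB, i.e. roughly 1 GB of heap per million namespace objects. Broken out into readable steps (the object count here is the example value, not a measurement from your cluster):

```shell
# Suggested heap (MB) = (objects / 1,000,000) * 1024
# i.e. roughly 1 GB of heap per million namespace objects.
OBJECTS=29000000
HEAP_MB=$(( OBJECTS / 1000000 * 1024 ))
echo "Suggested Xms/Xmx: ${HEAP_MB}m"   # 29 million objects -> 29696m
```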

Step 2: Edit the HADOOP_NAMENODE_OPTS parameter in /etc/gphd/hadoop/conf/hadoop-env.sh on the NameNode to use the suggested -Xmx and -Xms values.

  • With the increased heap, the image file loaded in ~150 seconds on the cluster where this issue occurred.
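For example, the edited line in hadoop-env.sh might look like the following (the heap values are the ones suggested above for ~29 million objects; size them to your own cluster's object count):

```shell
# /etc/gphd/hadoop/conf/hadoop-env.sh
# Example only: -Xms/-Xmx values must match your own object count.
export HADOOP_NAMENODE_OPTS="-Xms29696m -Xmx29696m ${HADOOP_NAMENODE_OPTS}"
```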

Step 3: Start the NameNode:

service hadoop-hdfs-namenode start

Best Practices

  • Once the NameNode is up, delete obsolete or unnecessary files. In this case, the files created during the NameNode benchmark were deleted with the -skipTrash option.
  • After deleting the files, perform a saveNamespace operation to persist the namespace into the storage directory(ies) and reset the NameNode journal (edits file). This also reduces NameNode startup time, because the edits no longer need to be digested at boot. Alternatively, you can leave this to the regular checkpoint process.
  • In situations where GC overhead limit errors occur, the check can be disabled with the Java option "-XX:-UseGCOverheadLimit"; for extreme cases with very high transaction volume, also consider temporarily increasing the -Xmx heap size.
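The first two practices above can be sketched as the following command sequence. This is a sketch, not a verbatim transcript from the incident: the benchmark path /benchmarks/NNBench is a placeholder (nnbench's default output location), so substitute the actual path on your cluster. Note that saveNamespace requires the NameNode to be in safemode.

```shell
# Delete the obsolete benchmark files, bypassing the trash.
# /benchmarks/NNBench is a placeholder path; substitute your own.
sudo -u hdfs hdfs dfs -rm -r -skipTrash /benchmarks/NNBench

# Checkpoint the namespace so the deletions are folded into a new
# fsimage and the edits journal is reset.
sudo -u hdfs hdfs dfsadmin -safemode enter
sudo -u hdfs hdfs dfsadmin -saveNamespace
sudo -u hdfs hdfs dfsadmin -safemode leave
```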

 
