Pivotal Knowledge Base


After saving the namespace, the NameNode fails to start with the error "Read unexpect number of files"

Environment

  • PHD 1.1.0.0
  • HDB 1.1.3.0

Symptom

2014-07-14 16:28:27,556 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loading image file /data/hadoop/nn/dfs/name/current/fsimage_0000000000275268491 using no compression
2014-07-14 16:28:27,556 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 598687
2014-07-14 16:28:30,331 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2014-07-14 16:28:30,332 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2014-07-14 16:28:30,332 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2014-07-14 16:28:30,333 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: Failed to load image from FSImageFile(file=/data/hadoop/nn/dfs/name/current/fsimage_0000000000275268491, cpktTxId=0000000000275268491)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:651)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:264)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:627)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:469)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:437)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:609)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:594)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1235)
Caused by: java.io.IOException: Read unexpect number of files: 1
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadLocalNameINodes(FSImageFormat.java:285)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:223)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:750)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:739)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:632)
        ... 9 more
2014-07-14 16:28:30,336 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1

Cause

  During startup the NameNode failed to load the most recent fsimage. The error "Read unexpect number of files" indicates that the total inode count recorded in the fsimage file is inconsistent with the number of inodes actually found in it.

  This condition can be triggered by the way HAWQ performs concat operations. HAWQ creates the temporary file for a concat in a different directory than the parent, which can corrupt the NameNode's inode count for blocks under construction.
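The mismatch can be confirmed without starting the NameNode by dumping the fsimage with the HDFS Offline Image Viewer. This is a sketch; the fsimage path is taken from the log above, and the output path is an example:

```shell
# Dump the problematic fsimage to a readable XML file for inspection.
# Run this on a copy of the image, not on the NameNode's live directory.
hdfs oiv -i /data/hadoop/nn/dfs/name/current/fsimage_0000000000275268491 \
         -o /tmp/fsimage.xml \
         -p XML

# Inspect /tmp/fsimage.xml and compare the declared file count
# (598687 in the log above) with the inode entries actually present.
```

If the dump confirms the inconsistency, recover from a secondary/standby checkpoint or an earlier fsimage rather than the corrupted one.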

How it can be avoided

Stop the HAWQ, HBase, and Hive services prior to performing the saveNamespace operation.
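A minimal sketch of the recommended sequence. The service-stop commands vary by deployment (cluster manager, init scripts) and are shown here only as placeholders; the `hdfs dfsadmin` steps are standard, and saveNamespace requires safe mode:

```shell
# 1. Stop the writing services first (commands are deployment-specific
#    examples; use your cluster manager's equivalents).
service hawq stop
service hbase-master stop
service hive-server stop

# 2. Enter safe mode so no namespace changes occur during the checkpoint.
hdfs dfsadmin -safemode enter

# 3. Save the current namespace to a new fsimage on disk.
hdfs dfsadmin -saveNamespace

# 4. Leave safe mode and restart the services.
hdfs dfsadmin -safemode leave
```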

Fix

Upgrade to HAWQ 1.1.4 or later.

Internal Jira Reference: HD-10956
