Pivotal Knowledge Base

HDFS Blocks Keep Getting Corrupted

Environment

 Product     Version
 Pivotal HD  All
 OS          All Supported OS

Symptom

You are receiving error messages about a corrupted block. Deleting the affected file and rebuilding the partitions makes the error go away, but while Spring XD streams (which insert data into HAWQ) are running, another single corrupt block appears.
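
To see which file holds the corrupt block, hdfs fsck can list it. A minimal check; the path / and the hdfs superuser are assumptions about a typical cluster, so scan whichever directory holds the affected data:

    # List every file in the namespace that currently has a corrupt block
    sudo -u hdfs hdfs fsck / -list-corruptfileblocks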

Cause

The error occurs because the Secondary NameNode cannot perform its periodic checkpoint due to a namespaceID mismatch:

java.io.IOException: Inconsistent checkpoint fields.
LV = -63 namespaceID = 713175558 cTime = 0 ; clusterId = CID-f2caf2b4-b3da-4a34-a62f-fea8badc724e ; blockpoolId = BP-716932340-192.168.1.35-1470839146699.
Expecting respectively: -63; 1785058013; 0; CID-4f13424a-2eb8-43d9-9ae1-9df54670f489; BP-1667795605-192.168.1.35-1465693123284.
        at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:134)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:531)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:395)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:361)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:449)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:357)
        at java.lang.Thread.run(Thread.java:745)
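
The mismatch can be confirmed by comparing the VERSION files held by the NameNode and the Secondary NameNode. A quick check, assuming typical storage locations (the paths below are examples; substitute the dfs.namenode.name.dir and dfs.namenode.checkpoint.dir values from your configuration):

    # On the NameNode host: the authoritative identifiers
    cat /hadoop/hdfs/namenode/current/VERSION
    # On the Secondary NameNode host: the checkpoint's identifiers
    cat /hadoop/hdfs/namesecondary/current/VERSION
    # namespaceID, clusterID, and blockpoolId must all match; in the error
    # above, the checkpoint still carries IDs from an earlier incarnation
    # of the namespace.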

Resolution

Procedure to fix the issue (the command-line steps are collected into a single sequence after this list):

  • In the Ambari UI, select the HDFS service, then select Configs

  • Identify the directory value for the dfs.namenode.checkpoint.dir parameter

  • Note the value down

  • Stop all services except HDFS

  • Enter safe mode: hdfs dfsadmin -safemode enter

  • Confirm safe mode is on: hdfs dfsadmin -safemode get

  • Save a clean checkpoint of the namespace: hdfs dfsadmin -saveNamespace

  • Shut down the remaining HDFS service(s)

  • Log in to the Secondary NameNode host as root

  • cd to the value of ${dfs.namenode.checkpoint.dir}

  • Move the stale checkpoint aside: mv current current.bad

  • Start the HDFS service(s)

  • Wait for the HDFS services to come online

  • Start the remaining Hadoop services
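
The command-line portion of the procedure, collected into one sequence. This is a sketch: /hadoop/hdfs/namesecondary stands in for whatever value you noted for dfs.namenode.checkpoint.dir, and the hdfs superuser is assumed:

    # On an HDFS client host, before stopping HDFS (as the hdfs user):
    sudo -u hdfs hdfs getconf -confKey dfs.namenode.checkpoint.dir  # confirm the directory
    sudo -u hdfs hdfs dfsadmin -safemode enter                      # block writes
    sudo -u hdfs hdfs dfsadmin -safemode get                        # expect "Safe mode is ON"
    sudo -u hdfs hdfs dfsadmin -saveNamespace                       # persist a clean fsimage

    # On the Secondary NameNode host, after HDFS is stopped (as root):
    cd /hadoop/hdfs/namesecondary
    mv current current.bad    # a fresh checkpoint is rebuilt when HDFS restarts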
