Pivotal Knowledge Base


Namenode busy replaying edit logs and HDFS cluster is unavailable. What next?

Note: Valid for PHD 1.x.

Refer to Apache JIRA: HDFS-4249

Case Study:

A Hadoop administrator may run into situations in which the namenode daemon starts but does not respond to HDFS queries. Experienced Hadoop administrators may be well versed in such situations; for newcomers, however, it may take some effort to understand what is happening on the Pivotal HD cluster.

To check the status of the namenode service, run the following command on the namenode:
service hadoop-hdfs-namenode status

The above might indicate that the namenode is running, and if you started all PHD services, you may find that the datanode services are also running. Check the status of the datanode daemon using:
service hadoop-hdfs-datanode status

Even though both the Namenode and Datanode services are up and running, you are unable to execute any Hadoop filesystem commands. The Namenode logs should reveal the current Namenode status.

Before we go further, let's understand what operations occur when the namenode starts and correlate them to the messages in the logs.

High Level Namenode startup operations:
1) Namenode initializes its namespace information by reading the last checkpoint
2) Based on the checkpoint information, namenode replays any edit logs to create fresh namespace information
3) Creates a new checkpoint
4) Opens communication ports to serve the namenode web UI and other daemons such as the secondary namenode, datanodes, etc.
5) Enters safe mode; only namespace read operations can be performed until a sufficient number of blocks have been reported by the datanodes
6) Exits safe mode; HDFS read & write operations can now be performed.
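Steps 5) and 6) can be observed directly with `hdfs dfsadmin -safemode get`, which prints "Safe mode is ON" or "Safe mode is OFF". As a minimal sketch, a helper like the one below can interpret that output in a monitoring script (the helper itself is illustrative, not part of the Hadoop tooling):

```shell
# in_safemode: succeeds if the supplied `hdfs dfsadmin -safemode get`
# output indicates the namenode is still in safe mode (step 5 above).
in_safemode() {
  echo "$1" | grep -q "Safe mode is ON"
}

# Typical usage (requires a reachable namenode):
#   if in_safemode "$(hdfs dfsadmin -safemode get)"; then
#     echo "namenode still in safe mode"
#   fi
```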

Let's get back to understanding what the namenode is busy doing, why none of the UIs can be opened, and what the datanodes are trying to do.

Namenode:
Namenode logs may provide some insight into the current situation. If you see a message snippet like the one below, especially keywords like "replaying edit logs", you can easily correlate it to step 2) of the namenode startup operations above.

Now that we know what is happening, the next question is: how long will the namenode take to replay all the edit logs? The time taken depends on the number of edit logs to be replayed and the performance of the namenode server. To estimate it, identify when the last successful checkpoint occurred, based on the creation time of the latest fsimage* file, and count how many edit files ("ls -l edits_* | wc -l") have accumulated since that checkpoint. The location of these files is given by the dfs.namenode.name.dir parameter in hdfs-site.xml. Use these metrics and monitor the progress reported in the logs to predict a time frame. We had a situation in which the checkpoint operation had not occurred for the last 60 days, and it took around 12 hours for the process to complete.
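The checkpoint-age check above can be scripted. The helper below is a sketch: pass it the "current" directory under dfs.namenode.name.dir (the path in the example is taken from this article's logs; substitute your own):

```shell
# count_pending_edits: show the latest fsimage (its timestamp marks the
# last successful checkpoint) and count the edits_* segments written
# since. Argument: the "current" dir under dfs.namenode.name.dir.
count_pending_edits() {
  local name_dir="$1"
  # Most recent fsimage = last successful checkpoint
  ls -lt "$name_dir"/fsimage_* 2>/dev/null | head -1
  # Number of edit segments accumulated since then
  ls "$name_dir"/edits_* 2>/dev/null | wc -l
}

# Example:
#   count_pending_edits /data/hadoop/name/dfs/name/current
```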

2014-01-17 23:42:52,846 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@4a504ec1 expecting start txid #24278861
2014-01-17 23:42:52,848 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding stream '/data/hadoop/name/dfs/name/current/edits_0000000000024278861-0000000000024326774' to transaction ID 24278861
2014-01-17 23:42:53,848 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 46885/47914 transactions completed. (98%)
2014-01-17 23:42:53,868 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file /data/hadoop/name/dfs/name/current/edits_0000000000024278861-0000000000024326774 of size 4633032 edits # 47914 loaded in 1 seconds
2014-01-17 23:42:53,868 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@5e628704 expecting start txid #24326775
2014-01-17 23:42:53,868 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding stream '/data/hadoop/name/dfs/name/current/edits_0000000000024326775-0000000000024343357' to transaction ID 24278861
...
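To track how far the replay has progressed without watching the log scroll by, you can pull out the most recent progress line. A small sketch (the namenode log location varies by installation, so the helper takes the log file as an argument):

```shell
# replay_progress: print the most recent "replaying edit log" progress
# line, e.g. "... 46885/47914 transactions completed. (98%)", from the
# given namenode log file.
replay_progress() {
  grep "replaying edit log" "$1" | tail -1
}

# Example (log path is installation-specific):
#   replay_progress /var/log/gphd/hadoop-hdfs/hadoop-hdfs-namenode-*.log
```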

Datanode:
Now let's see what is happening at the datanodes. Datanode logs may contain messages like the ones below, indicating that they are trying to reach the namenode; since the IPC communication port is not yet open, the datanodes are unable to register with the namenode. You can correlate this to step 4) of the namenode startup operations and concentrate back on the namenode.

2014-01-21 00:08:53,485 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: phd11-standby.saturn.local/10.110.127.242:8020
2014-01-21 00:08:59,445 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: phd11-nn.saturn.local/10.110.127.195:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-01-21 00:08:59,495 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: phd11-standby.saturn.local/10.110.127.242:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
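The datanodes will keep retrying until the namenode's IPC port (8020 here) starts accepting connections. One quick way to check for that from any node, sketched with bash's built-in /dev/tcp pseudo-device (the hostname in the example is from this article's logs):

```shell
# port_open: succeed if a TCP connection to host:port can be established.
# Uses bash's /dev/tcp redirection; `timeout` guards against hangs.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example: once this succeeds, datanode registration should proceed.
#   port_open phd11-nn.saturn.local 8020 && echo "namenode IPC port open"
```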

With the above discussed, you should now have a reasonable understanding of the situation. Unfortunately, there is little to be done other than wait patiently for the edit log replay to finish successfully. Be prepared, though: you may hit further issues due to memory/garbage collection pressure or a failure to purge a large number of edits files.

 
