Pivotal Knowledge Base

Pivotal HDB query hangs when data loading is in progress

Environment

Product           Version
Pivotal HD (PHD)  All versions
Pivotal HDB       1.x (only)

Symptom

At one customer site, it was observed several times that a simple query would run for a very long time, or hang indefinitely, while several data-loading tasks were running in the background.

When the issue occurred, the "hdfs dfsadmin -report" command showed several DataNodes marked as "DEAD".

After manually restarting the dead DataNodes, the HDB query could complete successfully.
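
As a quick check, the dead DataNodes can be identified from the report output and then restarted. The report format and the service name vary across Hadoop/PHD versions, so the commands below are a sketch rather than the exact procedure used at the site:

# List DataNode status; older Hadoop versions only print a summary line
# such as "... (N total, M dead)", while newer ones also accept "-report -dead".
hdfs dfsadmin -report | grep -i dead

# On each affected host, restart the DataNode service.
# The service name is an assumption and may differ in your installation.
service hadoop-hdfs-datanode restart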

Logs from the dead DataNodes contain the following messages:

2013-12-11 07:14:38,043 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: big1hd03:50010:DataXceiverServer:
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:150)
        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:136)
        at java.lang.Thread.run(Thread.java:662)
2013-12-11 07:14:38,043 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: big1hd03:50010:DataXceiverServer:
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:150)
        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:136)
        at java.lang.Thread.run(Thread.java:662)
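
To confirm that a suspect DataNode is hitting this condition, its log can be searched for the error. The log path below is an assumption and depends on how logging is configured in the environment:

# Count occurrences of the error in the DataNode log (path is installation-specific).
grep -c "Too many open files" /var/log/gphd/hadoop-hdfs/hadoop-hdfs-datanode-*.log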
Cause

The error message "Too many open files" indicates that the number of files opened by the DataNode process has exceeded the operating system's open-files limit ("max open files").

The large number of open files was a result of the customer having created a large number of column-oriented tables in HDB. Because a column-oriented table stores each column's data in a dedicated file, the open-files limit is easily reached when many such tables are involved in data loading and query processing.
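
One way to see how close a DataNode is to this limit is to count the file descriptors held by the running DataNode process and compare that count with the limit the process was started with. The following is a generic Linux sketch (it assumes a single DataNode process per host), not an HDB-specific tool:

# Find the DataNode process.
DN_PID=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode)

# Count the file descriptors it currently holds open.
ls /proc/$DN_PID/fd | wc -l

# Show the open-files limit the process is actually running with.
grep "Max open files" /proc/$DN_PID/limits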

Resolution

After increasing the "max open files" system setting to a larger value (for example, 65536), the problem was no longer observed at the customer site. The section below describes how to make the change.

Summary

Pay attention to the following aspects to avoid the "Too many open files" problem when running HDB tasks.

  • Use column-oriented tables judiciously. Create them only when the workload genuinely benefits from column-oriented storage.
  • Set the number of HDB primary segment instances on each host to an appropriate value; typically four instances per host are recommended. The more instances, the more files the DataNode must open during data loading and query execution (see the rough estimate after this list).
  • Set the "max open files" system setting to a relatively large value (at least 65536 on the DataNode hosts that serve HDB).

How to change the "max open files" system setting

Because the HDFS services (NameNode/DataNode) are typically started as the hdfs user, it is preferable to raise the limit for the hdfs user only. To do that, modify /etc/security/limits.d/hdfs.conf on each node.

The following settings are suggested in the HDB Installation Guide; they can be adjusted for the specific environment.

hdfs - nofile 2900000
hdfs - nproc 131072
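
The new limits only apply to sessions started after the change, so the HDFS services on the node must be restarted to pick them up. A minimal way to verify the setting for the hdfs user (assuming the hdfs account has a login shell):

# Check the soft and hard open-files limits now applied to the hdfs user.
su - hdfs -c 'ulimit -Sn; ulimit -Hn'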
 
