Pivotal Knowledge Base

HBase region servers fail to come up during crash recovery with an Immutable Configuration error

Environment

  • PHD 1.x

Symptom

HBase region servers fail to come up during crash recovery with an Immutable Configuration error:

2015-07-02 06:05:02,273 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=-ROOT-,,0.70236052, starting to roll back the global memstore size.
java.io.IOException: Cannot get log reader
        at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:721)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:3179)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3128)
        at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:631)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:547)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4399)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4347)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:101)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.UnsupportedOperationException: Immutable Configuration
        at org.apache.hadoop.hbase.regionserver.CompoundConfiguration.setClass(CompoundConfiguration.java:445)
        at org.apache.hadoop.ipc.RPC.setProtocolEngine(RPC.java:193)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:249)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:168)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:418)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:385)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:123)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2277)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:314)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1747)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:177)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:715)
        ... 12 more

Cause

The Immutable Configuration error is related to HBase bug HBASE-8372, in which HBase uses a CompoundConfiguration class that overrides all of the set functions of the Hadoop Configuration class.
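
The pattern can be illustrated with a minimal sketch. The class below is hypothetical and only mimics the behavior described in HBASE-8372; it is not the actual HBase CompoundConfiguration source. A Configuration subclass whose setters are overridden to throw will fail as soon as any downstream Hadoop code tries to mutate it.

    import org.apache.hadoop.conf.Configuration;

    // Hypothetical stand-in for the behavior described in HBASE-8372:
    // every set* method is overridden to reject modification.
    class ImmutableConfigurationSketch extends Configuration {
        ImmutableConfigurationSketch(Configuration base) {
            super(base); // copy the underlying key/value pairs
        }

        @Override
        public void set(String name, String value) {
            throw new UnsupportedOperationException("Immutable Configuration");
        }

        @Override
        public void setClass(String name, Class<?> theClass, Class<?> xface) {
            throw new UnsupportedOperationException("Immutable Configuration");
        }
    }

    public class ImmutableConfDemo {
        public static void main(String[] args) {
            Configuration conf = new ImmutableConfigurationSketch(new Configuration());
            // Any client code that mutates this object, such as
            // RPC.setProtocolEngine() shown below, fails with the same
            // UnsupportedOperationException seen in the stack trace above.
            conf.setClass("rpc.engine.some.Protocol", Object.class, Object.class);
        }
    }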

Workaround

During region server recovery, HBase passes the CompoundConfiguration conf object to the Hadoop client. org.apache.hadoop.ipc.RPC.setProtocolEngine then attempts to modify that configuration data structure using setClass, which is overridden, so the Immutable Configuration exception is thrown:

 191   public static void setProtocolEngine(Configuration conf,
 192                                 Class<?> protocol, Class<?> engine) {
 193     conf.setClass(ENGINE_PROP+"."+protocol.getName(), engine, RpcEngine.class);
 194   }

The above condition only occurs when HDFS HA is not enabled. In the HA case, the CompoundConfiguration object is copied into a new configuration object, resulting in a mutable configuration that is passed down to the HDFS client and org.apache.hadoop.ipc.RPC.setProtocolEngine:

132       // HA case
133       FailoverProxyProvider failoverProxyProvider = NameNodeProxies
134           .createFailoverProxyProvider(conf, failoverProxyProviderClass, xface,
135               nameNodeUri);
136       Conf config = new Conf(conf);
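
The difference between the two paths can be sketched as follows, reusing the hypothetical ImmutableConfigurationSketch class from the earlier sketch; only Configuration and its copy constructor are real Hadoop API. The copy constructor produces a fresh, mutable object, so setters called on the copy succeed while the same call on the original throws.

    import org.apache.hadoop.conf.Configuration;

    public class MutableCopyDemo {
        public static void main(String[] args) {
            // Read-only wrapper, as in the earlier sketch: every setter throws.
            Configuration readOnly = new ImmutableConfigurationSketch(new Configuration());

            // What the HA path effectively does: clone the key/value pairs
            // into a brand-new, mutable Configuration object.
            Configuration mutableCopy = new Configuration(readOnly);

            // Succeeds on the copy ...
            mutableCopy.setClass("rpc.engine.some.Protocol", Object.class, Object.class);

            // ... but the same call on the original would throw
            // "Immutable Configuration", which is the non-HA failure mode.
            // readOnly.setClass("rpc.engine.some.Protocol", Object.class, Object.class);
        }
    }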

With that in mind, a proven workaround in this case is to enable HDFS HA in the configuration used by the HBase Master and RegionServer services only. This tricks HBase into thinking HA is enabled even though there is a single NameNode in the environment, allowing the region servers to come out of recovery mode successfully. Once recovery completes, the HA-related configuration settings can be removed.

  1. Take a backup of the /etc/gphd configuration directory on all nodes
  2. Edit the /etc/gphd/hadoop/conf/hdfs-site.xml
    <property> 
      <name>dfs.nameservices</name> 
      <value>${nameservices}</value> 
    </property> 
    
    <property> 
      <name>dfs.ha.namenodes.${nameservices}</name> 
      <value>${namenode1id},${namenode2id}</value> 
    </property> 
    
    <property> 
      <name>dfs.namenode.rpc-address.${nameservices}.${namenode1id}</name> 
      <value>${namenode}:8020</value> 
    </property> 
    
    <property> 
      <name>dfs.namenode.rpc-address.${nameservices}.${namenode2id}</name> 
      <value>${standbynamenode}:8020</value> 
    </property> 
    
    <property> 
      <name>dfs.namenode.http-address.${nameservices}.${namenode1id}</name> 
      <value>${namenode}:50070</value> 
    </property> 
    
    <property> 
      <name>dfs.namenode.http-address.${nameservices}.${namenode2id}</name> 
      <value>${standbynamenode}:50070</value> 
    </property> 
    
    <property> 
      <name>dfs.namenode.shared.edits.dir</name> 
      <value>qjournal://${journalnode}/${nameservices}</value> 
    </property> 
    
    <property> 
      <name>dfs.client.failover.proxy.provider.${nameservices}</name> 
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> 
    </property> 
    
    <property> 
      <name>dfs.ha.fencing.methods</name> 
      <value>
      sshfence
      shell(/bin/true)
      </value> 
    </property> 
    
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hdfs/.ssh/id_rsa</value>
    </property>
    
    <property> 
      <name>dfs.journalnode.edits.dir</name> 
      <value>${journalpath}</value> 
    </property> 
    
    <!-- Namenode Auto HA related properties --> 
    <property>
       <name>dfs.ha.automatic-failover.enabled</name>
       <value>true</value>
     </property>
    <!-- END Namenode Auto HA related properties -->
  3. Edit the /etc/gphd/hadoop/conf/core-site.xml
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://${nameservices}</value> 
      <description>The name of the default file system.  A URI whose
      scheme and authority determine the FileSystem implementation.  The
      uri's scheme determines the config property (fs.SCHEME.impl) naming
      the FileSystem implementation class.  The uri's authority is used to
      determine the host, port, etc. for a filesystem.</description>
    </property>
    
    
    <property>
       <name>ha.zookeeper.quorum</name>
       <value>${zookeeper-server}:${zookeeper.client.port}</value>
     </property> 
  4. Edit the /etc/gphd/hadoop/conf/yarn-site.xml
    <property>
        <name>mapreduce.job.hdfs-servers</name>
        <value>hdfs://${nameservices}</value>
    </property>
  5. Edit the /etc/gphd/hbase/conf/hbase-site.xml
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://${nameservices}/apps/hbase/data</value>
        <description>The directory shared by region servers and into
        which HBase persists.  The URL should be 'fully-qualified'
        to include the filesystem scheme.  For example, to specify the
        HDFS directory '/hbase' where the HDFS instance's namenode is
        running at namenode.example.org on port 9000, set this value to:
        hdfs://namenode.example.org:9000/hbase.  By default HBase writes
        into /tmp.  Change this configuration else all data will be lost
        on machine restart.
        </description>
    </property>
  6. Distribute the configuration changes to all HBase Master and RegionServer nodes (a verification sketch follows this list)
  7. Restart the HBase services
  8. Restore the original configuration
  9. Restart the HBase services and confirm the issue is resolved
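
After distributing the temporary HA settings in step 6 and before restarting HBase, it can be useful to confirm that a client on an HBase node actually resolves the new nameservice. The sketch below is illustrative only; it assumes the edited core-site.xml and hdfs-site.xml are on the client classpath (for example under /etc/gphd/hadoop/conf), and the class name is not part of any product.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    // Prints the filesystem URI the client-side configuration resolves to.
    // With the HA settings in place it should be hdfs://<nameservices>
    // rather than a single NameNode host:port.
    public class CheckDefaultFs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // loads core-site.xml / hdfs-site.xml from the classpath
            System.out.println("fs.defaultFS      = " + conf.get("fs.defaultFS"));
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Resolved FS URI   = " + fs.getUri());
        }
    }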

Fix

Upgrade to PHD 3.0, which includes HBase 0.98.4, or permanently enable HDFS HA to prevent this issue from occurring in the future.
