Pivotal Knowledge Base


How to Troubleshoot "Standby master start failed, exit" with hawqstandbywatch.py

Environment

Product               Version
Pivotal HDB (HAWQ)    2.0
Hortonworks HDP       2.4

Purpose

This article explains how to use the hawqstandbywatch.py program to help isolate the root cause when the HAWQ Standby Master fails to start even though the rest of the cluster continues to run.

Symptom

When you start the Pivotal HDB cluster, you see the following messages:

hawqstandbywatch.py:<Standby_Master_Host>:gpadmin-[WARNING]:-syncmaster not running
[ERROR]:-Standby master start failed, exit

Procedure

Follow these steps to run hawqstandbywatch.py directly and narrow down the root cause:

1. As user gpadmin, run the following on the Standby Master host to check the hawq_master_directory setting on both the Master and the Standby Master hosts:

source /usr/local/hawq/greenplum_path.sh
hawq check --host hdm1 --host hdm2 --hadoop-home /usr/hdp/current/hadoop-client --stdout | grep hawq_master_directory
# where hdm1 = HAWQ Master hostname and hdm2 = HAWQ Standby Master hostname

For example: 

[gpadmin@hdm2 ~]$ hawq check --host hdm1 --host hdm2 --hadoop-home /usr/hdp/current/hadoop-client --stdout | grep hawq_master_directory
hawq_master_directory = /data/hawq/master
hawq_master_directory = /data/hawq/master
[gpadmin@hdm2 ~]$
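If you repeat this check often, the comparison can be scripted. The following is a minimal bash sketch that assumes the hawq check output format shown above (one "hawq_master_directory = <path>" line per host). compare_master_dirs is a hypothetical helper name, not part of HAWQ. It prints MATCH and returns 0 when all values are identical, and prints MISMATCH and returns 1 otherwise.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of HAWQ: reads "hawq_master_directory = <path>"
# lines on stdin (for example, hawq check output) and reports whether all
# values are identical. Returns 0 on a match, 1 on a mismatch, 2 if no values.
compare_master_dirs() {
  local values count
  # Keep only the text after "=", trim trailing whitespace, de-duplicate.
  values=$(grep 'hawq_master_directory' \
    | sed -e 's/^[^=]*=[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | sort -u)
  # Count non-empty distinct values (grep -c exits 1 on zero, hence || true).
  count=$(printf '%s\n' "$values" | grep -c . || true)
  if [ "$count" -eq 1 ]; then
    echo "MATCH: $values"
  elif [ "$count" -eq 0 ]; then
    echo "NONE: no hawq_master_directory lines found" >&2
    return 2
  else
    # Join the distinct values onto one line for display.
    echo "MISMATCH: $(printf '%s\n' "$values" | paste -sd ' ' -)"
    return 1
  fi
}
```

For example, after defining the function in your shell:
hawq check --host hdm1 --host hdm2 --hadoop-home /usr/hdp/current/hadoop-client --stdout | compare_master_dirs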

2. Confirm that the values match and that the Standby Master is deployed in the correct path.

The two values must match, because the HAWQ Standby Master must be deployed in the same path as the Master. Also make sure that this directory on the Standby Master has the same ownership and permissions as on the Master.
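One way to compare ownership and permissions is to print them on each host and compare the results. The sketch below assumes GNU coreutils stat, as found on common Linux distributions; dir_owner_perms is a hypothetical helper name, not part of HAWQ.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of HAWQ: prints "owner:group mode path" for a
# directory using GNU coreutils stat, so the result from the Standby Master
# can be compared with the same command's output on the Master.
dir_owner_perms() {
  if [ "$#" -ne 1 ]; then
    echo "usage: dir_owner_perms <directory>" >&2
    return 2
  fi
  local dir=$1
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # %U = owner name, %G = group name, %a = octal permissions, %n = path
  stat -c '%U:%G %a %n' "$dir"
}
```

For example, run dir_owner_perms /data/hawq/master on the Standby Master, then run the equivalent command against the Master, such as ssh hdm1 "stat -c '%U:%G %a %n' /data/hawq/master" (this assumes passwordless SSH from the Standby Master to the Master as gpadmin). The two output lines should be identical.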

3. Run hawqstandbywatch.py directly. This program checks whether the Standby Master (its syncmaster process) is running. Execute it as gpadmin on the Standby Master host, passing the hawq_master_directory value from step 1 followed by debug:

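/usr/local/hawq/sbin/hawqstandbywatch.py /data/hawq/master debug
# /usr/local/hawq is the install directory used in step 1; adjust the path if HAWQ is installed elsewhere (for example, /usr/local/hawq-2.0.0.0)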

If the Standby Master is running correctly, the output looks similar to the following:

[gpadmin@hawq20dn1 ~]$ /usr/local/hawq/sbin/hawqstandbywatch.py /data/hawq/master/ debug
Checking standby master status
20160620:17:01:34:026523 hawqstandbywatch.py:hawq20dn1:gpadmin-[INFO]:-Monitoring logs
20160620:17:01:40:026523 hawqstandbywatch.py:hawq20dn1:gpadmin-[INFO]:-checking if syncmaster is running
20160620:17:01:40:026523 hawqstandbywatch.py:hawq20dn1:gpadmin-[INFO]:-syncmaster appears ok, pid 5406
[gpadmin@hawq20dn1 ~]$ 
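To use the result in a script, the output can be summarized by matching the two messages that appear in this article. This is a sketch only: syncmaster_status is a hypothetical helper, and it assumes hawqstandbywatch.py prints the same message text as in the examples above ("syncmaster appears ok, pid N" and "syncmaster not running"). Any other output is reported as UNKNOWN.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of HAWQ: reads hawqstandbywatch.py output on
# stdin and summarizes the syncmaster status based on the message text shown
# in this article. Returns 0 if running, 1 if not running, 2 if unrecognized.
syncmaster_status() {
  local output pid
  output=$(cat)
  # Extract the pid from the last "syncmaster appears ok, pid N" message.
  pid=$(printf '%s\n' "$output" \
    | sed -n 's/.*syncmaster appears ok, pid \([0-9][0-9]*\).*/\1/p' | tail -n 1)
  if [ -n "$pid" ]; then
    echo "OK pid=$pid"
  elif printf '%s\n' "$output" | grep -q 'syncmaster not running'; then
    echo "NOT RUNNING"
    return 1
  else
    echo "UNKNOWN"
    return 2
  fi
}
```

For example:
/usr/local/hawq/sbin/hawqstandbywatch.py /data/hawq/master debug 2>&1 | syncmaster_status
The 2>&1 redirection captures the messages whether the program writes them to stdout or stderr.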
