Pivotal Knowledge Base


How to shut down a Pivotal HDB 2.0 segment host for maintenance

Environment

Product        Version
Pivotal HDB    2.0
Ambari         2.x

Purpose

In previous versions of Pivotal HDB, performing maintenance on a host was fairly complex and could involve some downtime, because data locality within HDFS had to be maintained. In Pivotal HDB 2.x, data locality is no longer a constraint, which simplifies host maintenance.

This article outlines the steps for performing maintenance on a Pivotal HDB host.

Cause

A Pivotal HDB host has a hardware issue that requires the host to be powered off.

Procedure

Some important points to take into account before starting this procedure:

  • Perform maintenance on only one host at a time to avoid data loss.
  • HAWQ remains available during this procedure, although performance may be affected.
  • Any queries running on the host being shut down will be aborted.
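Because only one host should be down at a time, it can help to confirm that the cluster is fully healthy before starting. The following is a minimal sketch, assuming it is run as gpadmin on a node where the hawq utility is on the PATH and that the "hawq state" summary lines have the format shown in this article. The helper names hawq_state_field and preflight_ok are hypothetical, and treating "postmaster.pid files missing = 0" as part of a healthy state is an assumption based on the sample output below.

```shell
#!/usr/bin/env bash
# Hypothetical pre-maintenance check: read "hawq state" output on stdin and
# succeed only if every segment is valid and reachable.

# Print the value of one "hawq state" summary field, for example:
#   ...gpadmin-[INFO]:-- Total segment failures (at master) = 0
hawq_state_field() {
  local label="$1"
  # Split at '=', match the label literally in the left half, then strip
  # whitespace from the right half and print it.
  awk -F'=' -v label="$label" \
    'index($1, label) { gsub(/[[:space:]]/, "", $2); print $2; exit }'
}

preflight_ok() {
  local state total valid failures missing
  state="$(cat)"
  total="$(hawq_state_field 'Total segments count from catalog' <<< "$state")"
  valid="$(hawq_state_field 'Total segment valid (at master)' <<< "$state")"
  failures="$(hawq_state_field 'Total segment failures (at master)' <<< "$state")"
  # Assumption: a missing postmaster.pid file means a segment host is down
  # even before the master has marked that segment as failed.
  missing="$(hawq_state_field 'Total number of postmaster.pid files missing' <<< "$state")"
  [[ -n "$total" && "$valid" == "$total" && "$failures" == "0" && "$missing" == "0" ]]
}
```

Example use: `hawq state 2>&1 | preflight_ok && echo "OK to start maintenance"`. The check on missing postmaster.pid files matters because, as step 6 shows, the master can still report zero failures for a short time after a host goes down.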

1. Locate the hostname of the affected host.

2. Log in to Ambari, click Hosts, and then click the affected host.

3. From the "Host Actions" menu, select "Stop All Components".

4. Wait until all components on the host have stopped.

5. Log in to the affected node via Secure Shell (SSH) and power it down:

[root@hawq20dn1 ~]# shutdown -h now
Broadcast message from root@hawq20dn1.lab
(/dev/pts/0) at 3:39 ... The system is going down for halt NOW!
[root@hawq20dn1 ~]# Connection to hawq20dn1 closed by remote host.
Connection to hawq20dn1 closed.
[root@hawq20 ~]#

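6. After the node is powered down, "hawq state" may hang for up to 60 seconds and will initially show output similar to the following. At this point the master still counts all segments as valid; only the missing postmaster.pid file reveals the down host: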

[gpadmin@hawq20dn2 ~]$ hawq state
ssh: connect to host hawq20dn1.lab port 22: No route to host
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:--HAWQ instance status summary
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:------------------------------------------------------
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Master instance = Active
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Master standby = hawq20dn1.lab
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Standby master state = Standby host passive
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Total segment instance count from config file = 3
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:------------------------------------------------------
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Segment Status
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:------------------------------------------------------
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Total segments count from catalog = 3
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Total segment valid (at master) = 3
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Total segment failures (at master) = 0
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Total number of postmaster.pid files missing = 1
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Total number of postmaster.pid files found = 2

7. After 7 minutes, "hawq state" reports the segment on the powered-off host as failed:

[gpadmin@hawq20dn2 ~]$ hawq state
ssh: connect to host hawq20dn1.lab port 22: No route to host
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:--HAWQ instance status summary
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:------------------------------------------------------
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Master instance = Active
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Master standby = hawq20dn1.lab
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Standby master state = Standby host passive
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Total segment instance count from config file = 3
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:------------------------------------------------------
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:-- Segment Status
20160715:03:43:46:375987 hawq_state:hawq20dn2:gpadmin-[INFO]:------------------------------------------------------
20160715:02:38:25:362753 hawq_state:hawq20dn2:gpadmin-[INFO]:--   Total segments count from catalog     = 3
20160715:02:38:25:362753 hawq_state:hawq20dn2:gpadmin-[INFO]:--   Total segment valid (at master)       = 2
20160715:02:38:25:362753 hawq_state:hawq20dn2:gpadmin-[INFO]:--   Total segment failures (at master)     = 1
20160715:02:38:25:362753 hawq_state:hawq20dn2:gpadmin-[INFO]:--   Total number of postmaster.pid files missing   = 1

8. Complete maintenance on the affected host.

9. Power up the host.

10. Start all components via Ambari by going to Ambari / Hosts / affected host / Host Actions / Start All Components.
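Steps 2 to 4 and step 10 can also be scripted against Ambari instead of using the web UI. The sketch below assumes that the Ambari REST API accepts a PUT on a host's host_components collection, where the state INSTALLED means stopped and STARTED means running. AMBARI_URL, the credentials, CLUSTER_NAME and the function names are hypothetical placeholders, not values from this article; check the request format against the API documentation for your Ambari 2.x release before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical Ambari REST sketch: stop or start all components on one host.
# Placeholder settings; override them through the environment.
AMBARI_URL="${AMBARI_URL:-http://ambari.example.com:8080}"
AMBARI_USER="${AMBARI_USER:-admin}"
AMBARI_PASS="${AMBARI_PASS:-admin}"
CLUSTER_NAME="${CLUSTER_NAME:-hawq_cluster}"

# Build the JSON request body for "stop" or "start".
ambari_payload() {
  local action="$1" state context
  case "$action" in
    stop)  state="INSTALLED"; context="Stop All Host Components" ;;
    start) state="STARTED";   context="Start All Host Components" ;;
    *)     echo "usage: ambari_payload stop|start" >&2; return 2 ;;
  esac
  printf '{"RequestInfo":{"context":"%s"},"Body":{"HostRoles":{"state":"%s"}}}' \
    "$context" "$state"
}

# Send the request for one host. Ambari expects the X-Requested-By header
# on modifying requests and answers with a request resource to poll.
ambari_host_components() {
  local action="$1" host="$2" body
  body="$(ambari_payload "$action")" || return
  curl -sS -u "${AMBARI_USER}:${AMBARI_PASS}" \
    -H 'X-Requested-By: ambari' \
    -X PUT -d "$body" \
    "${AMBARI_URL}/api/v1/clusters/${CLUSTER_NAME}/hosts/${host}/host_components"
}
```

With the host from this article, `ambari_host_components stop hawq20dn1.lab` would correspond to step 3 and `ambari_host_components start hawq20dn1.lab` to step 10. Completion still has to be confirmed, either in the Ambari UI or by polling the returned request.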

NOTE: This article describes starting and stopping HAWQ segments via Ambari. The segments can also be managed with the hawq utility by logging in to the node via SSH and running the following commands as the gpadmin user:

hawq start segment
hawq stop segment 
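Whichever method is used, the transitions described in steps 6 and 7 can be watched by re-running "hawq state". The sketch below polls it until the master reports a given number of failed segments: 1 after the host is powered off and, presumably, 0 again once the segment has been restarted. The function names, timeout and interval are illustrative assumptions, and the field parsing assumes the summary format shown in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until "hawq state" reports an expected number
# of segment failures, or give up after a timeout.

# Print the value of one "hawq state" summary field read from stdin.
hawq_state_field() {
  local label="$1"
  awk -F'=' -v label="$label" \
    'index($1, label) { gsub(/[[:space:]]/, "", $2); print $2; exit }'
}

# Usage: wait_for_segment_failures EXPECTED TIMEOUT_SECS INTERVAL_SECS CMD [ARGS...]
# CMD is normally "hawq state"; it is a parameter so the loop can be tested.
wait_for_segment_failures() {
  local expected="$1" timeout="$2" interval="$3"
  shift 3
  local deadline=$((SECONDS + timeout)) failures
  while :; do
    # Discard stderr (e.g. "No route to host" from ssh) and parse the summary.
    failures="$("$@" 2>/dev/null | hawq_state_field 'Total segment failures (at master)')"
    if [[ "$failures" == "$expected" ]]; then
      echo "segment failures (at master) = $failures"
      return 0
    fi
    if (( SECONDS >= deadline )); then
      echo "timed out; last reported failures = ${failures:-unknown}" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

For example, `wait_for_segment_failures 1 900 30 hawq state` allows 15 minutes for the failure to be detected, comfortably more than the roughly 7 minutes observed in step 7. Keep in mind that each "hawq state" call may itself hang for up to 60 seconds while the host is unreachable.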

 
