Pivotal Knowledge Base


Pivotal HDB Resource Manager terminated by signal 11: Segmentation fault

Environment

Product                                     Version
Pivotal HDB (Pivotal Hadoop Database)       2.0.0
Pivotal HDP (Hortonworks Data Platform)     2.3, 2.4

Symptom

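  • Pivotal HDB never finishes starting up, so you cannot log in with psql.
  • The master resource manager process starts, but the master log in pg_log reports that it was terminated by signal 11, for example: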
[gpadmin@piv-master pg_log]$ ps -eaf | grep res
root        132      2  0 14:17 ?        00:00:00 [usbhid_resumer]
ams        3220      1  0 14:27 ?        00:00:05 /usr/bin/python2.6 /usr/lib/python2.6/site-packages/resource_monitoring/main.py start
gpadmin    8721      1  1 14:50 ?        00:00:01 /usr/local/hawq-2.0.0.0/bin/postgres -D /data/hawq/master -i -M master -p 5432 --silent-mode=true
gpadmin    8726   8721  0 14:50 ?        00:00:00 postgres: port  5432, master logger process
gpadmin    9062   8721  0 14:51 ?        00:00:00 postgres: port  5432, stats collector process
gpadmin    9063   8721  0 14:51 ?        00:00:00 postgres: port  5432, writer process
gpadmin    9064   8721  0 14:51 ?        00:00:00 postgres: port  5432, checkpoint process
gpadmin    9065   8721  0 14:51 ?        00:00:00 postgres: port  5432, seqserver process
gpadmin    9066   8721  0 14:51 ?        00:00:00 postgres: port  5432, WAL Send Server process
gpadmin    9067   8721  0 14:51 ?        00:00:00 postgres: port  5432, DFS Metadata Cache process
gpadmin    9072   8721  0 14:51 ?        00:00:00 postgres: port  5432, master resource manager
gpadmin    9116   8544  0 14:52 pts/1    00:00:00 grep res
[gpadmin@piv-master pg_log]$ tail -f hawq-2016-09-05_145019.csv | grep "signal 11"
2016-09-05 14:52:22.317039 CEST,,,p8721,th1884252288,,,,0,,,seg-10000,,,,,"LOG","00000","resourcemanager process (PID 9072) was terminated by signal 11: Segmentation fault",,,,,,,0,,"postmaster.c",4748,
2016-09-05 14:52:22.317048 CEST,,,p8721,th1884252288,,,,0,,,seg-10000,,,,,"DEBUG2","00000","server process (PID 9072) was terminated by signal 11: Segmentation fault",,,,,,,0,,"postmaster.c",4748,
2016-09-05 14:52:22.317065 CEST,,,p8721,th1884252288,,,,0,,,seg-10000,,,,,"LOG","00000","server process (PID 9072) was terminated by signal 11: Segmentation fault",,,,,,,0,,"postmaster.c",4748,
  • /var/log/messages contains kernel segfault messages for postgres processes, like the following:
Sep  5 12:02:25 piv-master kernel: postgres[225377]: segfault at 0 ip 00000000008f79d4 sp 00007fff99d2dd70 error 4 in postgres[400000+83c000]
Sep  5 12:02:42 piv-master kernel: postgres[225401]: segfault at 0 ip 00000000008f79d4 sp 00007fff99d2dd70 error 4 in postgres[400000+83c000]
Sep  5 12:02:53 piv-master kernel: postgres[225537]: segfault at 0 ip 00000000008f79d4 sp 00007fff99d2dd70 error 4 in postgres[400000+83c000]
Sep  5 12:02:55 piv-master kernel: postgres[225562]: segfault at 0 ip 00000000008f79d4 sp 00007fff99d2dd70 error 4 in postgres[400000+83c000]
Sep  5 12:03:12 piv-master kernel: postgres[225586]: segfault at 0 ip 00000000008f79d4 sp 00007fff99d2dd70 error 4 in postgres[400000+83c000]
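To check a master log for this symptom without tailing it by hand, the following Python sketch scans HAWQ CSV log lines for the resource manager crash message shown in the sample above and returns the PIDs it finds. It assumes the message text matches the sample lines exactly; the function name find_rm_segfaults is hypothetical. It deliberately matches only the "resourcemanager process" LOG line, so each crash is counted once rather than once per duplicate "server process" line.

```python
import re
from typing import Iterable, List

# Matches the postmaster LOG message for a crashed resource manager,
# as it appears in the sample hawq-*.csv lines (assumed to be stable text).
RM_CRASH = re.compile(
    r"resourcemanager process \(PID (\d+)\) was terminated by signal 11"
)


def find_rm_segfaults(lines: Iterable[str]) -> List[int]:
    """Return the PID of every resource manager that died with SIGSEGV (hypothetical helper)."""
    pids = []
    for line in lines:
        match = RM_CRASH.search(line)
        if match:
            pids.append(int(match.group(1)))
    return pids
```

For example, with open("hawq-2016-09-05_145019.csv", errors="replace") as log: print(find_rm_segfaults(log)), run from the pg_log directory.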

Cause

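Reverse hostname lookup (resolving an IP address to a hostname) does not work for the HAWQ hosts. For example, the lookup for 192.168.211.101 fails with NXDOMAIN: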

[root@piv-master ~]# nslookup 192.168.211.101
Server: 192.168.211.100
Address: 192.168.211.100#53
** server can't find 101.211.168.192.in-addr.arpa.: NXDOMAIN
[root@piv-master ~]#
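Running nslookup for every host is tedious on larger clusters. The following Python sketch performs the same reverse lookup for a list of IP addresses through socket.gethostbyaddr, which consults the system resolver (so it also reflects /etc/hosts), and reports None for any address that does not resolve. The function name reverse_lookup_all is hypothetical, and the resolver is a parameter only so the logic can be tested without a live DNS server.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Same shape as socket.gethostbyaddr: (hostname, aliases, addresses).
HostEntry = Tuple[str, List[str], List[str]]


def reverse_lookup_all(
    ips: Iterable[str],
    resolver: Callable[[str], HostEntry] = socket.gethostbyaddr,
) -> Dict[str, Optional[str]]:
    """Map each IP to its reverse-resolved hostname, or None if the lookup fails."""
    results: Dict[str, Optional[str]] = {}
    for ip in ips:
        try:
            hostname, _aliases, _addresses = resolver(ip)
            results[ip] = hostname
        except (socket.herror, socket.gaierror):
            # herror is raised when no PTR record or hosts entry exists (e.g. NXDOMAIN).
            results[ip] = None
    return results
```

For example, reverse_lookup_all(["192.168.211.100", "192.168.211.101"]) run on each HAWQ host; any None value points to the cause described above.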

Resolution

Option 1 - Recommended:

Configure the DNS servers with reverse (PTR) records for every HAWQ host, so that each IP address resolves correctly to its hostname.
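After the reverse records are in place, it can help to confirm that each IP resolves to a hostname and that this hostname resolves back to the same IP. The following Python sketch performs that round trip with socket.gethostbyaddr and socket.gethostbyname_ex and returns a description of every mismatch. The function name round_trip_failures is hypothetical; the two resolvers are parameters only so the check can be tested offline.

```python
import socket
from typing import Callable, Iterable, List, Tuple

HostEntry = Tuple[str, List[str], List[str]]  # (hostname, aliases, addresses)


def round_trip_failures(
    ips: Iterable[str],
    reverse: Callable[[str], HostEntry] = socket.gethostbyaddr,
    forward: Callable[[str], HostEntry] = socket.gethostbyname_ex,
) -> List[str]:
    """Return a message for every IP whose reverse and forward lookups disagree."""
    failures = []
    for ip in ips:
        try:
            hostname, _, _ = reverse(ip)  # IP -> hostname (PTR record)
        except (socket.herror, socket.gaierror) as exc:
            failures.append(f"{ip}: reverse lookup failed ({exc})")
            continue
        try:
            _, _, addresses = forward(hostname)  # hostname -> IPs (A records)
        except (socket.herror, socket.gaierror) as exc:
            failures.append(f"{ip}: {hostname} does not resolve forward ({exc})")
            continue
        if ip not in addresses:
            failures.append(f"{ip}: {hostname} resolves to {addresses}, not {ip}")
    return failures
```

An empty list means every address survived the round trip.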

Option 2:

Add an entry to /etc/hosts for each HAWQ host in the cluster, as in the following example:

192.168.211.100     piv-master.local               piv-master
192.168.211.104     piv-standbymaster.local        piv-standbymaster
192.168.211.102     piv-segment1.local             piv-segment1
192.168.211.101     piv-segment2.local             piv-segment2
192.168.211.105     piv-segment3.local             piv-segment3
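Each /etc/hosts line maps an IP address to a canonical hostname followed by optional aliases, with comments starting at "#". The following Python sketch parses that format and lists which HAWQ host IPs still lack an entry, which is a quick way to confirm Option 2 on every host. The function names parse_hosts and missing_entries are hypothetical, and the parser is a simplified reading of the hosts format, not the resolver's own implementation.

```python
from typing import Dict, Iterable, List


def parse_hosts(text: str) -> Dict[str, List[str]]:
    """Parse /etc/hosts-style text into {ip: [canonical_name, alias, ...]}."""
    entries: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()  # any run of spaces or tabs separates fields
        if len(fields) < 2:
            continue  # an IP without a hostname is not a usable entry
        entries.setdefault(fields[0], []).extend(fields[1:])
    return entries


def missing_entries(text: str, hawq_ips: Iterable[str]) -> List[str]:
    """Return the HAWQ host IPs that have no hostname in the hosts text."""
    entries = parse_hosts(text)
    return [ip for ip in hawq_ips if ip not in entries]
```

For example, missing_entries(open("/etc/hosts").read(), ["192.168.211.100", "192.168.211.101"]) returns the addresses that still need a line.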

Internal Comments

Note: https://jira-pivotal.atlassian.net/browse/GPSQL-3301 has been created to improve the logging around this error.
