Pivotal Knowledge Base

HowTo - Enable core file generation on a server

Goal

This article describes the steps required to enable core file generation. Core files are very useful for debugging an application crash, so we recommend enabling core generation on the master and segment servers for HAWQ and Greenplum Database.

Solution

Before proceeding, identify the server's current core file setting. Use the command below to check whether core generation is enabled. The output below indicates that core file generation is effectively disabled (the maximum core file size is 0, so no core file is saved).

chaudb$ ulimit -a
core file size (blocks, -c) 0 
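
To check only the core limit rather than scanning the full ulimit -a output, something like the following sketch may help. It assumes a POSIX-style shell such as bash; the variable names are illustrative only.

```shell
# Report the soft and hard core file size limits for the current shell.
soft_limit=$(ulimit -S -c)
hard_limit=$(ulimit -H -c)
echo "soft core limit: ${soft_limit}"
echo "hard core limit: ${hard_limit}"

# A soft limit of 0 means no core file will be written.
if [ "${soft_limit}" = "0" ]; then
    core_status="disabled"
else
    core_status="enabled"
fi
echo "core generation is ${core_status}"
```

The soft limit is the one that applies to new processes; the hard limit is the ceiling up to which a non-root user may raise it.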

Let's move ahead and make the required changes.

a) Make the core file size change persistent

Open the /etc/profile file and find a line similar to the one below:

 ulimit -S -c 0 > /dev/null 2>&1

Update it as follows to make the core file size unlimited (setting a limit risks truncated core files):

ulimit -c unlimited >/dev/null 2>&1
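
If the same change has to be made on many segment servers, the edit above could be scripted. The following is a minimal sketch, not a tested procedure: set_core_unlimited is a hypothetical helper name, and it assumes the profile file contains a line of the form "ulimit [-S] -c <value> ...", as in the RHEL example above. sed keeps a backup copy with a .bak suffix.

```shell
# Hypothetical helper: rewrite the core limit line in a profile file.
# Usage (as root): set_core_unlimited /etc/profile
set_core_unlimited() {
    profile_file="$1"
    # Replace any "ulimit [-S] -c <value> ..." line with the unlimited setting.
    # The original file is preserved as <file>.bak.
    sed -i.bak -E \
        's|^[[:space:]]*ulimit( -S)? -c [^ ]+.*$|ulimit -c unlimited >/dev/null 2>\&1|' \
        "$profile_file"
}
```

It may be worth trying the helper on a copy of /etc/profile first and comparing the result with the .bak file before touching the real one.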

b) Define naming convention and location for core files

Open the file /etc/sysctl.conf and append the lines below if they are not already present. (Use the parameter names appropriate for your server's OS; the example below is for RHEL servers.)

kernel.core_uses_pid = 1
kernel.core_pattern = /<directory>/core-%e-%s-%u-%g-%p-%t

Replace <directory> with the directory where you want the core files to be placed. Core files can be several GB in size, so choose a location with enough free space.

where:
kernel.core_uses_pid = 1 - Appends the PID of the dumping process to the core file name (as a .PID suffix) when the pattern does not already contain %p.
kernel.core_pattern = /<directory>/core-%e-%s-%u-%g-%p-%t - Controls the exact location and name of the core file. When the application terminates abnormally, a core file should appear in the directory given here. The file name template can contain % specifiers, which are substituted by the following values when a core file is created:
%% - A single % character
%p - PID of dumped process
%u - real UID of dumped process
%g - real GID of dumped process
%s - number of signal causing dump
%t - time of dump (seconds since 0:00h, 1 Jan 1970)
%h - hostname (same as ’nodename’ returned by uname(2))
%e - executable filename
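
To make the substitution concrete, the sketch below expands a template by hand for a given set of values. expand_core_pattern is a hypothetical helper written for illustration; it is not how the kernel performs the substitution, it handles only the six specifiers used in the pattern above, and it assumes the supplied values contain no % or | characters. The /var/core directory is an example only.

```shell
# Hypothetical helper: show the file name a core_pattern template would produce.
# Arguments: template, executable name, signal, UID, GID, PID, epoch time.
expand_core_pattern() {
    printf '%s\n' "$1" | sed \
        -e "s|%e|$2|g" \
        -e "s|%s|$3|g" \
        -e "s|%u|$4|g" \
        -e "s|%g|$5|g" \
        -e "s|%p|$6|g" \
        -e "s|%t|$7|g"
}

expand_core_pattern '/var/core/core-%e-%s-%u-%g-%p-%t' sleep 6 0 0 13729 1444593639
# prints /var/core/core-sleep-6-0-0-13729-1444593639
```

Here signal 6 is SIGABRT, and UID 0 and GID 0 indicate that the process ran as root.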

c) Reload

Reload the changes made in /etc/sysctl.conf:

sysctl -p
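
After the reload, the live kernel values can be read back to confirm they match what was written to /etc/sysctl.conf. The sketch below assumes a Linux server where these parameters are exposed under /proc/sys/kernel; show_core_sysctl is a hypothetical helper name.

```shell
# Hypothetical helper: print the live values of the two core-related parameters.
show_core_sysctl() {
    for param in core_pattern core_uses_pid; do
        if [ -r "/proc/sys/kernel/${param}" ]; then
            echo "kernel.${param} = $(cat "/proc/sys/kernel/${param}")"
        else
            echo "kernel.${param} is not readable on this system"
        fi
    done
}

show_core_sysctl
```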

d) Verification

Verify the value for both the gpadmin and root users with ulimit -a (log in again first so that /etc/profile is re-read).

You should see output like the following:

gpadmin:Fullrack@mdw $ ulimit -a
core file size (blocks, -c) unlimited

Note: The Greenplum / HAWQ database must be restarted for these changes to take effect.
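
One further check is worth making during verification: a core file is written with the privileges of the crashing process, so the directory named in kernel.core_pattern must be writable by the user that runs the database (gpadmin). A minimal sketch follows; check_core_dir is a hypothetical helper, and it assumes the pattern is an absolute path rather than a relative name or a pipe to a helper program.

```shell
# Hypothetical helper: report whether the current user can write into the
# directory part of a core_pattern value.
check_core_dir() {
    core_dir=$(dirname "$1")
    if [ -d "$core_dir" ] && [ -w "$core_dir" ]; then
        echo "writable: $core_dir"
    else
        echo "not writable: $core_dir"
        return 1
    fi
}

# Example: run as gpadmin against the live pattern.
# check_core_dir "$(cat /proc/sys/kernel/core_pattern)"
```

Run it as gpadmin as well as root, since root can usually write to the directory regardless of its permissions.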

Comments

  • Ignacio Elizaga

    Just a quick comment on this: the gpadmin user needs write access to the kernel.core_pattern directory. So we need to make sure that this directory is either owned by the gpadmin user or has drwxrwxrwt permissions. Otherwise, the root user will be able to generate core files but gpadmin will not.

  • Gowri Kothandaraman

    Yes Ignacio, you are correct. The root user will be able to generate core files but gpadmin will not. Sample repro:

    Root:-

    [root@gpdb_singlenode var]# pwd
    /var

    [root@gpdb_singlenode var]# ls -lrtha core
    total 8.0K
    drwxr-xr-x. 21 root root 4.0K Oct 11 12:59 ..
    drwxr-xr-x 2 root root 4.0K Oct 11 12:59 .

    [root@gpdb_singlenode var]# id
    uid=0(root) gid=0(root) groups=0(root)

    [root@gpdb_singlenode var]# sleep 60 &
    [1] 13729
    [root@gpdb_singlenode var]# kill -SIGABRT 13729

    [root@gpdb_singlenode var]# cd core/
    [1]+ Aborted (core dumped) sleep 60 (wd: /var)
    (wd now: /var/core)

    [root@gpdb_singlenode core]# pwd
    /var/core

    [root@gpdb_singlenode core]# ls -lrtha
    total 116K
    drwxr-xr-x. 21 root root 4.0K Oct 11 12:59 ..
    -rw------- 1 root root 316K Oct 11 13:00 core-sleep-6-0-0-13729-1444593639
    drwxr-xr-x 2 root root 4.0K Oct 11 13:00 .

    GPADMIN:-

    -bash-4.1$ sleep 60 &
    [1] 13785

    -bash-4.1$ kill -SIGABRT 13785

    -bash-4.1$ cd core/
    [1]+ Aborted sleep 60 (wd: /var)
    (wd now: /var/core)

    -bash-4.1$ ls -lrtha
    total 116K
    drwxr-xr-x. 21 root root 4.0K Oct 11 12:59 ..
    -rw------- 1 root root 316K Oct 11 13:00 core-sleep-6-0-0-13729-1444593639
    drwxr-xr-x 2 root root 4.0K Oct 11 13:00 .

    I changed the core directory permissions to 777:

    [root@gpdb_singlenode var]# chmod 777 core
    [root@gpdb_singlenode var]# stat core
    File: `core'
    Size: 4096 Blocks: 8 IO Block: 4096 directory
    Device: fd00h/64768d Inode: 1051472 Links: 2
    Access: (0777/drwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
    Access: 2015-10-11 13:05:56.643808523 -0700
    Modify: 2015-10-11 13:00:39.205808517 -0700
    Change: 2015-10-11 13:06:10.517808539 -0700

    ====
    -bash-4.1$ id
    uid=500(gpadmin) gid=500(gpadmin) groups=500(gpadmin)

    -bash-4.1$ kill -SIGABRT 13941

    -bash-4.1$ ls -lrtha
    total 224K
    drwxr-xr-x. 21 root root 4.0K Oct 11 12:59 ..
    -rw------- 1 root root 316K Oct 11 13:00 core-sleep-6-0-0-13729-1444593639
    -rw------- 1 gpadmin gpadmin 316K Oct 11 13:07 core-sleep-6-500-500-13941-1444594022
    drwxrwxrwx 2 root root 4.0K Oct 11 13:07 .
    [1]+ Aborted (core dumped) sleep 60

    ====

    Now gpadmin is able to generate a core file (PID 13941) after the permissions of the core directory were changed.
