Pivotal Knowledge Base

How to Deploy, Start and Stop mem_watcher utility

Environment

 Product: GPDB or HAWQ
 OS: CentOS or Linux
 DCA or software-only

Purpose

In this article, we will learn how to deploy, start and stop the mem_watcher daemon on all the cluster nodes.

The mem_watcher utility is a daemon that runs on every server of a GPDB/HAWQ cluster and tracks the memory usage of each GPDB/HAWQ process by collecting ps output once a minute (the default interval). It is a low-impact process that consumes only 4 MB of memory. Expect it to generate approximately 30 MB of data over a 24-hour period.

Resolution

Download

Download the attached script and move it to the GPDB/HAWQ master server under:

  • /home/gpadmin/mem_watcher/mem_watcher

Start

Here are the instructions to start the mem_watcher daemon:

  1. Log on to the host where mem_watcher was installed (usually a 'master' host) as user gpadmin.
  2. Move to the mem_watcher directory:
    cd /home/gpadmin/mem_watcher
  3. Create a hostmap.txt file with all cluster hosts, along with the absolute path you want to use to maintain the mem_watcher output.

    Example hostmap.txt file:

    master1.phd.local:/home/gpadmin/mem_watcher/working
    slave1.phd.local:/home/gpadmin/mem_watcher/working
    slave2.phd.local:/home/gpadmin/mem_watcher/working
    slave3.phd.local:/home/gpadmin/mem_watcher/working
  4. If you have many segment hosts, a simple script like the one below can create the hostmap. Set the "location" variable to the directory where the memory statistics should be collected.
    export location=<location-to-collect-the-memory-stats>
    psql -d template1 -Atc "select distinct hostname||':$location' from gp_segment_configuration" > hostmap.txt
  5. On each of the cluster hosts listed in hostmap.txt, create the working directory as user 'gpadmin'. For example:
    gpssh -f all_hosts "mkdir -p /home/gpadmin/mem_watcher/working"
  6. Start mem_watcher:
    ./mem_watcher -f ./hostmap.txt
  7. Verify that the mem_watcher daemons are running:
    gpssh -f <hostfile> "ps -ef |grep mem_watcher|grep -v grep"
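
The hostmap format used in steps 3 and 4 can be sketched in Python as follows. This is a minimal illustration, not part of mem_watcher: the function names `build_hostmap_lines` and `parse_hostmap_line` are hypothetical, and the host list is sample data taken from the example above. It builds one `hostname:location` line per distinct host and checks that a line has the expected shape before it is used.

```python
def build_hostmap_lines(hostnames, location):
    """Return one 'hostname:location' line per distinct host, in first-seen order."""
    if not location.startswith("/"):
        raise ValueError("location must be an absolute path: %r" % location)
    seen = set()
    lines = []
    for name in hostnames:
        name = name.strip()
        if not name or name in seen:
            continue  # skip blank entries and duplicate hosts
        seen.add(name)
        lines.append("%s:%s" % (name, location))
    return lines


def parse_hostmap_line(line):
    """Split 'hostname:/absolute/path' into (hostname, path) or raise ValueError."""
    host, sep, path = line.strip().partition(":")
    if not sep or not host or not path.startswith("/"):
        raise ValueError("expected <hostname>:<absolute path>, got %r" % line)
    return host, path


# Sample host list (hypothetical); duplicates are dropped.
hosts = ["master1.phd.local", "slave1.phd.local", "slave1.phd.local", "slave2.phd.local"]
hostmap_lines = build_hostmap_lines(hosts, "/home/gpadmin/mem_watcher/working")
print("\n".join(hostmap_lines))
```

Writing `hostmap_lines` to hostmap.txt, one entry per line, would give a file in the same format as the example in step 3. A plain host name without `:<location>` is rejected by `parse_hostmap_line`.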

Stop

This section includes instructions to stop all the mem_watcher processes, retrieve all data files to the current directory and remove the work files on all the hosts.

Note: Ensure that there is enough space in the current directory to retrieve the data files.

  1. Log on to the host where mem_watcher was deployed from (usually a 'master' host) as user gpadmin.
  2. Navigate to the mem_watcher directory:
    cd /home/gpadmin/mem_watcher
  3. Stop mem_watcher:
    ./mem_watcher -f ./hostmap.txt --stop
  4. The mem_watcher logs should be collected to the current directory.
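
To judge how much free space the current directory needs before stopping, a rough estimate can be derived from the figure of approximately 30 MB per 24 hours quoted in the Purpose section. The sketch below assumes that figure applies to each host individually, which the article does not state explicitly; the function name `estimate_output_mb` and the sample cluster size are hypothetical.

```python
MB_PER_HOST_PER_DAY = 30  # figure quoted in this article; assumed here to be per host


def estimate_output_mb(num_hosts, days, mb_per_host_per_day=MB_PER_HOST_PER_DAY):
    """Rough size in MB of the data retrieved to the current directory."""
    return num_hosts * days * mb_per_host_per_day


# One sample per minute means 1440 samples per day.
samples_per_day = 24 * 60
kb_per_sample = MB_PER_HOST_PER_DAY * 1024 / samples_per_day

print(round(kb_per_sample, 1))                   # 21.3 (KB per one-minute sample)
print(estimate_output_mb(num_hosts=4, days=7))   # 840 (MB for 4 hosts over a week)
```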

Additional Information

Reference:  Follow the guide to understand the mem_watcher output.

Known Issues

  • If you have already run mem_watcher once, running it again with the same location in the hostmap produces the following error:
 < cannot create directory: ..../gpsupport_reswatch ' file exists >

Change the directory location in the hostmap to avoid the issue.
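
The "file exists" error is the normal behavior of creating a directory that is already there. The Python sketch below reproduces it locally in a temporary directory, under the assumption that the script creates gpsupport_reswatch with a plain mkdir (as the code quoted in the comments suggests). `os.mkdir` behaves like `mkdir`, while `os.makedirs(..., exist_ok=True)` behaves like `mkdir -p`.

```python
import os
import tempfile

# Work in a throwaway directory so nothing on the real system is touched.
base = tempfile.mkdtemp()
target = os.path.join(base, "gpsupport_reswatch")

os.mkdir(target)  # first run: the directory is created

try:
    os.mkdir(target)  # second run: fails, like a plain `mkdir`
    second_plain_mkdir_failed = False
except FileExistsError:
    second_plain_mkdir_failed = True

# Equivalent of `mkdir -p`: succeeds whether or not the directory exists.
os.makedirs(target, exist_ok=True)

print(second_plain_mkdir_failed)  # True
```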

  • If you run mem_watcher from a directory other than the one that contains the mem_watcher executable (for example, /usr/local/pivotal-support/bin), you will encounter the following error:
< mem_watcher: No such file or directory .... Error when trying to copy script to .... >

Ensure you are in the bin directory where mem_watcher is located.
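
The error text suggests that the script copies itself to each host using a file name relative to the current directory (the comments below quote the relevant scp call). A minimal sketch of how such a copy command could be built so that it does not depend on the current directory is shown here. The function name `build_scp_command` is hypothetical and this is not the tool's actual code; inside a real script, the first argument would typically be `__file__`.

```python
import os


def build_scp_command(script_file, host, dest_dir):
    """Build an scp argument list that uses an absolute path to the script."""
    # Resolving the path once, at start-up, means the copy no longer depends
    # on which directory the command happens to be run from.
    script_path = os.path.abspath(script_file)
    return ["scp", "-q", script_path, "%s:%s" % (host, dest_dir)]


command = build_scp_command(
    "/usr/local/pivotal-support/bin/mem_watcher",
    "sdw1",
    "/home/gpadmin/mem_watcher/working",
)
print(" ".join(command))
```

Passing the command as a list (rather than a single shell string) also avoids quoting problems if a path contains spaces.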

Comments

  • Radosław Michlewski

    Doesn't work with pivotal-support_1.4:

    Wrong path in: source /usr/local/greenplumb-db/greenplumb_path.sh
    - should be source /usr/local/greenplumb-db/greenplum_path.sh

    Wrong path to enable mem_watcher: ./mem_watcher -f /path/to/hostmap.txt
    - should be:
    cd /usr/local/pivotal-support
    ./bin/mem_watcher

    and last but not least it fails to run:
    [gpadmin@mdw-db bin]$ ./mem_watcher -f /tmp/pivotal-support_1.4/hostfile
    Traceback (most recent call last):
      File "./mem_watcher", line 195, in <module>
        main()
      File "./mem_watcher", line 191, in main
        launchProcess(*mapping)
    TypeError: launchProcess() takes exactly 2 arguments (1 given)

  • Faisal Ali

    Hello Radoslaw ,

    There is an error in the way you have set up your environment and in the hostfile you provided to mem_watcher.

    Regarding the environment

    I have corrected the greenplum_path error. As for the path you had to navigate to in order to execute mem_watcher, you will have to source the package environment as well.

    The command is:

    source /usr/local/pivotal-support/support-env.sh
    

    Once you source it, you can execute mem_watcher from any directory.

    Regarding the Hostfile

    The hostfile is also not the typical hostfile you find in gpconfig or the one you use to connect via gpssh. Here you need to write the hostfile in the following format:

    <hostname>:<location>
    

    There should be one such entry for every host on which you want mem_watcher to run, typically all the segment host servers.

    You can have a look at the hostmap example in the document/article above.

    Hope that helps

    Thanks

  • Vincent

    I'm constantly getting the error
    < mem_watcher: No such file or directory ....
    Error when trying to copy script to .... >

    Yet, if I try to run the command twice in a row, then I get the new error
    < cannot create directory: ..../gpsupport_reswatch ' file exists >

    So clearly the directory does exist (on all servers, though it isn't getting past the master). I've double checked this over and over, cannot seem to find a simple answer. Any help?

    Thank you in advance.

  • Faisal Ali

    @ Vincent :

    Regarding 1st error

    I'm not sure which command you ran to receive the error message

    < mem_watcher: No such file or directory .... 
    Error when trying to copy script to .... >
    

    But looking at the error message, it seems that when you executed the command you either didn't source the path or were in a different directory, since the error states that there is no command called mem_watcher.

    So the recommendation would be to:

    • Source the path.
    • Check that mem_watcher is now on your PATH with:
    which mem_watcher
    

    OR

    • Go to the location where you installed the support package.
    • Run the command:
    ./mem_watcher .....
    

    Regarding 2nd error

    As for the error saying the directory already exists: when you execute the command twice, the first run creates the directory and the second run fails because the directory is already there.

    It's a simple fix. The mem_watcher Python script fails on the Unix mkdir call (this is deliberate, to avoid multiple mem_watcher daemons running and dumping files into the same location). If you want to avoid the issue:

    • Ensure there is no mem_watcher process running:
    gpssh -f hostfile
    ps -ef | grep mem_watcher
    
    • Open the mem_watcher script in vi and replace the following lines of code:
        try:
            subprocess.check_call("ssh -T %s 'mkdir %s'" % (host, dest_dir), shell=True)
        except subprocess.CalledProcessError, e:
            err = "Error when trying to create directory: " + dest_dir + " on host: " + host
            print >> sys.stderr, err
    

    with

        try:
            subprocess.check_call("ssh -T %s 'mkdir -p  %s'" % (host, dest_dir), shell=True)
        except subprocess.CalledProcessError, e:
            err = "Error when trying to create directory: " + dest_dir + " on host: " + host
            print >> sys.stderr, err
    

    i.e. we are just replacing mkdir with "mkdir -p". This should fix your problem.

    NOTE: Once you change the code, it will no longer check for and warn you about an existing directory, so always stop the old mem_watcher daemons before you launch mem_watcher again.

    Thanks
    Faisal

  • Vincent

    Hi Faisal,

    Thank you for your response! Always nice to see someone on the Pivotal forums willing to help.

    Running down your list of advice above:

    • pivotal-support/support_env.sh is sourced for gpadmin
    • 'which mem_watcher' shows the full path to the command script.
    • 'gpssh -f hostfile... etc' shows 0 processes running on any servers.
    • I changed the above 'mkdir' to 'mkdir -p' and now I can run the command repeatedly to only receive the initial error set as follows:

    < mem_watcher: No such file or directory ....
    Error when trying to copy script to .... >

    The command I am now using to run this is as follows:

    /usr/local/pivotal-support/bin/mem_watcher -f /usr/local/pivotal-support/hostfile.txt

    where hostfile.txt looks like:

    mdw:/home/gpadmin/mem_watcher/working
    sdw1:/home/gpadmin/mem_watcher/working
    etc...

    Regards,
    Vincent.

  • Faisal Ali

    Hi Vincent

    I think I found where the error

    < mem_watcher: No such file or directory ....
    Error when trying to copy script to .... >

    is coming from. It is raised by this code:

        try:
            subprocess.check_call('scp -q mem_watcher %s:%s' % (host, dest_dir), shell=True)
        except subprocess.CalledProcessError, e:
            err = 'Error when trying to copy script to %s:%s' % (host, dest_dir)
            print >> sys.stderr, err
            sys.exit(1)
    

    That is, it copies mem_watcher with scp to the hosts in the hostmap, but it assumes that mem_watcher is in your current working directory (it's not dynamic). For example, if you are in /home/gpadmin, it assumes mem_watcher is in /home/gpadmin and doesn't locate the script based on the environment.

    So in order to fix it, here are some recommendations:

    • Navigate to the bin directory.
    • Execute the command there and everything should work fine, as in the example below:
    [gpadmin@mdw pivotal-support]$ pwd
    /usr/local/pivotal-support
    
    [gpadmin@mdw pivotal-support]$ source support-env.sh
    
    [gpadmin@mdw pivotal-support]$ which mem_watcher
    /usr/local/pivotal-support/bin/mem_watcher
    
    [gpadmin@mdw pivotal-support]$ cat hostfile.txt
    mdw:/home/gpadmin/working
    sdw1:/home/gpadmin/working
    sdw2:/home/gpadmin/working
    
    [gpadmin@mdw pivotal-support]$ cd bin
    [gpadmin@mdw bin]$ /usr/local/pivotal-support/bin/mem_watcher -f /usr/local/pivotal-support/hostfile.txt
    
    [gpadmin@mdw bin]$ gpssh -h mdw -h sdw1 -h sdw2
    => ps -ef | grep mem_wa | grep -v grep
    [sdw1] gpadmin  789614      1  0 05:25 ?        00:00:00 python /home/gpadmin/working/gpsupport_reswatch/mem_watcher --daemon -d /home/gpadmin/working/gpsupport_reswatch
    [ mdw] gpadmin   63099      1  0 05:25 ?        00:00:00 python /home/gpadmin/working/gpsupport_reswatch/mem_watcher --daemon -d /home/gpadmin/working/gpsupport_reswatch
    [sdw2] gpadmin    2218      1  0 05:25 ?        00:00:00 python /home/gpadmin/working/gpsupport_reswatch/mem_watcher --daemon -d /home/gpadmin/working/gpsupport_reswatch
    =>
    

    OR

    Fix the code so that it locates the mem_watcher file dynamically,

    i.e. edit the mem_watcher code from

        try:
            subprocess.check_call('scp -q mem_watcher %s:%s' % (host, dest_dir), shell=True)
        except subprocess.CalledProcessError, e:
            err = 'Error when trying to copy script to %s:%s' % (host, dest_dir)
            print >> sys.stderr, err
            sys.exit(1)
    

    to this

        try:
            subprocess.check_call('scp -q %s %s:%s' % (__file__, host, dest_dir), shell=True)
        except subprocess.CalledProcessError, e:
            err = 'Error when trying to copy script to %s:%s' % (host, dest_dir)
            print >> sys.stderr, err
            sys.exit(1)
    

    This should let the script pick up the location of the mem_watcher file no matter where you run it from.

    Hope that helps

    Thanks
    Faisal

  • Vincent

    Hello again Faisal,

    You, sir, are a legend. That code-fix above works an absolute charm! I'm confused as to why it isn't like that in the first place however, seems odd to me.

    Either way, my problem is resolved and I ran gpssh command to check and it is running fine on all servers.

    Thanks again,
    Vincent.

  • Faisal Ali

    Thanks Vincent for the compliment : )

    I have reported the bug to the developer of the mem_watcher tool; hopefully the fix will be integrated in the next version of mem_watcher.

    Thanks
