Pivotal Knowledge Base

How to Capture goroutine Thread Dumps from Diego and Garden Components

Environment 

Pivotal Cloud Foundry (PCF) 1.9

Purpose

This document explains how to capture goroutine dumps from the Diego and Garden components. These dumps provide additional debug data when troubleshooting a hung process or other failure conditions.

Procedure

Non-destructive option: component debug server

Each Diego component runs a debug server that listens on localhost on a port in the 17000 range. This debug server provides endpoints that expose diagnostic information about the Go runtime, such as stack traces for all of the goroutines that the process is running.

The cell rep listens on port 17008 by default, but BOSH configuration can override this port, so confirm the actual port from the netstat listener output before using it.

SSH to the virtual machine (VM) where you want to capture the goroutine dump and follow the procedure below:

## elevate to root
# sudo -i

## verify rep debug server port
# netstat -nptl | grep rep | grep 127.0.0.1:17
tcp        0      0 127.0.0.1:17008         0.0.0.0:*               LISTEN      3175553/rep

## capture detailed goroutine dump in a location where `bosh logs` will collect it
# curl "http://127.0.0.1:17008/debug/pprof/goroutine?debug=2" > /var/vcap/sys/log/rep/goroutine-dump.txt
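
The verify-then-capture steps above can be sketched as two small shell functions. This is only an illustration: the function names debug_port_from_netstat and capture_goroutine_dump are hypothetical, and the parser assumes netstat prints lines in the same column layout as the sample output shown above (netstat may truncate long program names, in which case the match would need adjusting).

```shell
# Hypothetical helper: read `netstat -nptl` output on stdin and print the
# localhost 17xxx port that the named process is listening on.
# $1: process name as shown after the PID in netstat's last column (e.g. rep)
debug_port_from_netstat() {
  awk -v proc="$1" '
    $6 == "LISTEN" && $7 ~ ("/" proc "$") && $4 ~ /^127\.0\.0\.1:17[0-9]+$/ {
      split($4, addr, ":")   # addr[1] is the IP, addr[2] is the port
      print addr[2]
      exit
    }'
}

# Hypothetical helper: fetch the detailed goroutine dump from a debug server.
# $1: debug-server port, $2: output file
capture_goroutine_dump() {
  curl --silent --show-error --fail \
    --output "$2" \
    "http://127.0.0.1:${1}/debug/pprof/goroutine?debug=2"
}
```

For example, running port=$(netstat -nptl | debug_port_from_netstat rep) as root and then capture_goroutine_dump "$port" /var/vcap/sys/log/rep/goroutine-dump.txt would reproduce the manual steps for the rep.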

The garden-runc server, gdn, likewise configures its debug server to listen on port 17005 by default.

## verify garden debug server port
# netstat -nptl | grep gdn | grep 127.0.0.1:17
tcp        0      0 127.0.0.1:17005         0.0.0.0:*               LISTEN      3175610/gdn

## capture detailed goroutine dump in a location where `bosh logs` will collect it
# curl "http://127.0.0.1:17005/debug/pprof/goroutine?debug=2" > /var/vcap/sys/log/garden/goroutine-dump.txt

Default debug-server ports for Diego and Garden-Runc jobs:

BOSH release   Job             Process         Default port
diego          rep             rep             17008
garden-runc    garden          gdn             17005
diego          auctioneer      auctioneer      17001
diego          bbs             bbs             17017
diego          file_server     file_server     17005
diego          locket          locket          17018
diego          route_emitter   route_emitter   17009
diego          ssh_proxy       ssh-proxy       17016
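
As a convenience, the table above can be encoded as a small lookup function. This is a sketch only: default_debug_port is a hypothetical name, the values are copied from the table, and they are defaults that BOSH configuration may override, so the netstat check remains the authoritative source.

```shell
# Hypothetical helper: print the default debug-server port for a process name.
# Values mirror the table of defaults; a deployment may override them.
default_debug_port() {
  case "$1" in
    rep)           echo 17008 ;;
    gdn)           echo 17005 ;;
    auctioneer)    echo 17001 ;;
    bbs)           echo 17017 ;;
    file_server)   echo 17005 ;;
    locket)        echo 17018 ;;
    route_emitter) echo 17009 ;;
    ssh-proxy)     echo 17016 ;;
    *)             echo "unknown process: $1" >&2; return 1 ;;
  esac
}
```

For example, default_debug_port bbs prints 17017, and an unrecognized name returns a non-zero status so that a calling script can fall back to the netstat check.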

Destructive option: send SIGQUIT to the Go process

By default, each Go process responds to the QUIT signal (signal number 3) by emitting a goroutine stack dump to stderr and then exiting, which is why this option is destructive. The invocation scripts for the Diego and Garden BOSH jobs redirect this goroutine dump to the <job_name>.stderr.log file in the component's /var/vcap/sys/log subdirectory, where it is available for bosh logs to collect.

kill -QUIT $(pidof rep)

The above command writes the goroutine dump to the /var/vcap/sys/log/rep/rep.stderr.log file.
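
To confirm that a dump was actually written, one option is to count the goroutine headers in the file. The sketch below assumes the usual Go trace layout, in which each goroutine starts with a line of the form "goroutine N [state]:"; the function name count_goroutines is hypothetical. Because the stderr log accumulates over time, the count covers every dump present in the file, not only the most recent one.

```shell
# Hypothetical helper: count goroutine header lines in a dump file.
# $1: path to a file containing one or more goroutine dumps
count_goroutines() {
  # Matches header lines such as: goroutine 42 [chan receive, 3 minutes]:
  grep -c '^goroutine [0-9][0-9]* \[' "$1"
}
```

For example, count_goroutines /var/vcap/sys/log/rep/rep.stderr.log prints the number of goroutine stacks found; the same function works on a goroutine-dump.txt file captured from the debug server with debug=2.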
