Pivotal Knowledge Base


How to run TCPDUMP to debug distributed applications

Purpose

  End users face a number of challenges when collecting a TCPDUMP for a distributed application.  This article will help the user narrow down the scope of the capture and determine whether a TCPDUMP will be useful when troubleshooting a distributed application.

Some of Pivotal's distributed applications include:

  • Pivotal Greenplum
  • Pivotal HDB
  • Pivotal Hadoop

How to overcome the most common issues

Filling up the data partition

Depending on the workload, tcpdump output can grow extremely fast.  While troubleshooting one issue you could fill up the OS root partition and cause a far more serious one.  To avoid this, run a rolling TCPDUMP with parameters that control how much data is collected.
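
Before starting any capture, it is worth confirming that the collection partition has room for the rolling buffer.  A quick check (the mount point below is an example; use wherever your collection directory lives):

df -h /data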

Too much data collected

Along with filling up the data partition, we often see users performing a wide-open trace with no filters.  If you run a wide-open trace on a 10Gig interface during a production workload, you can end up with gigabytes of unnecessary data.  The poor engineer assigned to review the trace then has to spend hours parsing and cutting up the trace file to avoid the infamous Wireshark out-of-memory error.

So please help out the engineer by strategically applying a TCPDUMP filter and limiting the capture length.  Let's say, for example, you need to collect a tcpdump to debug an HDFS permission issue when a MapReduce job runs.  One could run a TCPDUMP on every node manager in the cluster with no filter and collect a huge amount of useless data.  Or the engineer could apply the filter "tcp port 8020 and host <ip of namenode>" to capture only the HDFS metadata traffic from that node, as in the sketch below.
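
For illustration, a one-off capture with that filter might look like the following.  The interface name and output file here are examples; substitute your own values:

tcpdump -i bond0 -s 256 -w /data/support/tcpdump/hdfs_perm_issue "tcp port 8020 and host <ip of namenode>"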

Too much data for TCPDUMP to collect

There can be terabytes of data being transferred in some of these distributed applications.  Running a TCPDUMP on a highly utilized node may cause the kernel to discard some frames before sending them up the stack where TCPDUMP is listening.  In this case the packets still reach the application, but TCPDUMP will not be able to collect them.

When you run into this situation, revisit "When to take a tcpdump" below, because chances are a dump will not be helpful.  If the node is that heavily utilized, the utilization itself is most likely the root of the problem, and a TCPDUMP is not required.  You can tell drops occurred from the statistics tcpdump prints when it exits, as shown below.
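
tcpdump reports its capture statistics when it terminates (for example after Ctrl-C), and a nonzero "dropped by kernel" count tells you the trace is incomplete.  The numbers below are illustrative only:

tcpdump -i bond0 "tcp port 8020"
^C
148212 packets captured
231544 packets received by filter
83332 packets dropped by kernel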

When to take a tcpdump

Please make sure to ask yourself this question before going down the road of tcpdump collection.  In most cases the application should have enough logging to help the user understand whether there was a communication issue with the remote node.  For example, if your application is GPFDIST and you set a read timeout of 300 seconds, and the segment log says we timed out communicating with GPFDIST, then a tcpdump might not help you here.  The tcpdump will simply tell you that GPFDIST didn't respond, so the segment timed out.  That is something we could already infer from the logs.

Whenever possible, avoid taking a TCPDUMP.  The best reasons to run a tcp collection are to rule out network latency, to rule out customer firewalls, and to see exactly what data is being sent to a remote node.
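
If latency or a firewall is the suspicion, simpler tools can often answer the question without a capture at all.  A quick sketch, with placeholder host and port values:

# Rough round-trip latency to the remote node
ping -c 5 <ip of remote node>

# Check whether the port is reachable through any firewalls
nc -vz <ip of remote node> 8020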

How to execute a rolling TCPDUMP that addresses all of these issues

First we need to prepare the node for collection.  You have to start the dump as root, because the kernel will not allow non-superusers to put a network interface into promiscuous mode.  However, when tcpdump runs it will, by default, write out files as the tcpdump user, so we need to make sure the tcpdump user has write permissions on the collection directory.

  1. mkdir -p /data/support/tcpdump
  2. chmod 777 /data/support/tcpdump
  3. Verify the tcpdump user can write to the new directory:
    sudo -u tcpdump touch /data/support/tcpdump/test123

Once the above prerequisites are complete, you can start the trace:

tcpdump -C 512 -W 8 -s 256 -w /data/support/tcpdump/gpdbgang_issue -i bond0 "tcp port 8020"

Now let's talk about the options.  The above command will collect up to 8 files of 512MB each, for a maximum of 4GB of data on disk.  TCPDUMP writes the files in sequence, and once the eighth file reaches 512MB it loops back and overwrites the first, resulting in a rolling tcpdump.
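
With -C and -W together, tcpdump appends a numeric suffix to the -w filename so the files sort correctly.  For the command above, the on-disk files would look something like the listing below (exact suffix format can vary between tcpdump versions):

ls /data/support/tcpdump/
gpdbgang_issue0  gpdbgang_issue1  gpdbgang_issue2  gpdbgang_issue3
gpdbgang_issue4  gpdbgang_issue5  gpdbgang_issue6  gpdbgang_issue7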

The -s option sets the packet capture length (snaplen).  In a typical infrastructure the maximum frame length is 1514 bytes.  If you are not interested in capturing the data payload of each packet, it is a good idea to limit this to about 256 bytes, so you are sure to capture the packet headers and some data.
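
If you do need the full payload (for example, to inspect the application data itself), modern tcpdump versions accept -s 0 to capture whole packets; expect much larger files:

tcpdump -C 512 -W 8 -s 0 -w /data/support/tcpdump/gpdbgang_issue -i bond0 "tcp port 8020"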

Arg      Value                                     Description
-C       512                                       Max file size, in MB
-W       8                                         Total number of files to collect
-s       256                                       Capture length for each packet
-w       /data/support/tcpdump/gpdbgang_issue      Path to save the collection; gpdbgang_issue is used as the prefix for the 8 files collected
-i       bond0                                     Only listen on interface bond0
Filter   "tcp port 8020"                           Filter on TCP port 8020
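
Once the capture is stopped, a quick way to verify the files are readable is to read one back with tcpdump.  The suffix on the filename assumes the rotation naming shown above:

tcpdump -nn -r /data/support/tcpdump/gpdbgang_issue0 | head -20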
