Pivotal Knowledge Base

How to Catch and Generate a Core Dump for Apps Running on Windows Diego Cell

Environment

Product                  Version
Pivotal Cloud Foundry    1.8.x, 1.9.x
Windows Diego Cell       1.8.2

Purpose

This article explains how to trace and trigger a core dump of an application running on a Windows Diego Cell. It does not cover how to review the dump once collected. Triggering a core dump is particularly useful when an app that has been running for some time suddenly crashes with an error such as "Access Violation" and you want to determine why the fault occurred.

What will you Need?

What are the symptoms

This example describes symptoms whose root cause cannot be determined without a core dump. Let's assume you have a .NET application running in a Windows Diego container and the app suddenly crashes with this error:

2017-02-02T15:51:14.64-0800 [API/0]      OUT App instance exited with guid cf9f685d-2562-4cb4-a9b0-834451b88c13 payload: {"instance"=>"", "index"=>0, "reason"=>"CRASHED", "exit_description"=>"2 error(s) occurred:\n\n* Exited with status -1073741819\n* cancelled", "crash_count"=>4, "crash_timestamp"=>1486079474619372130, "version"=>"276d4084-18ed-48fe-9aee-1b29e6525a8d"}

The interesting part of this error is the application exit status, "Exited with status -1073741819". Status code -1073741819 in hex is 0xc0000005, which means "Access Violation". An Access Violation usually indicates some form of memory access fault or another I/O-related issue. The Windows Application Event log provides more information about the error code.
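The conversion from the signed exit status to the hexadecimal code can be checked with a few lines of Python. This is a minimal sketch that assumes the status is a 32-bit value; the function name `exit_status_to_hex` is made up for illustration.

```python
def exit_status_to_hex(status: int) -> str:
    """Reinterpret a signed 32-bit exit status as an unsigned hex code."""
    # Masking with 0xFFFFFFFF keeps the low 32 bits, which maps a negative
    # value onto its two's-complement unsigned representation.
    return "0x{:08X}".format(status & 0xFFFFFFFF)


print(exit_status_to_hex(-1073741819))  # prints 0xC0000005
```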

The event log entry for this crash shows that the access violation occurred in iisfreb.dll at offset 0x67da. Using windbg we can open that DLL and find the code at offset 0x67da

  1. Launch windbg
     'C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\windbg.exe' -z C:\Windows\System32\inetsrv\iisfreb.dll
  2. Get the module version and make sure it matches the Windows event log
    0:000> lm -v
        CheckSum:         00037232
        ImageSize:        0002C000
        File version:     8.5.9600.16384
        Product version:  8.5.9600.16384
  3. Get the starting offset for the module code so you can inspect offset 0x67da
    0:000> lm
        start             end                 module name
    00000001`80000000 00000001`8002c000   iisfreb    (pdb symbols)          C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\sym\iisfreb.pdb\89CF8B470B1B48BA829E6B4C6A27A7391\iisfreb.pdb
  4. Then find the nearest symbol by adding the offset reported in the event log to the module's starting address. The output shows the function that triggered the violation; the +0xae suffix means the fault occurred 0xAE = 174 bytes past the start of that function
    0:000> ln 180000000+67da
    (00000001`8000672c)   iisfreb!FREB_REQUEST_CONTEXT::FilterWWWServerAreasAndVerbosity+0xae   |  (00000001`80006844)   iisfreb!FREB_REQUEST_CONTEXT::SerializeAllTraceEventsToLogDataString
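
The address arithmetic in steps 3 and 4 can be reproduced outside the debugger. The sketch below uses only the values shown in the windbg output above; the variable names are illustrative. It shows that the `+0xae` in the `ln` output is the difference between the fault address and the start address of the function.

```python
module_base = 0x180000000      # start address of iisfreb from `lm`
event_log_offset = 0x67DA      # fault offset reported in the event log
function_start = 0x18000672C   # nearest preceding symbol reported by `ln`

# Absolute address of the faulting instruction inside the loaded module.
fault_address = module_base + event_log_offset

# Distance in bytes from the start of the function to the fault.
offset_in_function = fault_address - function_start

print(hex(fault_address))       # prints 0x1800067da
print(hex(offset_in_function))  # prints 0xae
print(offset_in_function)       # prints 174
```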

In some cases this is enough information to determine the root cause. If further information is required, we can apply the DebugDiag procedure to trigger a core dump when this access violation occurs.

Using DebugDiag tool to trigger a core dump

In this example we will use a small .NET sample app, CpuBurner, and show how to enable tracing for access violation errors on its process.

  1. First, make sure we are targeting the right application. We can use the cf CLI to get the app GUID
    $ cf app cpuburner --guid
    91f18699-87f9-43a3-95da-e06a8844795d
  2. Then open Windows Task Manager -> right-click the process -> Open File Location
  3. Windows Explorer opens a path like C:\containerizer\BCC2AB46FF4649B4FE\user\app. The directory name BCC2AB46FF4649B4FE is the username created for this container. Garden Windows creates a new user for each app container, and all processes running in that container use this user account.
  4. Open the ASCII text file C:\containerizer\BCC2AB46FF4649B4FE\private\properties with Notepad. The app GUID in "network.app_id":"91f18699-87f9-43a3-95da-e06a8844795d" matches the value returned by the cf CLI, which confirms we are working with the correct app
  5. Using Task Manager, look up the process ID of the cpuburner app and make a note of it
  6. Launch "DebugDiag 2 Collections" and use the crash wizard to start tracing the cpuburner process 
  7. Then select to trace on a specific process
  8. Select the CpuBurner.exe that matches the process ID we found in Task Manager
  9. On the next prompt click Exceptions -> Add Exception and then populate the Configure Exception form to trigger the "Full Userdump" action when the Access Violation error code 0xc0000005 is encountered
  10. DebugDiag will write all core dumps and trace logs to the directory "C:\Program Files\DebugDiag\Logs\Crash rule for all instances of CpuBurner.exe"
  11. Once the rule is active, run the steps that reproduce the fault or wait for the problem to resurface. The developer can then use Microsoft tools to analyze the core dump and determine the root cause of the fault.
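
The GUID comparison in steps 1 to 4 can also be scripted. The following is a rough sketch rather than a supported tool. It assumes the properties file contains the "network.app_id":"<guid>" pair in the form quoted in step 4, and the function names are hypothetical.

```python
import re
from typing import Optional

# Matches the "network.app_id":"<guid>" pair quoted in step 4.
APP_ID_PATTERN = re.compile(r'"network\.app_id"\s*:\s*"([^"]+)"')


def find_app_guid(properties_text: str) -> Optional[str]:
    """Return the app GUID found in the properties text, or None."""
    match = APP_ID_PATTERN.search(properties_text)
    return match.group(1) if match else None


def container_matches_app(properties_text: str, expected_guid: str) -> bool:
    """Compare the container's app GUID with the one reported by the cf CLI."""
    found = find_app_guid(properties_text)
    return found is not None and found.lower() == expected_guid.strip().lower()


# Sample text using the value shown in step 4.
sample = '{"network.app_id":"91f18699-87f9-43a3-95da-e06a8844795d"}'
print(container_matches_app(sample, "91f18699-87f9-43a3-95da-e06a8844795d"))  # prints True
```

On a cell, the text would come from reading the container's private\properties file, and the expected GUID from the output of `cf app cpuburner --guid`.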

Additional Information 

Some helpful links are below:

  • Many releases of DebugDiag exist, and not every release works on every version of Windows. The Microsoft DebugDiag blog should have the most recent information: https://blogs.msdn.microsoft.com/debugdiag/
  • For more windbg information, refer to http://www.windbg.org/

 

 
