Pivotal Knowledge Base

Help! My app is crashing and I don't see anything in the logs

Symptoms

When pushing an application to PWS, the application uploads and stages fine, but fails to start.  The status remains either starting or down, and may flip between the two until cf eventually fails with this error message.

...
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 failing
FAILED
Start unsuccessful

Furthermore, when you run cf logs --recent <app> to find out why the application is failing, you see no application-specific logs. You see only the following messages from PWS.

2015-06-10T16:37:07.55-0400 [DEA/14]     OUT Starting app instance (index 0) with guid e6ea4415-fa97-439d-b702-7f6c2a5525f0
2015-06-10T16:37:10.78-0400 [App/0]      ERR
2015-06-10T16:37:10.78-0400 [App/0]      OUT
2015-06-10T16:37:10.84-0400 [DEA/14]     ERR Instance (index 0) failed to start accepting connections
2015-06-10T16:37:10.85-0400 [API/5]      OUT App instance exited with guid e6ea4415-fa97-439d-b702-7f6c2a5525f0 payload: {"cc_partition"=>"default", "droplet"=>"e6ea4415-fa97-439d-b702-7f6c2a5525f0", "version"=>"26a99066-2cc0-4e6a-b1ee-197a9eb9f8e3", "instance"=>"39a6435540c947aa848ed9300de9ed95", "index"=>0, "reason"=>"CRASHED", "exit_status"=>0, "exit_description"=>"failed to accept connections within health check timeout", "crash_timestamp"=>1433968630}
2015-06-10T16:37:10.85-0400 [API/3]      OUT App instance exited with guid e6ea4415-fa97-439d-b702-7f6c2a5525f0 payload: {"cc_partition"=>"default", "droplet"=>"e6ea4415-fa97-439d-b702-7f6c2a5525f0", "version"=>"26a99066-2cc0-4e6a-b1ee-197a9eb9f8e3", "instance"=>"39a6435540c947aa848ed9300de9ed95", "index"=>0, "reason"=>"CRASHED", "exit_status"=>0, "exit_description"=>"failed to accept connections within health check timeout", "crash_timestamp"=>1433968630}

Environment

This issue only happens for applications deployed to DEAs.  In the case of Pivotal Web Services, we are now using Diego and so this is no longer an issue.  This article is being retained for historical purposes only.

Cause

In some cases, when an application starts and dies very quickly, the system does not have enough time to attach and collect the logs (i.e. the process goes away too fast). This is a known race condition on DEAs and is addressed in Diego.

Resolution

To make sure that your application logs are visible, you need to slow things down and make sure the application does not crash before the system can attach the logger.  There are two ways that you can do this.

  1. Obtain the command currently being used to start your application and override it with the -c argument to cf push. This argument lets you specify a custom start command, into which you can inject a two-second pause. That is enough time for the logger to attach and capture logs.

    Ex:  cf push <app-name> -c 'sleep 2 && <normal-start-command>'

    If you are not currently specifying a start command with the -c argument to cf push, the buildpack automatically determines the start command to use. When this happens, as with Java apps, you may not know what command is being used to start your application. You can locate the start command by running the following command.

    Ex:  Unix / Linux

    CF_TRACE=true cf app <app-name> | grep '"detected_start_command": '

    Ex: Windows

    set CF_TRACE=true
    cf app <app-name> > app-info.txt

    Then open app-info.txt with a text editor, like Notepad, and search for "detected_start_command":

  2. Create a .profile.d script that pauses for two seconds and include that with your application.  Any scripts that are included in the <project-root>/.profile.d directory of an application will be run in the runtime environment prior to your application.  This means that you can create that directory and include a script which pauses for two seconds to provide sufficient time for the logger to attach and capture output from your application.

    The first step is to create the .profile.d directory in the root of your project. This is the same directory from which you push your application and possibly also where you have your manifest.yml file. Here's an example script that you can use.

    Ex:  sleep.sh

    #!/bin/bash
    echo "Sleeping for two seconds..."
    sleep 2

    If you are deploying a Java application, the process is slightly different.  Because Java applications are pushed as a JAR or WAR file, you need the .profile.d directory to be created at the root of your JAR or WAR file.  Exactly how you do this depends on how you build your JAR or WAR file, but with Maven or Gradle, it would mean that you'd create the directory src/main/webapp/.profile.d and put your script there.  When you build the JAR or WAR file, Maven or Gradle will take that directory and put it in the root of your JAR or WAR file.  You can confirm this by running jar tf <your-jar-or-war-file> which will list the table of contents and should show the directory and script.
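
Both options can be sketched as small shell helpers. This is only a minimal sketch, not an official tool: the function names wrap_start_command and add_sleep_script are hypothetical, and the second helper assumes an application pushed from a project directory (for a JAR or WAR file, the script still has to end up at the root of the archive as described above).

```shell
#!/bin/bash

# Option 1: prefix a start command with a two second pause.
# Prints the wrapped command so it can be passed to cf push -c.
wrap_start_command() {
    printf 'sleep 2 && %s\n' "$1"
}

# Option 2: create <project-root>/.profile.d/sleep.sh.
# The project root defaults to the current directory.
add_sleep_script() {
    local project_root="${1:-.}"
    mkdir -p "$project_root/.profile.d" || return 1
    printf '%s\n' \
        '#!/bin/bash' \
        'echo "Sleeping for two seconds..."' \
        'sleep 2' > "$project_root/.profile.d/sleep.sh"
}
```

For example, cf push <app-name> -c "$(wrap_start_command '<normal-start-command>')" applies the first option, and running add_sleep_script . from the project root before cf push applies the second.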

Additional Information

When your application fails, the cf logs output will indicate the exit status of the application.  This corresponds to the Linux exit code and in some cases can be enough to tell you what's wrong with your application.  You can take the exit status code that is returned by PWS and look it up here.  If it's one of the standard exit codes like 127 (command not found) then you may have all the information that you need to understand why the app is failing to start.
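
As a rough illustration, the exit status can be pulled out of an "App instance exited" log line and matched against the usual shell conventions. This is a sketch under assumptions: extract_exit_status and describe_exit_status are hypothetical helper names, the log format is assumed to match the sample output shown earlier in this article, and the descriptions reflect common shell conventions only, so an application is free to use the same codes differently.

```shell
#!/bin/bash

# Read log lines on stdin and print the number that follows "exit_status"=>.
extract_exit_status() {
    sed -n 's/.*"exit_status"=>\([0-9][0-9]*\).*/\1/p'
}

# Map an exit status to its conventional meaning in a Unix shell.
describe_exit_status() {
    case "$1" in
        0)   echo "success (process exited without an error code)" ;;
        1)   echo "general error" ;;
        126) echo "command found but not executable" ;;
        127) echo "command not found" ;;
        *)
            # By convention, 128 + N means the process was ended by signal N.
            if [ "$1" -gt 128 ] 2>/dev/null && [ "$1" -le 255 ]; then
                echo "terminated by signal $(( $1 - 128 ))"
            else
                echo "application-defined exit code"
            fi
            ;;
    esac
}
```

Piping a saved log line through extract_exit_status and passing the result to describe_exit_status gives a quick first hint. A status of 0 together with the exit description "failed to accept connections within health check timeout", as in the sample output, means the exit code alone will not explain the failure.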

 
