
App fails to start with error message "status code: 500" in Pivotal Cloud Foundry®

Environment

Product: Pivotal Cloud Foundry® (PCF)

Version: 1.5.x, 1.6.x

Symptom

When trying to start or stage an app, the following message is seen:

starting app <APP-NAME> in org <ORG-NAME> / space <SPACE-NAME> as <USER-NAME>...
FAILED
Server error, status code: 500, error code: 10001, message: An unknown error occurred.

Upon examining the Cloud Controller VM logs, you see the following error in cloud_controller_ng.log:


message":"Request failed: 500:
{\"code\"=>10001, \"description\"=>\"getaddrinfo: Name or service not known\"
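A simple search of the log (at the default path used later in this article) will locate these entries:

grep "getaddrinfo" /var/vcap/sys/log/cloud_controller_ng/cloud_controller_ng.log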

Cause

The /etc/resolv.conf file should have the following order of DNS server addresses:

Line 1 - 127.0.0.1 (the local resolver)

Line 2 - DNS server address(es) configured for the network in Ops Manager

Line 3 - IP address of the BOSH Director VM

If this order is changed, DNS resolution can fail because the hostnames being looked up are internal-only. When another DNS server's address is on the first line and that server fails to resolve an internal hostname, the resolver does not go back to resolv.conf to try another DNS server address; the lookup simply fails, and the app returns the "unknown error" message.
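For illustration, querying the internal hostname (taken from "internal_service_hostname", covered in the Resolution below) against each nameserver shows the difference; 10.64.0.10 is the example custom DNS server address from the resolv.conf shown later:

dig @127.0.0.1 cloud-controller-ng.service.cf.internal   # answered by the local resolver
dig @10.64.0.10 cloud-controller-ng.service.cf.internal  # fails: an external DNS server does not know internal-only names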

Resolution

Use BOSH to SSH into the cloud_controller-partition VM and switch to the root user with sudo. "cd" to the /var/vcap/sys/log/cloud_controller_ng directory to find the logs; cloud_controller_ng.log is the file we will be looking at.
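For example, with the BOSH CLI used with PCF 1.5.x/1.6.x (the exact job name suffix varies per deployment, so check "bosh vms" first; <GUID> is a placeholder):

bosh vms                                      # find the exact cloud_controller-partition VM name
bosh ssh cloud_controller-partition-<GUID> 0  # SSH into the first instance
sudo -i                                       # gain root privileges
cd /var/vcap/sys/log/cloud_controller_ng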

Run "tail -f" on cloud_controller_ng.log, and you will see the "500 error" message appear again when the user tries to restart or push an app. The error entry will look something like this:

message":"Request failed: 500: 
{\"code\"=>10001, \"description\"=>\"getaddrinfo: Name or service not known\"

The "getaddrinfo" error description indicates that this is an issue with the DNS server. The next step is to find a hostname that should definitely be resolvable. "cd" to the /var/vcap/jobs/cloud_controller_ng/config directory and open the cloud_controller_ng.yml file.
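For example, to pull the relevant entry straight out of the file:

cd /var/vcap/jobs/cloud_controller_ng/config
grep internal_service_hostname cloud_controller_ng.yml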

You'll see an entry at the top of this file that looks similar to this: "internal_service_hostname: cloud-controller-ng.service.cf.internal". You should be able to run a successful "dig" or "ping" on cloud-controller-ng.service.cf.internal, since it is an internal DNS name served through the local resolver. If this attempt is unsuccessful, then something is wrong with the local resolver.
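For example (substitute your own "internal_service_hostname" value if it differs):

dig cloud-controller-ng.service.cf.internal
ping -c 3 cloud-controller-ng.service.cf.internal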

From here, open the /etc/resolv.conf file. You should see contents similar to the following:

nameserver 127.0.0.1
nameserver 10.64.0.10
nameserver 10.64.0.11
nameserver 10.64.36.11

The IP addresses above are the local resolver (127.0.0.1), the custom DNS server addresses added through Ops Manager, and the IP address of the Director. The "127.0.0.1" entry should be the first entry, since it is the local resolver. If anything else is listed first, move the "127.0.0.1" entry to the first line, then save and exit resolv.conf. Now, try to ping or dig cloud-controller-ng.service.cf.internal again (or the appropriate "internal_service_hostname" value from the previous step).
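For illustration, a misordered file and its corrected form might look like this (using the example addresses above):

# Misordered - the custom DNS server is consulted first, and internal lookups fail:
nameserver 10.64.0.10
nameserver 127.0.0.1
nameserver 10.64.0.11
nameserver 10.64.36.11

# Corrected - the local resolver comes first:
nameserver 127.0.0.1
nameserver 10.64.0.10
nameserver 10.64.0.11
nameserver 10.64.36.11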

If this succeeds, you have proven both that something is wrong with the DNS server at the address that was previously listed first and that the local resolver is working again.

Try to push an app again to verify. 
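For example, with the cf CLI (using the same <APP-NAME> placeholder as above):

cf push <APP-NAME>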

Note:

This was experienced in a 1.6.9 Elastic Runtime environment.

