Pivotal Cloud Foundry (PCF) versions 1.6.x and 1.7.x
The upgrade of Pivotal Cloud Foundry may fail due to Diego Brain (ETCD) issues. In this case, cf apps shows that the apps have started, but the number of running instances is still 0.
$ cf apps
Getting apps in org work / space development as user...
OK
name       requested state   instances   memory   disk   urls
APP_NAME   started           0/1         512M     1G     app_name.<app domain>
If you check the app details with the cf app command, it returns the error "Instances information unavailable":
$ cf app APP_NAME
Showing health and status for app APP_NAME in org work / space development as user...
FAILED
Server error, status code: 503, error code: 220002, message: Instances information unavailable: response code: 500, response body:
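The symptom in the cf apps output above (requested state "started" with 0 running instances) can be spotted across many apps with a small filter. This is a minimal sketch, not an official tool: the function name stalled_apps is hypothetical, and it assumes the column layout shown above (name, requested state, instances) and app names without spaces.

```shell
#!/bin/sh
# Hypothetical helper: reads `cf apps` output on stdin and prints the names
# of apps whose requested state is "started" but whose instances column
# begins with "0/" (no running instances).
# Assumes the column order shown in this article and no spaces in app names.
stalled_apps() {
  awk '$2 == "started" && $3 ~ /^0\// { print $1 }'
}

# Usage (requires a logged-in cf CLI):
#   cf apps | stalled_apps
```

An app listed here is only a candidate for this issue; the cf app error shown above is what points to the ETCD problem.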
The ETCD node (or the ETCD cluster) of Diego Brain might fail to elect a leader or to synchronize data among the nodes. Because Diego Brain keeps only temporary data for quick access, it is safe to delete the corrupted ETCD data and restart the ETCD jobs.
In many cases, we have found that ETCD cluster failures in PCF can be corrected by wiping the data from the nodes and resetting them. This process essentially gives the cluster a fresh start, and because no persistent data is stored on the ETCD cluster, the operation is harmless.
Because this process is quick, non-destructive, and has a high success rate for fixing ETCD problems, Pivotal recommends trying this process first, before doing any additional debugging.
To perform this process, follow the instructions in the "Failed Deploys, Upgrades, Split-Brain Scenarios, etc" section mentioned in this link.
$ monit stop etcd (on all nodes in etcd cluster)
$ rm -rf /var/vcap/store/etcd/* (on all nodes in etcd cluster)
$ monit start etcd (one-by-one on each node in etcd cluster)
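The order of the steps matters: etcd is stopped on every node before any data is removed, and the nodes are started again one at a time. The sketch below only prints that plan for a given list of nodes so the order can be reviewed before anything is run. The function name plan_etcd_reset and the node names in the usage line are hypothetical, and how you log in to each node is left open because the source does not specify it.

```shell
#!/bin/sh
# Hypothetical helper: prints the reset steps for the given etcd nodes in
# the order described above. It does not execute anything; run each printed
# command on the named node yourself.
plan_etcd_reset() {
  # 1. Stop etcd on all nodes before touching any data.
  for node in "$@"; do echo "[$node] monit stop etcd"; done
  # 2. Wipe the etcd data directory on all nodes.
  for node in "$@"; do echo "[$node] rm -rf /var/vcap/store/etcd/*"; done
  # 3. Start etcd again, one node at a time.
  for node in "$@"; do echo "[$node] monit start etcd"; done
}

# Usage (node names are placeholders):
#   plan_etcd_reset etcd-0 etcd-1 etcd-2
```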
If you need assistance with these instructions, please open a support ticket.