Pivotal Cloud Foundry® Ops Manager Version 1.6 and 1.7
Pivotal Cloud Foundry® (PCF) Redis Version 1.5
After an upgrade of the Ops Manager environment from 1.6 to 1.7, bind, unbind and deprovision operations against pre-existing service instances fail.
The broker’s statefile, located on the broker virtual machine (VM) of the deployment at /var/vcap/store/cf-redis-broker/statefile.json, shows all dedicated instances as available and reports no bindings.
If shared-VM instances were also provisioned prior to the Ops Manager upgrade, they no longer function, and the /var/vcap/store/cf-redis-broker/redis-data directory is now empty.
Bindings to the dedicated instances existing prior to the upgrade usually continue to function. However, their data is at risk as the broker considers these service instances available. Hence, it might bind the service instance to a different app.
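To check for this symptom without reading the JSON by hand, a short script can summarize the statefile. The following is a minimal sketch, assuming the statefile has top-level keys named `available_instances`, `allocated_instances`, and `instance_bindings`. These key names and the function names are assumptions that this article does not confirm, so compare them with the file on your broker VM before relying on the result.

```python
import json

# Assumed top-level keys of statefile.json; verify against the actual file.
AVAILABLE_KEY = "available_instances"
ALLOCATED_KEY = "allocated_instances"
BINDINGS_KEY = "instance_bindings"


def summarize_statefile(state):
    """Return counts of available instances, allocated instances, and bindings."""
    bindings = state.get(BINDINGS_KEY) or {}
    return {
        "available": len(state.get(AVAILABLE_KEY) or []),
        "allocated": len(state.get(ALLOCATED_KEY) or []),
        # Each entry is assumed to map an instance ID to a list of binding IDs.
        "bindings": sum(len(ids or []) for ids in bindings.values()),
    }


def looks_reset(state):
    """True if nothing is allocated and no bindings are recorded."""
    summary = summarize_statefile(state)
    return summary["allocated"] == 0 and summary["bindings"] == 0


def load_statefile(path="/var/vcap/store/cf-redis-broker/statefile.json"):
    """Read the broker statefile from disk."""
    with open(path) as handle:
        return json.load(handle)
```

On the broker VM, `looks_reset(load_statefile())` returning `True` while apps are still bound to Redis instances would be consistent with the behavior described above.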
The deployment logs from the Ops Manager show that a new broker VM is created because it has been identified as missing, and that the previous broker VM is deleted because it is considered unneeded:
Started creating missing vms > cf-redis-broker-<new>/0 (<new-vm-guid>)
Started updating job cf-redis-broker-partition-<new> > cf-redis-broker-partition-<new>/0 (<new-vm-guid>) (canary)
Started deleting unneeded instances cf-redis-broker-<old> >
The above is a known issue affecting Ops Manager upgrades from versions earlier than 1.7 to versions 1.7.0 through 1.7.19. The issue causes the broker’s persistent disk to be detached. BOSH considers the disk orphaned and schedules it for deletion. Orphaned disks are kept for a limited time, which is five days by default. See BOSH orphaned disks for more information.
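The five-day default gives a simple deadline for the recovery. As a rough sketch, assuming you know approximately when the disk was orphaned (for example, from the deployment logs) and that the default retention period has not been changed, the remaining time can be estimated as follows. The function names and the retention constant are illustrative only.

```python
from datetime import datetime, timedelta

# Default lifetime of an orphaned disk, as stated above.
DEFAULT_RETENTION_DAYS = 5


def recovery_deadline(orphaned_at, retention_days=DEFAULT_RETENTION_DAYS):
    """Return the time after which the orphaned disk may be deleted."""
    return orphaned_at + timedelta(days=retention_days)


def time_remaining(orphaned_at, now, retention_days=DEFAULT_RETENTION_DAYS):
    """Return the time left before the deadline, never less than zero."""
    remaining = recovery_deadline(orphaned_at, retention_days) - now
    return max(remaining, timedelta(0))
```

For example, `time_remaining(upgrade_time, datetime.now())` gives an upper bound on how long the recovery can wait. Treat it as an estimate, not a guarantee.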
Because of the limited lifetime of an orphaned disk, it is essential to perform the following steps as soon as possible to avoid the orphaned disk being permanently deleted:
- Run bosh disks --orphaned to get a list of all the orphaned disks that the BOSH Director knows about. Identify the one that belongs to the deployment in question and make a note of the Disk CID. Let’s call it <orphaned-disk-cid>.
- Log into the IaaS console and locate the orphaned disk <orphaned-disk-cid>.
- Reattach <orphaned-disk-cid> to the broker VM instance. Make a note of the mount point. Let’s call it <orphaned-disk-mount-point>.
- Use bosh ssh to log in to the broker VM and run the following commands:
monit stop all
watch monit summary (wait until all processes are reported as not monitored)
exit, then sudo su vcap
cd /var/vcap/store/cf-redis-broker
mv statefile.json statefile.json.bak
cp <orphaned-disk-mount-point>/cf-redis-broker/statefile.json .
exit, then sudo su
monit start all
watch monit summary (wait until all processes are reported as running)
- Confirm that the contents of the state file now agree with the state of the Redis deployment prior to the upgrade.
If so, remove the backup with rm /var/vcap/store/cf-redis-broker/statefile.json.bak.
The service operations should now be successful.
- From the IaaS console, detach <orphaned-disk-cid> from the broker VM instance.
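The statefile swap in the steps above (the mv and cp commands) can also be scripted so that the backup is never overwritten by accident. This is a minimal sketch, not part of the official procedure: the function name `restore_statefile` is hypothetical, and it assumes the script runs as the vcap user after the monit processes have been stopped. It does not stop or start monit itself.

```python
import shutil
from pathlib import Path


def restore_statefile(broker_dir, orphaned_mount):
    """Back up the current statefile and copy in the one from the orphaned disk.

    Mirrors the mv/cp steps; returns the path of the backup file.
    """
    broker_dir = Path(broker_dir)
    current = broker_dir / "statefile.json"
    backup = broker_dir / "statefile.json.bak"
    recovered = Path(orphaned_mount) / "cf-redis-broker" / "statefile.json"

    # Check the source first so the current statefile is not moved for nothing.
    if not recovered.is_file():
        raise FileNotFoundError(f"no statefile found at {recovered}")
    # Refuse to overwrite an earlier backup.
    if backup.exists():
        raise FileExistsError(f"backup already exists at {backup}")

    shutil.move(str(current), str(backup))
    shutil.copy2(str(recovered), str(current))
    return backup
```

A call such as `restore_statefile("/var/vcap/store/cf-redis-broker", "<orphaned-disk-mount-point>")` corresponds to the manual commands. Keeping the backup until the restored state has been confirmed matches the order of the steps above.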
In some rare cases, app instances might fail to be deleted because Cloud Controller believes they are still bound to service instances that no longer exist. To completely remove those apps, along with the Cloud Controller entries for the service instances and their bindings, use the steps below:
- cf stop <app>
- cf purge-service-instance <service-instance>
- cf delete <app>
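When several apps are affected, the three commands above can be wrapped in a small helper that prints them before running anything. This is an illustrative sketch only: the function names are hypothetical, and it assumes the cf CLI is installed, logged in, and targeting the right org and space. The cf CLI may still prompt for confirmation on the purge and delete steps.

```python
import subprocess


def cleanup_commands(app, service_instance):
    """Return the cf CLI commands from the steps above, in order."""
    return [
        ["cf", "stop", app],
        ["cf", "purge-service-instance", service_instance],
        ["cf", "delete", app],
    ]


def run_cleanup(app, service_instance, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for command in cleanup_commands(app, service_instance):
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(command, check=True)
```

Calling `run_cleanup("<app>", "<service-instance>")` with the default `dry_run=True` only prints the commands, which allows a review before anything is purged.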
This bug is patched in Redis tile 1.5.26 and resolved in Ops Manager version 1.7.20.