Pivotal Knowledge Base

Pivotal Cloud Foundry® Redis Service Binding/Unbinding/Deprovisioning Fails after Ops Manager Upgrade

Environment

Pivotal Cloud Foundry® Ops Manager Versions 1.6 and 1.7
Pivotal Cloud Foundry® (PCF) Redis Version 1.5

Symptom

After an upgrade of the Ops Manager environment from 1.6 to 1.7, bind, unbind and deprovision operations against pre-existing service instances fail.

When looking at the broker’s statefile, located on the broker virtual machine (VM) of the deployment at /var/vcap/store/cf-redis-broker/statefile.json, all dedicated instances show up as available and no bindings are reported.
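
For illustration only, a statefile in this state might look roughly like the following. The field names follow the open-source cf-redis-broker project and the hosts are placeholders; the exact structure can differ between tile versions:

  cat /var/vcap/store/cf-redis-broker/statefile.json
  {
    "available_instances": [
      {"host": "10.0.16.101", "port": 6379},
      {"host": "10.0.16.102", "port": 6379}
    ],
    "allocated_instances": [],
    "instance_bindings": {}
  }

In a healthy statefile, the dedicated instances that are in use would appear under allocated_instances and their bindings under instance_bindings.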

Additionally, if there were shared-VM instances provisioned prior to the Ops Manager upgrade, they no longer function. The /var/vcap/store/cf-redis-broker/redis-data directory is now empty.

Bindings to dedicated instances that existed prior to the upgrade usually continue to function. However, their data is at risk because the broker considers these service instances available and might therefore bind them to a different app.

When looking at the deployment logs from Ops Manager, it appears that a new broker VM is created because it has been identified as missing, and the previous broker VM is deleted because it is considered unneeded:

Started creating missing vms > cf-redis-broker-<new>/0 (<new-vm-guid>)
Started updating job cf-redis-broker-partition-<new> > cf-redis-broker-partition-<new>/0 (<new-vm-guid>) (canary)
Started deleting unneeded instances cf-redis-broker-<old> > cf-redis-broker-<old>/0 (<old-vm-guid>)

Cause

The above is a known issue affecting Ops Manager upgrades from versions earlier than 1.7 to versions 1.7.0 - 1.7.19. The issue results in the broker’s persistent disk being detached. BOSH then considers the disk orphaned and schedules it for deletion. Orphaned disks are kept only for a limited time, five days by default. See BOSH orphaned disks for more information.
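
Before starting the resolution below, it can be worth confirming that the disk is still within that retention window. A minimal sketch, assuming the BOSH CLI is already targeted at and logged in to the Director (for example from the Ops Manager VM), with <redis-deployment-name> as a placeholder:

  bosh disks --orphaned
  bosh disks --orphaned | grep <redis-deployment-name>

Look for the entry whose deployment name matches the Redis deployment and note its Disk CID.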

Resolution

Because of the limited lifetime of an orphaned disk, it is essential to perform the following steps as soon as possible to avoid the orphaned disk being permanently deleted:

  1. Run bosh disks --orphaned to get a list of all the orphaned disks that the BOSH Director knows about. Identify the one that belongs to the deployment in question and make a note of the Disk CID. Let’s call it <orphaned-disk-cid>.
  2. Log into the IaaS console and locate the orphaned disk <orphaned-disk-cid>.
  3. Reattach <orphaned-disk-cid> to the broker VM instance. Take a note of the mount point. Let’s call it <orphaned-disk-mount-point>.
  4. Use bosh ssh to connect to the broker VM and run the following commands:
    sudo su
    monit stop all
    watch monit summary    # wait until all processes report "not monitored", then Ctrl+C to exit watch
    exit
    sudo su vcap
    cd /var/vcap/store/cf-redis-broker/
    mv statefile.json statefile.json.bak
    cp <orphaned-disk-mount-point>/cf-redis-broker/statefile.json .
    exit
    sudo su
    monit start all
    watch monit summary    # wait until all processes report "running", then Ctrl+C to exit watch
  5. Confirm that the contents of the statefile now agree with the state of the Redis deployment prior to the upgrade: the pre-upgrade dedicated instances and their bindings should be listed again.
    If they do, remove the backup with rm /var/vcap/store/cf-redis-broker/statefile.json.bak.
    Service operations should now succeed; a verification sketch follows this list.
  6. From the IaaS console, detach <orphaned-disk-cid> from the broker VM instance.
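
As a quick end-to-end verification, the operations that previously failed can be exercised against a pre-existing service instance. A minimal sketch using the cf CLI, with <test-app> and <pre-existing-redis-instance> as placeholders for names in your environment:

  cf services                                          # list the pre-existing Redis service instances
  cf bind-service <test-app> <pre-existing-redis-instance>
  cf unbind-service <test-app> <pre-existing-redis-instance>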

In some rare cases, apps might fail to be deleted because the Cloud Controller believes they are still bound to service instances that no longer exist. To completely remove those apps, together with the stale service instance and binding entries in the Cloud Controller, use the steps below:

  1. cf stop <app>
  2. cf purge-service-instance <service-instance>
  3. cf delete <app>
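
These steps rely on cf purge-service-instance, which removes the service instance and its bindings from the Cloud Controller database without contacting the service broker; that is why it succeeds even though the underlying instance no longer exists. A minimal sketch with placeholder names:

  cf services                       # identify the stale instance, e.g. my-redis
  cf stop my-app
  cf purge-service-instance my-redis
  cf delete my-app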

Additional Information

This bug is patched in version 1.5.26 of the Redis tile and resolved in Ops Manager version 1.7.20.

 
