Pivotal Knowledge Base

Pivotal Cloud Foundry® Redis Service Binding/Unbinding/Deprovisioning Fails after Ops Manager Upgrade

Environment

 Product                                 Version
 Pivotal Cloud Foundry® Ops Manager      1.6 / 1.7
 Pivotal Cloud Foundry® (PCF) Redis      1.5

Symptom

After an upgrade of the Ops Manager environment from 1.6 to 1.7, bind, unbind and deprovision operations against pre-existing service instances fail.

The broker’s statefile, located on the broker VM of the deployment at /var/vcap/store/cf-redis-broker/statefile.json, shows all dedicated instances as available and reports no bindings.

If shared-VM instances were also provisioned prior to the Ops Manager upgrade, they no longer function, and the /var/vcap/store/cf-redis-broker/redis-data directory is now empty.

Bindings to dedicated instances that existed prior to the upgrade usually continue to function. However, their data is at risk: because the broker considers these service instances available, it might bind one of them to a different app.

When looking at the deployment logs from Ops Manager, it appears that a new broker VM is created because it’s identified as missing and the previous broker VM is deleted as it’s considered unneeded:

Started creating missing vms > cf-redis-broker-<new>/0 (<new-vm-guid>)
Started updating job cf-redis-broker-partition-<new> > cf-redis-broker-partition-<new>/0 (<new-vm-guid>) (canary)
Started deleting unneeded instances cf-redis-broker-<old> > cf-redis-broker-<old>/0 (<old-vm-guid>)
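
To check a saved copy of the deployment log for this pattern, a minimal sketch follows. It assumes the log has been exported to a text file; the function name broker_was_recreated is hypothetical, and the match strings are taken from the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the log on stdin shows both a
# newly created and a deleted cf-redis-broker VM, as in the excerpt above.
broker_was_recreated() {
    log=$(cat)
    printf '%s\n' "$log" | grep -q 'Started creating missing vms > cf-redis-broker-' &&
    printf '%s\n' "$log" | grep -q 'Started deleting unneeded instances cf-redis-broker-'
}
```

For example, with the log saved under the assumed name deployment.log: broker_was_recreated < deployment.log && echo "broker VM was replaced".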

Cause

This is a known issue affecting Ops Manager upgrades from versions earlier than 1.7 to versions 1.7.0 - 1.7.19. The issue causes the broker’s persistent disk to be detached. BOSH considers the disk orphaned and schedules it for deletion. Orphaned disks are kept for a limited amount of time, which defaults to five days. See BOSH orphaned disks for more information.

Resolution

Because of the limited lifetime of an orphaned disk, it is essential to perform the following steps as soon as possible to avoid the orphaned disk being permanently deleted.

  1. Run bosh disks --orphaned to get a list of all the orphaned disks that the BOSH Director knows about. Identify the one that belongs to the deployment in question and make a note of its Disk CID. Let’s call it <orphaned-disk-cid>.
  2. Log into the IaaS console and locate the orphaned disk <orphaned-disk-cid>.
  3. Reattach <orphaned-disk-cid> to the broker VM instance. Make a note of the mount point. Let’s call it <orphaned-disk-mount-point>.
  4. Use bosh ssh to log in to the broker VM and run the following commands:
    sudo su
    monit stop all
    watch monit summary    # wait until all processes are "not monitored"
    exit
    sudo su vcap
    cd /var/vcap/store/cf-redis-broker/
    mv statefile.json statefile.json.bak
    cp <orphaned-disk-mount-point>/cf-redis-broker/statefile.json .
    exit
    sudo su
    monit start all
    watch monit summary    # wait until all processes are "running"
  5. Confirm that the contents of the statefile now agree with the state of the Redis deployment prior to the upgrade.
    If they do, remove the backup: rm /var/vcap/store/cf-redis-broker/statefile.json.bak.
    Service operations should now succeed.
  6. From the IaaS console, detach <orphaned-disk-cid> from the broker VM instance.
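
The two manual watch monit summary waits in step 4 could be scripted along the following lines. This is only a sketch: the function names are hypothetical, and it assumes that monit summary prints one line per process in the form Process 'name' followed by the status, with no spaces in the process name.

```shell
#!/bin/sh
# Hypothetical helper: reads `monit summary` output on stdin and succeeds only
# if at least one Process line exists and every Process line ends in the
# wanted status (for example "running" or "not monitored").
# Assumed line format:  Process 'name'    status
all_processes_are() {
    awk -v wanted="$1" '
        /^Process / {
            seen = 1
            status = $0
            sub(/^Process +[^ ]+ +/, "", status)  # drop the prefix and quoted name
            sub(/ +$/, "", status)                # trim trailing spaces
            if (status != wanted) bad = 1
        }
        END { exit ((seen && !bad) ? 0 : 1) }
    '
}

# Poll every 5 seconds until all processes reach the wanted status.
wait_for_monit() {
    until monit summary | all_processes_are "$1"; do
        sleep 5
    done
}
```

Run as root on the broker VM, monit stop all could then be followed by wait_for_monit "not monitored", and monit start all by wait_for_monit running.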

In some rare cases, apps might fail to be deleted because Cloud Controller believes they are still bound to service instances that no longer exist. To completely remove those apps, as well as the entries for the service instances and the bindings from Cloud Controller, use the steps below:

  1. cf stop <app>
  2. cf purge-service-instance <service-instance>
  3. cf delete <app>
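
When several apps are affected, the three commands above could be wrapped as in the following sketch. The function names and the DRY_RUN variable are hypothetical conveniences and not part of the cf CLI; with DRY_RUN=1 the commands are only printed, so they can be reviewed before anything is purged.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the three cf commands above.
# Usage: purge_app_and_service <app> <service-instance>
purge_app_and_service() {
    app=$1
    service_instance=$2
    if [ -z "$app" ] || [ -z "$service_instance" ]; then
        echo "usage: purge_app_and_service <app> <service-instance>" >&2
        return 2
    fi
    # Stop at the first failing command so the app is never deleted
    # while the service instance purge has not gone through.
    run_or_print cf stop "$app" &&
    run_or_print cf purge-service-instance "$service_instance" &&
    run_or_print cf delete "$app"
}
```

The purge and delete commands ask for confirmation interactively, which is a useful safeguard here because purging removes the Cloud Controller records without contacting the broker.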

Additional Information

This bug is patched in Redis tiles 1.5.26 and resolved in Ops Manager version 1.7.20.

 
