Pivotal Knowledge Base

Apply Changes fails with Director task ### Error 100: Redis lock lock:deployment:cf-xxx is acquired by another thread

Environment

Product Version
Pivotal Cloud Foundry® (PCF) 1.6 and 1.7
Redis  

Symptoms

When you apply changes to PCF after adding a new product, the install logs show an error similar to the following:

Director task <task #> Error 100: Redis lock lock:deployment:cf-<xxx> is acquired by another thread

Resolution 

When you see the error, run `bosh locks` to list the current locks. If you get there quickly enough, you may be able to see what is holding the lock. You can also run `bosh tasks recent --no-filter`, which lists the tasks that have run recently, so you should be able to see what was running at the time of the failure.
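Because a lock may be released before you get a chance to look at it, it can help to sample `bosh locks` several times in a row. The following is only a minimal sketch: `watch_locks`, `BOSH_CMD`, and the sample count and interval arguments are hypothetical names introduced for illustration, not part of BOSH. It assumes the BOSH CLI is already targeted at and logged in to your Director.

```shell
#!/bin/sh
# Sketch: run `bosh locks` repeatedly so a short-lived lock is less likely to be missed.
# BOSH_CMD is a hypothetical override for the CLI binary; it defaults to `bosh`.
BOSH_CMD="${BOSH_CMD:-bosh}"

# watch_locks SAMPLES INTERVAL_SECONDS
# Prints a header line and the output of `bosh locks` for each sample.
watch_locks() {
  wl_samples="${1:-10}"
  wl_interval="${2:-5}"
  wl_i=1
  while [ "$wl_i" -le "$wl_samples" ]; do
    printf '== sample %s of %s at %s ==\n' "$wl_i" "$wl_samples" "$(date -u +%H:%M:%S)"
    "$BOSH_CMD" locks
    # Sleep between samples, but not after the last one.
    if [ "$wl_i" -lt "$wl_samples" ]; then
      sleep "$wl_interval"
    fi
    wl_i=$((wl_i + 1))
  done
}
```

For example, `watch_locks 12 5` would take twelve samples five seconds apart, which you could start just before retrying Apply Changes.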

A common cause of this problem is the BOSH resurrector. The resurrector runs automatically in the background and does work in your environment, such as restarting or recreating VMs. If it is working on something while you try to push out changes, you can hit this error because the resurrector holds the lock while it works. Try disabling the resurrector temporarily to see whether this gets you through the upgrade. Run `bosh vm resurrection off` to disable it completely for the currently selected deployment. To turn it back on, run `bosh vm resurrection on`.

Please note that if the resurrector is running, it is doing so because it has detected a problem. Disabling the resurrector should get you past the locking issue, but it will likely uncover some other problem. After you disable the resurrector, but before you apply changes, run `bosh cck` and `bosh instances --ps`, and resolve any errors or failed processes before proceeding with the upgrade.
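The order of these checks can be sketched as a small shell function. This is an illustration under stated assumptions, not an official procedure: `preflight` and `BOSH_CMD` are hypothetical names, the BOSH CLI is assumed to be targeted at your Director with the deployment already selected, and `bosh cck` may prompt you interactively to choose a resolution.

```shell
#!/bin/sh
# Sketch: disable the resurrector, then run the checks described above, in order.
# BOSH_CMD is a hypothetical override for the CLI binary; it defaults to `bosh`.
BOSH_CMD="${BOSH_CMD:-bosh}"

preflight() {
  # Stop at the first command that fails so the problem is not scrolled away.
  "$BOSH_CMD" vm resurrection off || return 1
  "$BOSH_CMD" cck || return 1
  "$BOSH_CMD" instances --ps || return 1
  echo "Checks finished. Resolve any reported errors before applying changes."
  echo "When you are done, re-enable the resurrector: bosh vm resurrection on"
}
```

The function deliberately does not turn the resurrector back on by itself, since it needs to stay off until the changes have been applied.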

Impact

Turning off the BOSH resurrector always carries a risk: its purpose is to keep the components of PCF up and running, and it cannot do so while it is turned off. Always remember to run `bosh vm resurrection on` when you are done troubleshooting. Otherwise, if a component goes down, it will stay offline until you manually intervene.
