Pivotal Knowledge Base

CFOps does not start monit services on the Cloud Controller VM after an Elastic Runtime backup

Environment 

  • Pivotal Cloud Foundry® (PCF) 1.9, 1.10
  • IaaS: vSphere
  • CFOps earlier than v3.1.7

Symptom

A user ran a full backup of Elastic Runtime (ERT). After the backup completed, none of the services monitored by monit on the Cloud Controller VM were started, leaving PCF in an unusable state.

The following error was seen in the backup_log:

 2017/08/21 05:45:50 E0821 05:45:50.121512 19139 createCliCommand.go:52] there was an error: failed calling ChangeJobState: failed calling http client: Put https://10.xx.xx.x:25555/deployments/cf-3dxxxxxxxxxxxxxx/jobs/cloud_controller/2?state=started: read tcp 10.xx.xxx.xxx:56560->10.xx.xx.xx:25555: read: connection timed out running backup on elastic-runtime tile:tile

Cause

The logs don't display the request that is timing out:

deployments/cf-3d4xxxxxxxxxx/jobs/cloud_controller/2?state=started

Observations:

  1. New connections to the BOSH Director always work.
  2. Small backups work (a test run of under 15 minutes completed without error).
  3. TCP connections are in idle state between the CFOps machine and the BOSH Director (nginx)

We see two possible causes for this:

  1. CFOps reuses the same TCP connection, which then times out.
  2. CFOps uses a new connection, but the token is old and nothing is logged about this.

Resolution

The fix is to create a new connection each time the CFOps machine needs to communicate with the BOSH Director.

The fix is included in CFOps v3.1.7:

https://github.com/pivotalservices/cfops/releases/tag/v3.1.7

Additional Information

Reference: http://www.cfops.io/
