Pivotal Knowledge Base


clock_global Job Causes High Memory Usage


  • Pivotal Cloud Foundry (PCF) Version 1.12.3
  • Component: Clock-Global


The initial symptom is high memory usage on the clock_global VM. The "dips" in the graph below correspond to times when the job or the VM had become so unresponsive that BOSH restarted it.


Logs: “/var/vcap/sys/clock_global/cloud_controller_clock_ctl.log”  

[2017-11-10 XX:XX:19+0000] E, [2017-11-10TXX:XX:19.445328 #32029] ERROR -- : can't create Thread: Resource temporarily unavailable (ThreadError)
[2017-11-10 XX:XX:19+0000] /var/vcap/data/packages/cloud_controller_ng/109f95034faa3d1b74fb47b3c97b0d7c7a1b9688/cloud_controller_ng/lib/utils/workpool.rb:10:in `initialize'
[2017-11-10 XX:XX:19+0000] /var/vcap/data/packages/cloud_controller_ng/109f95034faa3d1b74fb47b3c97b0d7c7a1b9688/cloud_controller_ng/lib/utils/workpool.rb:10:in `new'
--------snip-----------
[2017-11-10 XX:XX:19+0000] /var/vcap/packages/ruby-2.3/lib/ruby/2.3.0/timeout.rb:106:in `timeout'

The number of open file handles is high, and nearly all of them belong to the clock_global process.

 $ lsof | wc -l 
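To attribute the handle count to a specific process rather than the whole system, the descriptors can be counted per PID. A minimal Ruby sketch, assuming a Linux /proc filesystem (the helper name fd_count is ours, not part of any PCF tooling):

```ruby
# Count the open file descriptors of one process by listing /proc/<pid>/fd.
# Linux-only; returns nil if the PID does not exist or is not readable.
def fd_count(pid)
  Dir.children("/proc/#{pid}/fd").size
rescue Errno::ENOENT, Errno::EACCES
  nil
end

# e.g. pass the clock_global PID from `ps`; here we inspect ourselves
puts fd_count(Process.pid)
```

Running this periodically against the clock_global PID shows whether that process alone accounts for the growth seen in `lsof | wc -l`.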

After a restart, the number of open file handles returns to normal. While tailing the logs, operators have found a new error message.

==> cloud_controller_clock_ctl.log <==
[2017-11-10 XX:XX:16+0000] E, [2017-11-10TXX:XX:16.323780 #26820] ERROR -- : droplet presence (Sequel::ValidationFailed)
[2017-11-10 XX:XX:16+0000] /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/2.3.0/gems/sequel-4.49.0/lib/sequel/model/base.rb:1824:in `save'
<snip>
[2017-11-10 XX:XX:16+0000] /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/2.3.0/gems/newrelic_rpm- `disab

The number of open file handles grows quickly, at a rate of roughly 400 new handles every 30 seconds, until the process starts emitting the first error message above.
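At that rate, the time until handle exhaustion can be estimated. A back-of-the-envelope calculation, assuming a hypothetical soft nofile limit of 65536 (the real limit varies; check it with `ulimit -n` on the VM):

```ruby
rate_per_second = 400.0 / 30        # ~13.3 new handles per second (from the logs)
assumed_limit   = 65_536            # hypothetical soft limit; actual limits vary
seconds_left    = assumed_limit / rate_per_second
puts (seconds_left / 60).round      # ~82 minutes until exhaustion
```

This explains why the VM degrades within hours of a restart rather than immediately.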


The clock_global job periodically runs a process called diego_sync. Two issues combine to cause the symptom above:

  1. There may be orphaned tasks that should have been marked as failed, but the sync process could not mark them as failed because the corresponding droplet had been removed.
  2. A bug in the sync routine causes it to continuously spawn threads under the condition above without cleaning up the old threads. This bug was introduced in ERT 1.12.3.
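The leak pattern can be illustrated in isolation. The following is a simplified Ruby sketch of the failure mode, not the actual cloud_controller_ng workpool code: each sync cycle spawns worker threads, and when an error (such as the Sequel::ValidationFailed above) aborts the cycle, the drain/join step is skipped, so blocked threads accumulate until `Thread.new` eventually fails with ThreadError.

```ruby
# Simplified illustration of the thread-leak pattern; NOT the real diego_sync code.
class LeakyWorkPool
  def initialize(size)
    @queue = Queue.new
    @threads = Array.new(size) do
      Thread.new do
        while (job = @queue.pop)  # blocks forever if the pool is never drained
          job.call
        end
      end
    end
  end

  def submit(&block)
    @queue << block
  end

  def drain
    @threads.size.times { @queue << nil }  # one nil sentinel per worker
    @threads.each(&:join)
  end
end

before = Thread.list.size
3.times do
  pool = LeakyWorkPool.new(4)
  pool.submit { :work }
  # An exception raised here would skip pool.drain, leaking 4 threads per cycle.
end
puts Thread.list.size - before  # the leaked workers, still blocked on their queues
```

Each leaked thread also holds its stack and any descriptors it opened, which matches the coupled growth in memory and file handles observed on the VM.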


The workaround is to update the state of the affected 'missing-diego-task' tasks to FAILED in the Cloud Controller database (CCDB).

Steps on clock-global VM:

1. Find the 'missing-diego-task' entries in 'cloud_controller_clock.log':

grep 'missing-diego-task' cloud_controller_clock.log | jq .data.task_guid | sort | uniq

2. Update the CCDB for the tasks found above, setting their state to FAILED.

Example script (saved as script.rb; replace task_guids with the GUIDs found in step 1):

task_guids = ["415644dd-1133-49c3-b9d8-60ae87c78b28", "43ae4c03-94ee-40e4-8b71-71ba3a507be1",
              "65ee10cf-07c2-488e-bba9-8af365b5d866", "f3a423d6-7f76-49b9-a2b3-328b4b162810"]

TaskModel.where(guid: task_guids).each do |task|
  droplet = DropletModel.find(guid: task.droplet_guid)
  if droplet.nil?
    puts "cancelling task #{task.guid}"
    task.this.update(state: "FAILED", failure_reason: "manually cancelled")
  end
end

Run it through the Cloud Controller console:

cat script.rb | /var/vcap/jobs/cloud_controller_ng/bin/console
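As an aside, if jq is not installed on the VM, the GUID extraction in step 1 can also be done in Ruby. A sketch that assumes, as the jq filter in step 1 does, that the clock log lines are JSON objects carrying a data.task_guid field (the helper name missing_task_guids is ours):

```ruby
require 'json'

# Collect the unique task GUIDs from 'missing-diego-task' log lines.
# Lines that are not valid JSON are skipped.
def missing_task_guids(lines)
  lines.grep(/missing-diego-task/)
       .map { |line| JSON.parse(line).dig("data", "task_guid") rescue nil }
       .compact.uniq.sort
end

# e.g. missing_task_guids(File.readlines("cloud_controller_clock.log"))
```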


  1. Problems with the clock_global job are not catastrophic; however, over time they may degrade the ability to push apps (days to weeks, depending on how frequently apps are pushed).
  2. The bug in the sync routine mentioned above may be triggered by errors other than "ERROR -- : droplet presence (Sequel::ValidationFailed)".
  3. It is important to capture the logs before the system runs out of file handles, that is, directly after a restart.

