Pivotal Cloud Foundry 1.10.12+
The customer is running Diego cells that were not deployed by Elastic Runtime, for example Isolation Segments or an OSS deployment.
Running df -i reports inode usage at or near 100%.
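As a quick way to spot the affected filesystem, the check above can be sketched as a small filter over `df -i` output. The function name `inode_hogs` and the 90% default threshold are assumptions made for illustration; the column positions assume the POSIX-style layout printed by `df -iP`.

```shell
#!/usr/bin/env bash
# Print "<mount point> <IUse%>" for every filesystem whose inode usage is at
# or above a threshold (default 90). Reads `df -iP` style output on stdin:
#   Filesystem Inodes IUsed IFree IUse% Mounted on
# inode_hogs is a hypothetical helper name, not part of any PCF tooling.
inode_hogs() {
  local threshold="${1:-90}"
  awk -v t="$threshold" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)                 # strip the trailing percent sign
      # skip filesystems that report "-" instead of a number
      if (use ~ /^[0-9]+$/ && use + 0 >= t) print $6, $5
    }'
}

# Usage on a Diego cell:
#   df -iP | inode_hogs 90
```

Reading from stdin rather than calling `df` directly keeps the filter easy to try against saved output from a cell.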
The Diego deployment manifest should enable cleanup_process_dirs_on_wait, so that garden is started with the --cleanup-process-dirs-on-wait flag:
/var/vcap/data/jobs/garden/4456fe41ab6291aefe82ef966103d435676f45ca/bin/garden_ctl: --cleanup-process-dirs-on-wait \
You should also see the --cleanup-process-dirs-on-wait flag on the running gdn process:
ps -ef | grep -i gdn
root 514382 514381 2 Nov18 ? 14:24:19 /var/vcap/packages/guardian/bin/gdn server --skip-setup --bind- ... --cleanup-process-dirs-on-wait
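The process check above can be scripted roughly as follows. This is a minimal sketch: the function name `gdn_cleanup_flag_status` and its status messages are hypothetical, and it assumes the flag appears as a bare `--cleanup-process-dirs-on-wait` argument on a `gdn server` command line, as in the sample output.

```shell
#!/usr/bin/env bash
# Report whether a running `gdn server` process carries the cleanup flag.
# Reads one command line per row on stdin (for example from `ps -eo args`).
# gdn_cleanup_flag_status is a hypothetical helper name.
gdn_cleanup_flag_status() {
  awk '
    $1 ~ /\/gdn$/ && $2 == "server" {
      found = 1
      for (i = 3; i <= NF; i++)
        if ($i == "--cleanup-process-dirs-on-wait") ok = 1
    }
    END {
      if (!found) { print "gdn not running"; exit 2 }
      if (ok)     { print "flag set" }
      else        { print "flag missing"; exit 1 }
    }'
}

# Usage on a Diego cell:
#   ps -eo args | gdn_cleanup_flag_status
```

Matching on the executable path and the `server` subcommand avoids counting the `grep` or `awk` process itself as a hit.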
If the flag is not set, update the deployment manifest so that the garden property cleanup_process_dirs_on_wait is set to true.
Applications crash with an error such as:
runc exec: exit status 1: exec failed: open /var/vcap/data/garden/depot/... .../.pidfile: No space left on device
A new garden boolean property, cleanup_process_dirs_on_wait, was introduced in garden-runc-release v1.5.0: https://github.com/cloudfoundry/garden-runc-release/tree/v1.5.0. It defaults to false unless explicitly set in the deployment. While it is disabled, garden leaves behind stale process directories, which eventually exhaust the available inodes.
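To confirm that stale directories are what is consuming inodes, one option is to rank the subdirectories of the garden depot by how many filesystem entries each contains. The sketch below is illustrative only: `count_entries_per_dir` is a hypothetical helper, and the depot root `/var/vcap/data/garden/depot` is taken from the path prefix in the error message.

```shell
#!/usr/bin/env bash
# For each immediate subdirectory of a root, print "<entry count> <path>",
# sorted with the largest count first. Every file or directory uses one
# inode, so the count approximates inode consumption per subdirectory.
# count_entries_per_dir is a hypothetical helper name.
count_entries_per_dir() {
  local root="$1" d
  for d in "$root"/*/; do
    [ -d "$d" ] || continue            # glob matched nothing
    # find counts the subdirectory itself plus everything beneath it
    printf '%s %s\n' "$(find "$d" -xdev | wc -l | tr -d ' ')" "${d%/}"
  done | sort -rn
}

# Usage on a Diego cell (as root):
#   count_entries_per_dir /var/vcap/data/garden/depot | head
```

A container directory with an unusually large count is a likely place to find leftover process directories.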
Note: Versions of Elastic Runtime lower than 1.10.12 do not have this boolean because they use a garden release older than 1.5.0, so those systems are not affected by this problem. Refer to the release notes for the Garden versions packaged with ERT: https://docs.pivotal.io/pivotalcf/1-10/pcf-release-notes/runtime-rn.html
Update the deployment manifest to set the cleanup_process_dirs_on_wait boolean to true.
Note: The deployment manifest may vary depending on which type of manifest deployed garden. Check every manifest that deploys garden and verify that cleanup_process_dirs_on_wait is set to true.
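Checking several manifests by hand is error-prone, so the verification can be sketched as a simple text match. This assumes the property appears in the manifest on its own line as `cleanup_process_dirs_on_wait: true`; the function name `manifest_sets_cleanup` and the file names in the usage comment are hypothetical, and a plain grep does not validate where in the YAML structure the line sits.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if the given manifest file contains a line that sets
# cleanup_process_dirs_on_wait to true; fail otherwise.
# manifest_sets_cleanup is a hypothetical helper name.
manifest_sets_cleanup() {
  grep -Eq '^[[:space:]]*cleanup_process_dirs_on_wait:[[:space:]]*true[[:space:]]*$' "$1"
}

# Usage, with manifests already saved to local files (names are examples):
#   for m in diego.yml isolation-segment.yml; do
#     manifest_sets_cleanup "$m" && echo "$m: ok" || echo "$m: NOT set"
#   done
```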
Once the boolean is set, run `bosh deploy <deployment name>` to apply the change.
Another option is to bosh recreate the Diego cells periodically until the fix is available.
Please note that any change made to the configuration in Ops Manager will overwrite manual changes to the deployment files.
This issue will be fixed in an upcoming release of PCF Isolation Segment.