Pivotal Knowledge Base

Monit Timeout Error with "Stopping Monitored Services" in Service Metrics

Environment

  • Redis for Pivotal Cloud Foundry (PCF) 1.9.4
  • Metrics 1.5.11

Symptom

When updating Redis for PCF v1.9.4, "Apply Changes" fails with the following error:

Error Message

===== 2017-11-21 07:10:55 UTC Running "bundle exec bosh -n deployment /var/tempest/workspaces/default/deployments/p-redis-3ddb9f19857867f94aaa.yml" 
Deployment set to '/var/tempest/workspaces/default/deployments/p-redis-3ddb9f19857867f94aaa.yml' 
===== 2017-11-21 07:10:55 UTC Finished "bundle exec bosh -n deployment /var/tempest/workspaces/default/deployments/p-redis-3ddb9f19857867f94aaa.yml"; Duration: 0s; Exit Status: 0 
===== 2017-11-21 07:10:55 UTC Running "bundle exec bosh -n deploy" 
Acting as client 'ops_manager' on deployment 'p-redis-3ddb9f19857867f94aaa' on 'microbosh-044414e9013b19493445' 
RSA 1024 bit CA certificates are loaded due to old openssl compatibility 
Getting deployment properties from director...

Detecting deployment changes

---------------------------- 
instance_groups: 
- name: redis-on-demand-broker 
jobs: 
- name: syslog_forwarder 
properties: 
syslog: 
migration: 
- disabled: "" 
+ address: "" 
+ port: "" 
+ transport: "" 
- name: broker 
properties: 
service_catalog: 
global_properties: 
syslog: 
migration: 
- disabled: "" 
+ address: "" 
+ port: "" 
+ transport: "" 
- name: cf-redis-broker 
jobs: 
- name: syslog_forwarder 
properties: 
syslog: 
migration: 
- disabled: "" 
+ message_format: "" 
+ address: "" 
+ port: "" 
+ transport: "" 
- name: dedicated-node 
jobs: 
- name: syslog_forwarder 
properties: 
syslog: 
migration: 
- disabled: "" 
+ message_format: "" 
+ address: "" 
+ port: "" 
+ transport: ""

Deploying

---------
Director task 80792 
Started preparing deployment > Preparing deployment. Done (00:00:01)
Started preparing package compilation > Finding packages to compile. Done (00:00:00)

Started updating instance redis-on-demand-broker > redis-on-demand-broker/9e01681e-1ca8-4cc0-b068-392ca3966f47 (0) (canary). Done (00:01:06) 
Started updating instance cf-redis-broker > cf-redis-broker/2f6c9bcc-ee49-40b7-b9c8-9fab2efde808 (0) (canary). Done (00:01:10) 
Started updating instance dedicated-node 
Started updating instance dedicated-node > dedicated-node/ed51b31e-f415-4f4e-8412-f362eb196d82 (0) (canary). Done (00:01:46) 
Started updating instance dedicated-node > dedicated-node/4ffcff38-520c-4d39-8e7c-12188455a150 (10) 
Started updating instance dedicated-node > dedicated-node/ca73e6cf-193d-47f3-a98b-b25fceb95013 (4) 
Started updating instance dedicated-node > dedicated-node/d17dc022-b22d-4b32-bf0c-0e75089f9aa5 (2) 
Started updating instance dedicated-node > dedicated-node/d1a16f1d-5e48-4f5b-81dc-11ee79d2c521 (6) 
Started updating instance dedicated-node > dedicated-node/dbd4d3a8-5d72-4f0a-ac51-f8b870a5c324 (8) 
Done updating instance dedicated-node > dedicated-node/ca73e6cf-193d-47f3-a98b-b25fceb95013 (4) (00:01:08) 
Done updating instance dedicated-node > dedicated-node/dbd4d3a8-5d72-4f0a-ac51-f8b870a5c324 (8) (00:01:07) 
Done updating instance dedicated-node > dedicated-node/d1a16f1d-5e48-4f5b-81dc-11ee79d2c521 (6) (00:01:07) 
Done updating instance dedicated-node > dedicated-node/d17dc022-b22d-4b32-bf0c-0e75089f9aa5 (2) (00:01:08) 
Done updating instance dedicated-node > dedicated-node/4ffcff38-520c-4d39-8e7c-12188455a150 (10) (00:01:43) 
Started updating instance dedicated-node > dedicated-node/20864ee1-d12d-4e9d-a886-30a89133d0ee (9) 
Started updating instance dedicated-node > dedicated-node/2a6e67ff-8eec-4c59-9219-b26a34160c10 (5) 
Started updating instance dedicated-node > dedicated-node/5cfc52f8-fc4a-4357-b766-4de05f6085f5 (1) 
Started updating instance dedicated-node > dedicated-node/82bbd88b-3454-4198-b395-050b8653c54a (3) 
Started updating instance dedicated-node > dedicated-node/d86c11a3-e5b8-4130-bfa8-be2145b2ee36 (7) 
Failed updating instance dedicated-node > dedicated-node/5cfc52f8-fc4a-4357-b766-4de05f6085f5 (1): Action Failed get_task: Task 952c8458-77af-43c3-71b2-9f33eb5e8281 result: Stopping Monitored Services: Stopping services '[service-metrics]' errored (00:00:40) 
Done updating instance dedicated-node > dedicated-node/d86c11a3-e5b8-4130-bfa8-be2145b2ee36 (7) (00:01:11) 
Done updating instance dedicated-node > dedicated-node/2a6e67ff-8eec-4c59-9219-b26a34160c10 (5) (00:01:22) 
Done updating instance dedicated-node > dedicated-node/82bbd88b-3454-4198-b395-050b8653c54a (3) (00:01:22) 
Done updating instance dedicated-node > dedicated-node/20864ee1-d12d-4e9d-a886-30a89133d0ee (9) (00:01:26) 
Failed updating instance dedicated-node (00:04:55)

Error 450001: Action Failed get_task: Task 952c8458-77af-43c3-71b2-9f33eb5e8281 result: Stopping Monitored Services: Stopping services '[service-metrics]' errored

Task 80792 error

For a more detailed error report, run bosh task 80792 --debug 
===== 2017-11-21 07:18:14 UTC Finished "bundle exec bosh -n deploy"; Duration: 438s; Exit Status: 1 
Exited with 1.

Cause

This error occurs because the default monit timeout is 30s, while service-metrics is allowed up to 40s to stop its process. When the process takes longer than 30s to stop, monit reports a failure.
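
The timing mismatch can be illustrated with a minimal sketch. The 30s and 40s values come from the description above; the function name and the sample stop durations are hypothetical, and the comparison is a simplified model rather than monit's actual implementation.

```python
MONIT_STOP_TIMEOUT_S = 30  # default monit timeout, as stated in the Cause section
SERVICE_STOP_LIMIT_S = 40  # longest time service-metrics may take to stop


def monit_stop_result(actual_stop_seconds: float,
                      monit_timeout_s: float = MONIT_STOP_TIMEOUT_S) -> str:
    """Simplified model: a stop that outlasts the monit timeout is reported as errored."""
    return "ok" if actual_stop_seconds <= monit_timeout_s else "errored"


# Hypothetical stop durations in seconds, up to the service-metrics limit.
for seconds in (5, 29, 31, SERVICE_STOP_LIMIT_S):
    print(f"stop took {seconds:>2}s -> {monit_stop_result(seconds)}")
```

Any stop that takes between 30s and 40s falls in the gap between the two limits. The failed instance in the log above reported a duration of 00:00:40, which is consistent with this window.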

Resolution

This issue is fixed in Redis for PCF 1.9.6, 1.10.2, 1.11.0, and above. The fix is documented in the Bug Fixes section of the service metrics release notes: https://docs.pivotal.io/svc-sdk/service-metrics/1-5/release-notes.html#bug-fixes

Service metrics now uses a drain script to prevent monit timeout issues [1.5.11].
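
In BOSH, a job's drain script runs before monit is asked to stop the job's processes, so a slow shutdown can complete there without being cut off by the monit timeout. The sketch below only illustrates that idea. It is not the drain script shipped with service-metrics, it omits the BOSH drain script output contract, and the drain function and the stand-in child process are hypothetical.

```python
import subprocess
import sys


def drain(proc: subprocess.Popen, grace_seconds: float) -> bool:
    """Ask the process to stop and wait up to grace_seconds.

    Returns True if the process exited within the grace period,
    False if it had to be killed.
    """
    proc.terminate()  # polite stop request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # grace period exhausted, force the stop
        proc.wait()
        return False


# Hypothetical stand-in for the service-metrics process.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Allow the full 40s described in the Cause section, outside monit's 30s window.
stopped = drain(proc, grace_seconds=40)
print("exited within grace period:", stopped)
```

Because the wait happens before monit is involved, monit later finds the process already stopped and has nothing left to time out on.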

Note: This issue was detected with Redis for PCF 1.9.4 and Metrics 1.5.11.
