Pivotal Knowledge Base

How to Forward VM Status Information to an External Reliable Event Logging Protocol (RELP) Server

Environment

Pivotal Cloud Foundry Version 1.7

Purpose

BOSH can be configured to forward Health Monitor log messages to an external Reliable Event Logging Protocol (RELP) server. While there is no single error message that indicates a Virtual Machine (VM) has gone down, you can monitor VM health by watching for certain alert messages on the external log server. This article shows an example of how to set this up.
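
This article assumes the external RELP server already exists. For reference, a minimal receiver on an rsyslog-based log server might look like the following sketch; the drop-in file path and port are assumptions, and the rsyslog-relp package must be installed on that host:

    # Sketch: enable a RELP listener on an external rsyslog server.
    # Requires the rsyslog-relp package, which provides the imrelp module.
    sudo tee /etc/rsyslog.d/10-relp.conf > /dev/null <<'EOF'
    module(load="imrelp")             # load the RELP input module
    input(type="imrelp" port="514")   # accept RELP connections on port 514
    EOF
    sudo systemctl restart rsyslog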

Procedure

  • Generate a bosh.yml. You can do this by copying the "Generated BOSH Product Manifest" section from https://ops_manager_url/debug/files, then replacing the masked password with the real one, which can be found under Ops Manager Director --> Credentials.
  • Add the following lines to bosh.yml, placing them below the "hm:" section. You may need to adjust the address and port to match your environment.

Here is an example from my environment. The newly added lines are the syslog_event_forwarder_enabled and syslog_event_forwarder entries at the end of the block:

    hm:
      director_account:
        ca_cert: |
          -----BEGIN CERTIFICATE-----
          MIIC+zCCAeOgAwIBAgIBADANBgkqhkiG9w0BAQUFADAfMQswCQYDVQQGEwJVUzEQ
          MA4GA1UECgwHUGl2b3RhbDAeFw0xNjA4MjgwODQ1NTBaFw0yMDA4MjkwODA1NTBa
          MB8xCzAJBgNVBAYTAlVTMRAwDgYDVQQKDAdQaXZvdGFsMIIBIjANBgkqhkiG9w0B
          AQEFAAOCAQ8AMIIBCgKCAQEA0CvGtic+/lurrUIPq2J2eDWP6tJBVY2bcqajRgMm
          I8+mnQGeKMLa7uFxT4u1idEZkDHZGd5/FqkboiO3TatEk+rFun6mAvYkY9a4vEP6
          PP9wNhMjOc04F/JRqpwGHCh4/we9ZrmUkdYhLO4UYCcnJwXzJNF+ygPHFBH/bcp/
          ZLBUZgnWLpST7o98ulRfKXPuTmA5S84TvM5deFAJ5ZkPYFECAHfVoHPI+INTRtO4
          Yam29ReJZYYccWOCsJSO9tzNkDk5SYqAKOqFthD4sZ6aQMYtJ4qgZ3B3zI8HlpGz
          RMO/BX0gxBCBxinYtAYcYQjUSUO1B+/Ck3V5YmH/XsRu5wIDAQABo0IwQDAdBgNV
          HQ4EFgQUdxZ6UG3adTAvEQ2URec2xqVfzYMwDwYDVR0TAQH/BAUwAwEB/zAOBgNV
          HQ8BAf8EBAMCAQYwDQYJKoZIhvcNAQEFBQADggEBABB48+Q0PorKvFgcZd9EziPU
          sLxRC6uTXT+xiIz1DuORCiy4VlLk+KaZ02LHIihY7S7KEWzGZF2cgyi/RUzEX5jm
          Vv7g39N6SwHava760c0FAAGs1pWuPFsSojrr/Vt4FiaZKy/0zAzJbrGVI6ZyqKpQ
          7OGl9KTywLroKxTCx3sxRmrubEqN7Y7ElvPSYp4x0CIW7BVC5KZQ3+cEV7qDo5r5
          x7TLaz++sV5w4ujMOOKeFhsrkwRulWp24ct9pvltUJH3mJRSPE1Gf1+KcVI2vjl+
          R10uzZW6TaXzdNSaDXvNXUy/j9tZgT5q6SKeXHJWIbFndMH1o5/RJYV5dCHaazA=
          -----END CERTIFICATE-----
        client_id: health_monitor
        client_secret: a3ce1ad407a6add7d735
      resurrector_enabled: false
      pagerduty_enabled: false
      pagerduty:
        service_key: 
        http_proxy: 
      email_notifications: false
      email_recipients: []
      smtp:
        from: 
        host: 
        port: 25
        domain: 
        tls: false
        user: 
        password: 
      tsdb_enabled: true
      tsdb:
        address: 192.168.6.115
        port: 13321
      syslog_event_forwarder_enabled: true
      syslog_event_forwarder:
        address: 192.168.6.6
        port: 514
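
Before redeploying, you may want to confirm that the syslog_event_forwarder target is reachable. A minimal check from a host on the same network, assuming nc (netcat) is available:

    # Basic TCP reachability check against the RELP server configured above.
    # 192.168.6.6:514 is the example address from this environment; substitute your own.
    nc -vz 192.168.6.6 514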
  • Find a maintenance window, log in to the Ops Manager VM, and run the following command:

sudo -u tempest-web bosh-init deploy /var/tempest/workspaces/default/deployments/bosh.yml

This command redeploys the Director with the new configuration. The Director will go down momentarily but, when it comes back up, it will have the new configuration.

When a VM goes down, you can monitor its health status by filtering for the keywords "process is not running" or "timed out". The full messages look like the following:

Oct 20 02:20:30 192.168.6.67 bosh.hm [job=health_monitor index=0] [ALERT]
{"kind":"alert","id":"1476930030.1431198801@localhost","severity":1,
"title":"metron_agent (192.168.6.82) - Does not exist - restart",
"summary":"process is not running","source":"agent b01fd340-0c92-4b43-b342-5e9f6ca2f02a []",
"created_at":1476929728}

Oct 20 02:20:30 192.168.6.67 bosh.hm [job=health_monitor index=0] [ALERT]
{"kind":"alert","id":"1476930030.61271731@localhost","severity":1,
"title":"garden (192.168.6.82) - Does not exist - restart",
"summary":"process is not running","source":"agent b01fd340-0c92-4b43-b342-5e9f6ca2f02a []",
"created_at":1476930034}

Oct 20 02:20:30 192.168.6.67 bosh.hm [job=health_monitor index=0] [ALERT]
{"kind":"alert","id":"1476930030.524812971@localhost","severity":1,
"title":"consul_agent (192.168.6.82) - Does not exist - restart",
"summary":"process is not running","source":"agent b01fd340-0c92-4b43-b342-5e9f6ca2f02a []",
"created_at":1476930032}

Oct 20 02:26:59 192.168.6.67 bosh.hm [job=health_monitor index=0] [ALERT]
{"kind":"alert","id":"efcdd40e-0990-473a-877e-5422ea38b709","severity":2,
"title":"b01fd340-0c92-4b43-b342-5e9f6ca2f02a has timed out",
"summary":"b01fd340-0c92-4b43-b342-5e9f6ca2f02a has timed out",
"source":"cf-354c7a968f30d3a40537: diego_cell-partition-75437f0a091c320d8ca4(cd9b0e34-64f5-4654-9b90-dd9fb54863fb)
[id=b01fd340-0c92-4b43-b342-5e9f6ca2f02a, index=1, cid=vm-3396c002-2064-462a-91b2-5341f655a015]","created_at":1476930419}
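
To watch for these alerts as they arrive on the log server, a filter such as the following can be used (the log file path is an assumption; adjust it to match your rsyslog configuration):

    # Follow the forwarded log stream and surface Health Monitor alert keywords.
    # /var/log/syslog is an assumed destination file; adjust for your setup.
    tail -F /var/log/syslog | grep -E --line-buffered 'process is not running|timed out'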

Impact

The bosh.yml file needs to be kept up to date: any change to the Ops Manager Director tile means you will need to regenerate bosh.yml. In addition, any time you apply changes in Ops Manager or upgrade Ops Manager (anything that redeploys the Director), you need to repeat this process.

Be very careful with the bosh.yml file. It contains highly sensitive information about your environment, such as IP addresses, private keys, and usernames (passwords are masked out). Make sure you delete the file, or carefully protect it, when you are done inspecting it.
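
For example, any working copy of the manifest kept outside Ops Manager can be removed securely once the deployment succeeds (the path here is hypothetical; do not delete Ops Manager's own copy under /var/tempest):

    # Hypothetical cleanup of a downloaded copy of the manifest.
    # shred overwrites the file contents before unlinking it (-u deletes it).
    shred -u ~/bosh.yml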

 
