Pivotal Knowledge Base

How To Forward VM Status Information to an External Reliable Event Logging Protocol (RELP) Server

Environment

Product: Pivotal Cloud Foundry
Version: 1.7

Purpose

The BOSH Director can be configured to forward logging messages to an external Reliable Event Logging Protocol (RELP) server. While there is no single error message that indicates a VM has gone down, you can monitor the health of your VMs by watching for certain error messages on the external log server. This article shows an example of how this can be done.

Resolution

  • Generate a bosh.yml. This can be done by copying the "Generated BOSH Product Manifest" section from https://ops_manager_url/debug/files. Then replace the masked passwords with the real values, which can be found under Ops Manager Director --> Credentials.
  • Add the following lines to bosh.yml, under the "hm:" section. You may need to adjust the address and port to match your external RELP server (a sketch of a matching receiver configuration follows these steps).

Here is an example from my environment; the newly added lines are the syslog_event_forwarder_enabled and syslog_event_forwarder settings at the end:

    hm:
      director_account:
        ca_cert: |
          -----BEGIN CERTIFICATE-----
          MIIC+zCCAeOgAwIBAgIBADANBgkqhkiG9w0BAQUFADAfMQswCQYDVQQGEwJVUzEQ
          MA4GA1UECgwHUGl2b3RhbDAeFw0xNjA4MjgwODQ1NTBaFw0yMDA4MjkwODA1NTBa
          MB8xCzAJBgNVBAYTAlVTMRAwDgYDVQQKDAdQaXZvdGFsMIIBIjANBgkqhkiG9w0B
          AQEFAAOCAQ8AMIIBCgKCAQEA0CvGtic+/lurrUIPq2J2eDWP6tJBVY2bcqajRgMm
          I8+mnQGeKMLa7uFxT4u1idEZkDHZGd5/FqkboiO3TatEk+rFun6mAvYkY9a4vEP6
          PP9wNhMjOc04F/JRqpwGHCh4/we9ZrmUkdYhLO4UYCcnJwXzJNF+ygPHFBH/bcp/
          ZLBUZgnWLpST7o98ulRfKXPuTmA5S84TvM5deFAJ5ZkPYFECAHfVoHPI+INTRtO4
          Yam29ReJZYYccWOCsJSO9tzNkDk5SYqAKOqFthD4sZ6aQMYtJ4qgZ3B3zI8HlpGz
          RMO/BX0gxBCBxinYtAYcYQjUSUO1B+/Ck3V5YmH/XsRu5wIDAQABo0IwQDAdBgNV
          HQ4EFgQUdxZ6UG3adTAvEQ2URec2xqVfzYMwDwYDVR0TAQH/BAUwAwEB/zAOBgNV
          HQ8BAf8EBAMCAQYwDQYJKoZIhvcNAQEFBQADggEBABB48+Q0PorKvFgcZd9EziPU
          sLxRC6uTXT+xiIz1DuORCiy4VlLk+KaZ02LHIihY7S7KEWzGZF2cgyi/RUzEX5jm
          Vv7g39N6SwHava760c0FAAGs1pWuPFsSojrr/Vt4FiaZKy/0zAzJbrGVI6ZyqKpQ
          7OGl9KTywLroKxTCx3sxRmrubEqN7Y7ElvPSYp4x0CIW7BVC5KZQ3+cEV7qDo5r5
          x7TLaz++sV5w4ujMOOKeFhsrkwRulWp24ct9pvltUJH3mJRSPE1Gf1+KcVI2vjl+
          R10uzZW6TaXzdNSaDXvNXUy/j9tZgT5q6SKeXHJWIbFndMH1o5/RJYV5dCHaazA=
          -----END CERTIFICATE-----
        client_id: health_monitor
        client_secret: a3ce1ad407a6add7d735
      resurrector_enabled: false
      pagerduty_enabled: false
      pagerduty:
        service_key: 
        http_proxy: 
      email_notifications: false
      email_recipients: []
      smtp:
        from: 
        host: 
        port: 25
        domain: 
        tls: false
        user: 
        password: 
      tsdb_enabled: true
      tsdb:
        address: 192.168.6.115
        port: 13321
      syslog_event_forwarder_enabled: true
      syslog_event_forwarder:
        address: 192.168.6.6
        port: 514
  • Find a maintenance window, log in to the Ops Manager VM, and run the following command. This command redeploys the Director with the new configuration. The Director will go down momentarily, and when it comes back up, it will have the new configuration.
sudo -u tempest-web bosh-init deploy /var/tempest/workspaces/default/deployments/bosh.yml
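
This article does not cover setting up the external RELP server itself. As a rough sketch only, assuming the external log server runs rsyslog on Ubuntu with the rsyslog-relp package installed (the file name 30-relp.conf is an arbitrary choice), a matching receiver could be configured like this:

# On the external log server (hypothetical example)
sudo apt-get install -y rsyslog-relp               # provides the imrelp input module
sudo tee /etc/rsyslog.d/30-relp.conf <<'EOF'
module(load="imrelp")              # load the RELP listener
input(type="imrelp" port="514")    # must match syslog_event_forwarder.port in bosh.yml
EOF
sudo service rsyslog restart

With a listener like this in place, the forwarded Health Monitor messages should show up in the server's normal syslog files.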

When a VM goes down, you can monitor its health status by filtering on the keywords "process is not running" or "timed out". The full messages look like the following:

Oct 20 02:20:30 192.168.6.67 bosh.hm [job=health_monitor index=0] [ALERT]
{"kind":"alert","id":"1476930030.1431198801@localhost","severity":1,
"title":"metron_agent (192.168.6.82) - Does not exist - restart",
"summary":"process is not running","source":"agent b01fd340-0c92-4b43-b342-5e9f6ca2f02a []",
"created_at":1476929728}

Oct 20 02:20:30 192.168.6.67 bosh.hm [job=health_monitor index=0] [ALERT]
{"kind":"alert","id":"1476930030.61271731@localhost","severity":1,
"title":"garden (192.168.6.82) - Does not exist - restart",
"summary":"process is not running","source":"agent b01fd340-0c92-4b43-b342-5e9f6ca2f02a []",
"created_at":1476930034}

Oct 20 02:20:30 192.168.6.67 bosh.hm [job=health_monitor index=0] [ALERT]
{"kind":"alert","id":"1476930030.524812971@localhost","severity":1,
"title":"consul_agent (192.168.6.82) - Does not exist - restart",
"summary":"process is not running","source":"agent b01fd340-0c92-4b43-b342-5e9f6ca2f02a []",
"created_at":1476930032}

Oct 20 02:26:59 192.168.6.67 bosh.hm [job=health_monitor index=0] [ALERT]
{"kind":"alert","id":"efcdd40e-0990-473a-877e-5422ea38b709","severity":2,
"title":"b01fd340-0c92-4b43-b342-5e9f6ca2f02a has timed out",
"summary":"b01fd340-0c92-4b43-b342-5e9f6ca2f02a has timed out",
"source":"cf-354c7a968f30d3a40537: diego_cell-partition-75437f0a091c320d8ca4(cd9b0e34-64f5-4654-9b90-dd9fb54863fb)
[id=b01fd340-0c92-4b43-b342-5e9f6ca2f02a, index=1, cid=vm-3396c002-2064-462a-91b2-5341f655a015]","created_at":1476930419}
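
On the external log server, that keyword filter can be turned into a simple live watch. A minimal sketch, assuming the forwarded messages end up in /var/log/syslog (the actual path depends on how your syslog server files them):

# Watch forwarded BOSH Health Monitor alerts for failed or unresponsive processes and VMs
tail -F /var/log/syslog | grep -E --line-buffered "process is not running|timed out"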

Impact / Risks

The bosh.yml file you build needs to be kept up to date. Any change to the Ops Manager Director tile means you need to rebuild bosh.yml. More importantly, any time you Apply Changes in Ops Manager or upgrade Ops Manager (anything that redeploys the Director), you need to repeat this process.

Be very careful with the bosh.yml file. This file contains highly sensitive information about your environment, such as IP addresses, private keys, and usernames (passwords are masked out). Be sure to delete the file, or protect it carefully, when you are done inspecting it.
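
For example (a sketch only; ~/bosh.yml is a placeholder for wherever you keep your working copy), you could restrict the file while you work with it and securely delete it afterwards:

chmod 600 ~/bosh.yml    # readable and writable only by your user
shred -u ~/bosh.yml     # securely delete the copy once you no longer need it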
