Pivotal Knowledge Base

Deployment of MySQL Monitor Replication Canary fails to reach MySQL Proxy

Environment

Pivotal Elastic Runtime: 1.9, 1.10

Pivotal MySQL: 1.9

Symptom

The MySQL monitor fails to update, and the deployment reports that the replication-canary job failed to start

Error 400007: 'mysql_monitor/0 (6363eb9b-8351-4168-88fe-0c50ca7c3872)' is not running after update. Review logs for failed jobs: replication-canary

Replication canary logs show it could not reach MySQL proxy/0

{"timestamp":"1491532826.912001133","source":"/var/vcap/packages/replication-canary/bin/replication-canary","message":"/var/vcap/packages/replication-canary/bin/replication-canary.Making request to proxy","log_level":0,"data":{"method":"GET","url":{"Scheme":"https","Opaque":"","User":null,"Host":"proxy-0-p-mysql-ert.cfhdctest.kroger.com","Path":"/v0/backends","RawPath":"","ForceQuery":false,"RawQuery":"","Fragment":""}}} 
{"timestamp":"1491532826.932095051","source":"/var/vcap/packages/replication-canary/bin/replication-canary","message":"/var/vcap/packages/replication-canary/bin/replication-canary.received bad status code from proxy","log_level":0,"data":{"statusCode":502}} 
{"timestamp":"1491532826.932335138","source":"/var/vcap/packages/replication-canary/bin/replication-canary","message":"/var/vcap/packages/replication-canary/bin/replication-canary.Canary setup failed","log_level":3,"data":{"error":"bad response (502) - 502 Bad Gateway: Registered endpoint failed to handle the request.\n","trace":"goroutine 1 [running]:\ngithub.com/pivotal-cf-experimental/replication-canary/vendor/code.cloudfoundry.org/lager.(*logger).Fatal(0xc4200522a0, 0x7333e4, 0x13, 0x880c20, 0xc42025b9a0, 0x0, 0x0, 0x0)\n\t/var/vcap/packages/replication-canary/src/github.com/pivotal-cf-experimental/replication-canary/vendor/code.cloudfoundry.org/lager/logger.go:131 +0xc7\nmain.main()\n\t/var/vcap/packages/replication-canary/src/github.com/pivotal-cf-experimental/replication-canary/main.go:149 +0x1250\n"}} 
panic: bad response (502) - 502 Bad Gateway: Registered endpoint failed to handle the request

The logs on MySQL proxy/0 show that it is stuck waiting to acquire a lock

{switchboard.lock.acquiring-lock","log_level":1,"data":{"key":"v1/locks/mysql_lock","session":"1","value":""}

Applications using MySQL might see connection error 111 (Connection Refused)

Error: Can't connect to MySQL server on '192.168.20.111' (111)

Cause

MySQL and the proxies are working as intended. Multiple proxies can be deployed, but only one is active at any time, and the proxies use a Consul lock to decide which one is active.

Currently, the replication canary only tries to reach MySQL proxy/0, regardless of how many proxy instances are deployed. If proxy/0 is down or unreachable, the MySQL monitor job fails with the symptoms above.

In this case, BOSH reports that the proxy instances are running even though proxy/0 is not listening on port 8080. This is due to a known problem that occurs when 3 or more proxy instances are deployed: they all compete for the same lock through Consul, which can result in a deadlock.

  • proxy/0 = waiting for lock
  • proxy/1 = lock acquired
  • proxy/2 = waiting for lock 
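Because BOSH can report a proxy as running while it is not listening on port 8080, it can help to probe the port directly. The following is a minimal sketch of such a check, not a Pivotal tool. The function names are hypothetical, and the proxy addresses must be supplied by the operator.

```python
import socket


def is_listening(host, port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


def port_report(hosts, probe=is_listening):
    """Map each proxy address to True (listening) or False (not listening)."""
    return {host: probe(host) for host in hosts}
```

Calling `port_report` with the addresses of the deployed proxy instances returns a dictionary keyed by address. In the situation described above, the entry for proxy/0 would be expected to be `False`.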

Resolution

The following changes are planned for a future release

  • The replication canary job will scan all available proxy instances and use the first one that is accessible, instead of only using proxy/0
  • The locking mechanism will be enhanced to support more than 2 proxy instances. Until then, deploy a maximum of 2 MySQL proxy instances to avoid the deadlock
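The planned "first accessible proxy" behavior can be illustrated as follows. This is a sketch of the idea only, not the replication canary's actual implementation, which is written in Go. The function names are hypothetical, a 200 response is assumed to mean "accessible", and any credentials the proxy endpoint requires are omitted.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is still what we want.
        return err.code


def first_accessible(urls, fetch_status=http_status):
    """Return the first URL that answers with HTTP 200, or None if none do."""
    for url in urls:
        try:
            if fetch_status(url) == 200:
                return url
        except OSError:
            # Connection refused, DNS failure or timeout: try the next proxy.
            continue
    return None
```

Given one `/v0/backends` URL per proxy instance, in order, `first_accessible` would skip a proxy that answers 502 (as proxy/0 does in the logs above) and return the next one that responds successfully.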

Fix 

The best option is to deploy a load balancer in front of the proxies and configure it using the "MySQL Service Hostname" option. If a load balancer is not used, configure the system to use a single proxy instance only. See the Workaround section for ways to recover from this situation without a load balancer.

Workaround

If configuring a load balancer is not immediately possible, a quick resolution is to reduce the number of running proxies to one. Either scale the deployment down to 1 proxy instance, or stop the services on every proxy instance except one. To do the latter, SSH into each proxy VM that should be stopped and run the command below, which stops all services on that VM. Afterwards, either scale the instance count down to 1 or configure a load balancer.

monit stop all 
