Pivotal Knowledge Base

Pivotal Cloud Foundry® (PCF) Consul: cf Apps Displaying Instances Count As ?/X

Environment 

Product                       | Version
Pivotal Cloud Foundry® (PCF)  | 1.7, 1.8

Purpose

This article describes what to do when the "instances" column is not displayed correctly in the output of `cf apps`: the instances count appears as "?/X" (for example, "?/1") instead of the expected "X/X".
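
The symptom looks something like the following (illustrative output; the org, space, app name, and route are placeholders, and the exact column layout depends on your cf CLI version):

$ cf apps
Getting apps in org my-org / space my-space as admin...
OK

name     requested state   instances   memory   disk   urls
my-app   started           ?/1         256M     1G     my-app.example.com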

Cause

This can be caused by an issue with Consul. Check that the consul_agent process on each VM is running as expected.

You can check this by running the following command:

bosh instances --ps

In the output of the command above, check that the consul_agent process is reported as running on every VM.

Example:

|   consul_agent                                                                                         | running |
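
If the agent is unhealthy on a VM, the same row will instead show a non-running state, for example (illustrative; depending on the failure, monit may report "failing", "stopped", or "not monitored"):

|   consul_agent                                                                                         | failing |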

Resolution

If any of the VMs display the consul_agent process as stopped or failing, run the following:

bosh ssh (to the affected VM)
sudo monit restart consul_agent
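
To confirm the restart took effect, check the process state on the same VM (monit can take a few seconds before it reports the process as running again):

sudo monit summary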

Once the agent has started, run the following command to list the members of the Consul cluster and confirm that they can talk to each other:

/var/vcap/packages/consul/bin/consul members

Example: You should see something like the following:

Node                                                      Address             Status  Type    Build  Protocol  DC
cloud-controller-partition-a1684d9aebac3fc7726f-0         192.168.27.41:8301  alive   client  0.6.4  2         dc1
cloud-controller-worker-partition-a1684d9aebac3fc7726f-0  192.168.27.43:8301  alive   client  0.6.4  2         dc1
consul-server-partition-a1684d9aebac3fc7726f-0            192.168.27.33:8301  alive   server  0.6.4  2         dc1
diego-brain-partition-a1684d9aebac3fc7726f-0              192.168.27.45:8301  alive   client  0.6.4  2         dc1
diego-cell-partition-a1684d9aebac3fc7726f-0               192.168.27.46:8301  alive   client  0.6.4  2         dc1
diego-cell-partition-a1684d9aebac3fc7726f-1               192.168.27.51:8301  alive   client  0.6.4  2         dc1
diego-database-partition-a1684d9aebac3fc7726f-0           192.168.27.36:8301  alive   client  0.6.4  2         dc1
ha-proxy-partition-a1684d9aebac3fc7726f-0                 192.168.27.32:8301  alive   client  0.6.4  2         dc1
uaa-partition-a1684d9aebac3fc7726f-0                      192.168.27.44:8301  alive   client  0.6.4  2         dc1
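
Members that have dropped out of the cluster are reported with a Status of "failed" (or "left", if they departed gracefully), for example (illustrative):

diego-cell-partition-a1684d9aebac3fc7726f-1               192.168.27.51:8301  failed  client  0.6.4  2         dc1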

Checking Ports

If any of the members are not alive, check the network connectivity to the instance using Netcat.

Example:

nc -v 192.168.27.41 8301

You should see something like the following:

Connection to 192.168.27.41 8301 port [tcp/*] succeeded!
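
Note that Consul's Serf LAN gossip uses both TCP and UDP on port 8301, and the Netcat test above only exercises TCP. You can attempt a UDP check as well, though because UDP is connectionless, Netcat can only reliably detect a failure if it receives an ICMP port-unreachable response:

nc -zuv 192.168.27.41 8301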

Checking Logs

To verify that the members of the cluster are in sync, tail the consul_agent.stdout.log:

tail -f /var/vcap/sys/log/consul_agent/consul_agent.stdout.log
2016/08/02 05:28:52 [DEBUG] agent: Service 'consul' in sync
2016/08/02 05:29:15 [DEBUG] memberlist: Initiating push/pull sync with: 192.168.17.75:8301
2016/08/02 05:29:45 [DEBUG] memberlist: Initiating push/pull sync with: 192.168.17.80:8301
2016/08/02 05:30:11 [DEBUG] agent: Service 'consul' in sync
2016/08/02 05:30:15 [DEBUG] memberlist: Initiating push/pull sync with: 192.168.17.82:8301
2016/08/02 05:30:25 [DEBUG] memberlist: TCP connection from: 192.168.17.80:41909
2016/08/02 05:30:45 [DEBUG] memberlist: Initiating push/pull sync with: 192.168.17.82:8301
2016/08/02 05:31:10 [DEBUG] memberlist: TCP connection from: 192.168.17.83:52494
2016/08/02 05:31:15 [DEBUG] memberlist: Initiating push/pull sync with: 192.168.17.83:8301
2016/08/02 05:31:33 [DEBUG] memberlist: TCP connection from: 192.168.17.91:34780
2016/08/02 05:31:41 [DEBUG] memberlist: TCP connection from: 192.168.17.84:55011
2016/08/02 05:31:45 [DEBUG] memberlist: Initiating push/pull sync with: 192.168.17.91:8301
2016/08/02 05:31:58 [DEBUG] agent: Service 'consul' in sync
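
To surface problems quickly, you can also filter the same log for warning and error entries (a simple illustrative filter; Consul 0.6 logs these levels as [WARN] and [ERR]):

grep -E '\[(WARN|ERR)\]' /var/vcap/sys/log/consul_agent/consul_agent.stdout.log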


Recovering the Consul service 
Consul is used for discovering and configuring services in the infrastructure. If its data store becomes corrupted, it is safe to clear the agent's data directory and start the service again; Consul will pick up the registered services once it is running again.

If your Consul servers do not come back in sync after restarting the services as described above, carry out the following actions:

  1. bosh ssh (into each failed Consul server node)
  2. sudo -i (switch to the root user)
  3. monit stop consul_agent (on all server nodes in the Consul cluster before moving to step 4)
  4. rm -rf /var/vcap/store/consul_agent/* (on all server nodes in the Consul cluster before moving to step 5)
  5. monit start consul_agent (one by one on each server node in the Consul cluster)
  6. /var/vcap/packages/consul/bin/consul info (to check that all the members have joined, e.g., members = 9; see the illustrative output below and the Consul command reference [1])
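
An illustrative excerpt of the consul info output from step 6 (the member count of 9 assumes a deployment with nine Consul agents; your number will differ):

serf_lan:
        failed = 0
        left = 0
        members = 9
        ...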

Additional Information

For additional information on Consul commands, see [1].

For instructions on enabling debug mode in Consul, refer to [2].

1. https://www.consul.io/docs/commands/index.html
2. https://discuss.pivotal.io/hc/en-us/articles/224187887-How-to-enable-debug-mode-for-Consul
