Pivotal Knowledge Base

Consul failing "x/y nodes reported success"

Environment

Product: Pivotal Cloud Foundry®
Version: 1.10 and older

Symptom

In some cases, Consul will not reach a quorum, and it fails with the following message:

2017/05/04 12:31:39 [INFO] serf: EventMemberFailed: dedicated-node-8 192.0.2.26
{"timestamp":"1493901099.471777678","source":"confab","message":"confab.agent-client.set-keys.list-keys.request.failed","log_level":2,"data":{"error":"155/179 nodes reported success"}}

The important part is the error at the end; the numbers (155 and 179 here) vary with the size of your environment:

{"error":"155/179 nodes reported success"}

This will also result in the Consul/0 instance being reported as down or failing. This is the canary instance, and its failure halts the deployment so that the other Consul nodes remain up and working.
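
You can confirm the canary's state from the BOSH director. Below is a minimal sketch assuming the v1 BOSH CLI that shipped with this generation of PCF; the deployment name is hypothetical, so substitute your own:

bosh vms cf-example-deployment   # deployment name is hypothetical
# A failing canary shows the consul job's instance 0 in a "failing"
# state rather than "running".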

Cause

This error indicates that Consul cannot communicate with some subset of the Consul agents. It typically happens when Consul is propagating new keys, or rotating the existing keys that it uses to communicate securely, and something goes wrong during this process. One common cause is that port 8301 is blocked, which prevents Consul from distributing the keys.
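
To see which agents are unreachable, you can query the cluster directly from one of the Consul server VMs. This is a minimal sketch; the binary path is an assumption based on a typical BOSH package layout and may differ in your deployment:

# List cluster membership; agents whose Status is "failed" rather than
# "alive" are the ones not reporting success (binary path assumed):
/var/vcap/packages/consul/bin/consul members
# List the installed encryption keys; the output includes a similar
# x/y tally of the nodes that responded:
/var/vcap/packages/consul/bin/consul keyring -list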

Resolution

This is not typically a problem with the Consul server, but with specific Consul agents. To resolve it, you need to fix the underlying issue on the affected agent; exactly how depends on the problem affecting that agent.

Pivotal recommends that you check the problem Consul Agent VMs for the following:

  • Make sure that there is adequate disk space on the node. If the ephemeral or persistent disks are more than 90% full, increase their size or delete files to free up space (see the sketch after this list).
  • Confirm that the Consul server can communicate with the VMs in question on port 8301, over both TCP and UDP (also covered in the sketch below).
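
The following is a minimal sketch of both checks, run from a problem agent VM. The mount points are the standard BOSH locations for the ephemeral and persistent disks, and consul-server.example.com is a hypothetical stand-in for one of your Consul server addresses:

# Check ephemeral and persistent disk usage (standard BOSH mount points):
df -h /var/vcap/data /var/vcap/store
# Check TCP connectivity to a Consul server on the serf LAN port:
nc -z -v consul-server.example.com 8301
# Check UDP reachability; nc only detects an actively rejected port, so
# treat a "success" here as indicative rather than conclusive:
nc -z -u -v consul-server.example.com 8301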

If these steps are not sufficient to resolve the issue, contact Pivotal Support for additional assistance troubleshooting and resolving the problem.

Impact/Risks

This situation is not the same as a Consul split brain. In fact, following the instructions to repair a split-brain Consul cluster will make this situation worse and can cause application downtime. If you are seeing the error message shown in the Symptom section above, make sure you resolve it before restarting or recreating any of the Consul server nodes.

Additional Information

Related articles:

Consul fails to start during upgrade in Cloud Foundry

How to enable debug mode for Consul

Comments

  • Jim Worrell

    As an addendum:

    It is a good idea to check across all VMs, using `bosh vms --details`, for evidence of high disk usage.

    It is also recommended to check all deployments, not just the CF deployment.
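
    A minimal sketch of that sweep, assuming the v1 BOSH CLI of this era, where omitting the deployment name reports VMs across every deployment the director knows about:

    bosh deployments       # list all deployment names
    bosh vms --details     # VM details across all deployments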
