Pivotal Knowledge Base


Unable to start rabbitmq_server node after upgrading RabbitMQ for Pivotal Cloud Foundry

Environment

Product                              Version
RabbitMQ for Pivotal Cloud Foundry   1.8.0, 1.8.1, 1.8.2

Problem

After upgrading RabbitMQ for Pivotal Cloud Foundry (PCF) to version 1.8.0, 1.8.1, or 1.8.2, the rabbitmq_server node fails to start.

If you are running RabbitMQ for PCF 1.8.4, see its documentation for how to set the disk free alarm limit.

Symptom

The RabbitMQ server node fails to start. On one of the RabbitMQ nodes, the startup log at

/var/vcap/sys/log/rabbitmq-server/startup_stderr

shows the following error:

BOOT FAILED 
===========

Error description:
{could_not_start,rabbit,
{error,
{{shutdown,
{failed_to_start_child,rabbit_memory_monitor,
{badarg,
[{lists,member,[disk,{error,bad_module}],[]},
{rabbit_memory_monitor,init,1,
[{file,"src/rabbit_memory_monitor.erl"},
{line,121}]},
{gen_server2,init_it,6,
[{file,"src/gen_server2.erl"},{line,554}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,247}]}]}}},
{child,undefined,rabbit_memory_monitor_sup,
{rabbit_restartable_sup,start_link,
[rabbit_memory_monitor_sup,
{rabbit_memory_monitor,start_link,[]},
false]},
transient,infinity,supervisor,
[rabbit_restartable_sup]}}}}

Log files (may contain more information):
/var/vcap/sys/log/rabbitmq-server/rabbit@3931499f3971afdb69673e4d570b564e.log
/var/vcap/sys/log/rabbitmq-server/rabbit@3931499f3971afdb69673e4d570b564e-sasl.log
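To confirm that a node is failing with this particular signature rather than with some other boot error, a small Python sketch can scan the contents of startup_stderr for the BOOT FAILED banner and the rabbit_memory_monitor child failure shown above, and list the log files the message points to. `find_boot_failure` is a hypothetical helper, not part of RabbitMQ or PCF tooling, and the markers it looks for are copied from the excerpt above, so they may not cover every variant of the message.

```python
import re

# Markers copied from the startup_stderr excerpt in this article.
BOOT_FAILED = "BOOT FAILED"
MEMORY_MONITOR_CHILD = "failed_to_start_child,rabbit_memory_monitor"


def find_boot_failure(stderr_text: str) -> dict:
    """Report whether a startup_stderr dump matches the signature described here."""
    boot_failed = BOOT_FAILED in stderr_text
    # Line breaks and spacing inside the Erlang term can vary, so remove all
    # whitespace before looking for the failed child.
    compact = re.sub(r"\s+", "", stderr_text)
    monitor_failed = MEMORY_MONITOR_CHILD in compact
    # Log file paths are printed one per line after "Log files ...".
    log_files = re.findall(r"^\s*(/\S+\.log)\s*$", stderr_text, flags=re.MULTILINE)
    return {
        "boot_failed": boot_failed,
        "memory_monitor_failed": monitor_failed,
        "log_files": log_files,
    }
```

On an affected node, pass it the text of /var/vcap/sys/log/rabbitmq-server/startup_stderr; both flags being true indicates the failure covered by this article.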

Analysis

This issue is caused by insufficient free disk space, measured against the threshold set by the disk_free_limit parameter.

In RabbitMQ for PCF 1.7 and earlier, this parameter was set to the RabbitMQ default of 50MB:

{disk_free_limit, "50MB"}

This is risky: 50MB is not a safe margin of free disk space, and the Disk Alarms documentation describes the many performance implications of such a low limit.

In RabbitMQ for PCF 1.8, this was changed to 40% of the node's total memory:

{disk_free_limit, {mem_relative, 0.4}}
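To make the difference between the two settings concrete, the sketch below models how each form resolves to a free-space threshold in bytes. It is an illustration, not RabbitMQ code: absolute limits are passed as byte counts rather than strings such as "50MB", sizes use decimal units, and `resolve_disk_free_limit` is a hypothetical name.

```python
from typing import Tuple, Union

# Decimal units, matching the sizes quoted in this article.
MB = 10**6
GB = 10**9

# Either an absolute threshold in bytes, e.g. 50 * MB for {disk_free_limit, "50MB"},
# or ("mem_relative", ratio) for {disk_free_limit, {mem_relative, ratio}}.
DiskFreeLimit = Union[int, Tuple[str, float]]


def resolve_disk_free_limit(setting: DiskFreeLimit, total_memory_bytes: int) -> int:
    """Return the free-space threshold, in bytes, implied by a disk_free_limit setting."""
    if isinstance(setting, tuple):
        kind, ratio = setting
        if kind != "mem_relative":
            raise ValueError(f"unsupported disk_free_limit form: {kind!r}")
        # The relative form scales with the node's total memory.
        return round(ratio * total_memory_bytes)
    # The absolute form ignores memory entirely.
    return int(setting)
```

With 16GB of memory, `resolve_disk_free_limit(("mem_relative", 0.4), 16 * GB)` gives 6,400,000,000 bytes, 128 times the old 50MB threshold, which is why nodes that ran fine under 1.7 can hit the limit after the upgrade.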

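If the memory on the RabbitMQ server nodes has been increased without a matching increase in disk space, the persistent disk can end up smaller than 40% of the node's memory. Because free disk space can never exceed the size of the disk, the node then stays below the disk_free_limit, which can result in this failure.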
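To check a live node against this condition, a minimal Python sketch like the following compares the size and free space of the persistent disk with 40% of total RAM. It assumes a Linux node, where `os.sysconf("SC_PHYS_PAGES")` is available, and assumes the persistent disk is mounted at /var/vcap/store, the usual BOSH mount point; adjust the path if your deployment differs. `check_disk_against_limit` is a hypothetical helper, and RabbitMQ's own view of total memory may differ slightly from the sysconf figure.

```python
import os
import shutil
from typing import Optional

# Assumption: the persistent disk is mounted at the usual BOSH location.
DEFAULT_DATA_PATH = "/var/vcap/store"


def total_memory_bytes() -> int:
    """Total physical memory as reported by sysconf (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def check_disk_against_limit(
    path: str = DEFAULT_DATA_PATH,
    ratio: float = 0.4,
    memory: Optional[int] = None,
) -> dict:
    """Compare the filesystem holding `path` with a {mem_relative, ratio} limit."""
    if memory is None:
        memory = total_memory_bytes()
    limit = round(ratio * memory)
    usage = shutil.disk_usage(path)
    return {
        "limit_bytes": limit,
        "disk_total_bytes": usage.total,
        "disk_free_bytes": usage.free,
        # Mirrors the article's rule: the disk itself must be larger than the limit.
        "disk_can_ever_satisfy": usage.total > limit,
        # RabbitMQ raises the disk alarm when free space drops below the limit.
        "free_at_or_above_limit": usage.free >= limit,
    }
```

On a node, `print(check_disk_against_limit())` shows the figures; `disk_can_ever_satisfy` being False corresponds to the memory-increased-without-disk case described above.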

See the Disk Space section of the Production Checklist for more information.

Resolution

Make sure the disk and memory allocations meet the requirements of the latest RabbitMQ version.

If you increase the memory of a RabbitMQ server node, increase its disk space in proportion.
The persistent disk must be larger than 40% of the memory size, and in practice larger still, because data stored on the disk also reduces the free space.

The table below shows examples of how the current disk_free_limit affects deployments with different disk and memory allocations:

Memory Size      Persistent Disk Size   disk_free_limit       Result          Allowed
16GB (default)   10GB (default)         {mem_relative, 0.4}   6.4GB needed    Yes
32GB             10GB (default)         {mem_relative, 0.4}   12.8GB needed   No
16GB (default)   4GB                    {mem_relative, 0.4}   6.4GB needed    No
16GB (default)   8GB                    {mem_relative, 0.4}   6.4GB needed    Yes

Rows marked Yes meet the requirement that the persistent disk be larger than 40% of memory; rows marked No do not.
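The arithmetic behind the table can be reproduced with a short Python sketch. It uses decimal gigabytes to match the table and applies the article's rule that the persistent disk must be larger than 40% of memory. The `data_gb` parameter is a hypothetical estimate of what the node keeps on disk, added only to show that stored data raises the disk size you actually need; all function names are illustrative.

```python
# Sizes are in decimal GB, matching the table (16GB x 0.4 = 6.4GB).
RATIO = 0.4

# (memory_gb, persistent_disk_gb) for the four example rows.
TABLE_ROWS = [(16, 10), (32, 10), (16, 4), (16, 8)]


def required_free_gb(memory_gb: float, ratio: float = RATIO) -> float:
    """Free space the {mem_relative, ratio} limit demands, in GB."""
    return round(memory_gb * ratio, 3)


def is_allowed(memory_gb: float, disk_gb: float, ratio: float = RATIO) -> bool:
    """The article's rule: the persistent disk must exceed ratio * memory."""
    return disk_gb > required_free_gb(memory_gb, ratio)


def min_persistent_disk_gb(memory_gb: float, data_gb: float = 0.0, ratio: float = RATIO) -> float:
    """Disk size at which free space just reaches the limit once data_gb is stored.

    Provision more than this; data_gb is a hypothetical estimate.
    """
    return round(required_free_gb(memory_gb, ratio) + data_gb, 3)


for memory_gb, disk_gb in TABLE_ROWS:
    print(
        f"{memory_gb}GB memory, {disk_gb}GB disk: "
        f"{required_free_gb(memory_gb)}GB needed, allowed={is_allowed(memory_gb, disk_gb)}"
    )
```

A 16GB node expected to hold about 2GB of data would need a disk larger than `min_persistent_disk_gb(16, data_gb=2)`, that is, more than 8.4GB.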

See the updated RabbitMQ for PCF 1.8.4 documentation for a better way to set the disk free alarm limit.
