- Pivotal Greenplum Database (GPDB) 4.3.x
- Operating System: Red Hat Enterprise Linux 6.x, CentOS 6, 7
- PHD 2.x, 3.x
Users see the following Java Heap Space error on a system with enough free memory. This error is not limited to PHD or GPDB; it can occur when running any Java application that has similar heap space requirements.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Error occurred during initialization of VM
Could not reserve enough space for object heap
INFO: os::commit_memory(0x0000000718000000, 2818572288, 0) failed; error='Cannot allocate memory' (errno=12)
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 2818572288 bytes for committing reserved memory.
The immediate cause of this symptom is that the application cannot obtain the virtual memory it requests, so it fails to start. This usually happens during application startup because the JVM reserves a large amount of virtual memory for its heap space at that point. The real problem in this case, however, was that the error occurred even though the system had enough physical memory installed to accommodate the application's memory requirement.
Most systems with GPDB or PHD installed have the Linux kernel parameters vm.overcommit_memory = 2 and vm.overcommit_ratio = 50, because these are the settings recommended in the documentation. With these parameters, the available virtual memory space is calculated as (SwapTotal + MemTotal * vm.overcommit_ratio / 100). With this setting, unexpected problems can arise when the swap size is very small.
For example, suppose a node has 256GB of memory installed and only 4GB of swap space. In this case the OS allows only up to 132GB of virtual memory (4GB + 256GB * 50 / 100), which effectively leaves almost 50% of the physical memory unavailable to application allocations. While that memory can still be used for buffers and file cache, this is not the ideal way of making use of it.
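The arithmetic in this example can be sketched in shell as follows. The function name commit_limit_kb is hypothetical, and the result is only an approximation: the kernel does this calculation in pages and excludes any configured huge pages, so the CommitLimit it reports can differ slightly.

```shell
# Hypothetical helper: approximate CommitLimit (in kB) when vm.overcommit_memory = 2.
# Arguments: SwapTotal in kB, MemTotal in kB, vm.overcommit_ratio in percent.
commit_limit_kb() {
    local swap_kb=$1 mem_kb=$2 ratio=$3
    echo $(( swap_kb + mem_kb * ratio / 100 ))
}

# The 256GB RAM / 4GB swap example, expressed in kB:
commit_limit_kb $(( 4 * 1024 * 1024 )) $(( 256 * 1024 * 1024 )) 50   # prints 138412032 (132GB)
```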
How to detect similar situations:
1. Check the physical memory and swap space configured. Look for a large disparity between MemTotal and SwapTotal.
# cat /proc/meminfo | grep Mem
MemTotal: 264418044 kB
MemFree: 124329352 kB
# cat /proc/meminfo | grep Swap
SwapTotal: 4194300 kB
SwapFree: 4194300 kB
2. Check the current values of the kernel parameters related to virtual memory. The two parameters below are the ones of interest.
# sysctl -a | grep overcommit
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
3. Check the current virtual memory overcommit status.
# cat /proc/meminfo | grep Commit
CommitLimit: 215728732 kB # SwapTotal(4194300) + MemTotal(264418044) * vm.overcommit_ratio(80) / 100
Committed_AS: 207266812 kB # Total memory currently committed (allocated) to processes, whether or not they have used it yet
From the output above, note that CommitLimit, which is the effective virtual memory limit, is far less than the physical memory installed (MemTotal: 264418044 kB), and that Committed_AS is approaching the limit, which could cause an out-of-memory situation at any time.
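As a rough illustration of this check, the sketch below expresses Committed_AS as a percentage of CommitLimit. The function name commit_usage_pct is hypothetical, and the commented-out awk lines are only one assumed way of reading the two fields from /proc/meminfo.

```shell
# Hypothetical helper: percentage of CommitLimit already committed (integer, rounded down).
# Arguments: Committed_AS in kB, CommitLimit in kB.
commit_usage_pct() {
    local committed_kb=$1 limit_kb=$2
    echo $(( committed_kb * 100 / limit_kb ))
}

# Values taken from the sample /proc/meminfo output:
commit_usage_pct 207266812 215728732   # prints 96

# On a live system the inputs could be read like this (assumption, not run here):
#   limit=$(awk '/^CommitLimit:/ {print $2}' /proc/meminfo)
#   committed=$(awk '/^Committed_AS:/ {print $2}' /proc/meminfo)
#   commit_usage_pct "$committed" "$limit"
```

A value this close to 100 means new large allocations, such as a JVM heap reservation, are likely to be refused.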
In this specific case, where the swap size is very small compared to the physical memory size, vm.overcommit_ratio was recalculated using the formula below to make appropriate use of the physical memory installed.
vm.overcommit_ratio = (MemTotal - SwapTotal) / MemTotal * 100
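A minimal sketch of this formula in shell, using the MemTotal and SwapTotal values from the sample output. The function name overcommit_ratio_for is hypothetical. Because vm.overcommit_ratio is an integer, the sketch rounds down, which keeps the resulting CommitLimit at or just below MemTotal.

```shell
# Hypothetical helper: vm.overcommit_ratio that brings CommitLimit close to MemTotal.
# Arguments: MemTotal in kB, SwapTotal in kB. Integer division rounds down.
overcommit_ratio_for() {
    local mem_kb=$1 swap_kb=$2
    echo $(( (mem_kb - swap_kb) * 100 / mem_kb ))
}

# MemTotal and SwapTotal from the sample /proc/meminfo output:
overcommit_ratio_for 264418044 4194300   # prints 98
```

With a ratio of 98, the same node would get a CommitLimit of roughly 263323983 kB, compared with 215728732 kB at a ratio of 80.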
Once you have calculated the new vm.overcommit_ratio, update /etc/sysctl.conf and run 'sysctl -p' as 'root'.
After this, run 'cat /proc/meminfo | grep CommitLimit' and check whether the limit has been raised.
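One possible way to script the configuration update is sketched below. The helper set_sysctl_value is hypothetical and the value 98 is only an example; the helper rewrites the given file so that it contains exactly one "key = value" line for the parameter. Try it on a copy of the file before touching /etc/sysctl.conf.

```shell
# Hypothetical helper: ensure FILE contains exactly one "KEY = VALUE" line.
# Any existing lines that set KEY are dropped and the new setting is appended.
set_sysctl_value() {
    local file=$1 key=$2 value=$3
    local pattern=${key//./\\.}   # escape dots so they match literally in the regex
    local tmp
    tmp=$(mktemp) || return 1
    grep -Ev "^[[:space:]]*${pattern}[[:space:]]*=" "$file" > "$tmp" || true
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"          # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
}

# Example usage as root (assumed value; not run here):
#   set_sysctl_value /etc/sysctl.conf vm.overcommit_ratio 98
#   sysctl -p
#   grep CommitLimit /proc/meminfo
```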
There can be situations where you still experience the same issue even after applying this approach. In that case, check the following:
- Too many applications are running for the current memory configuration to accommodate: expand the physical memory (RAM) or relocate some of the applications (services) to other nodes.
- Too much memory has been given to some applications by mistake: check the memory configuration of your applications and make sure the values are correct.
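When reviewing application memory settings, it can help to compare a requested allocation against the remaining commit headroom (CommitLimit minus Committed_AS). The sketch below is a simplification with a hypothetical function name; the kernel also keeps small reserves, so a request that passes this check can still fail on a real system.

```shell
# Hypothetical helper: succeed if a request (in bytes) fits within the remaining
# commit headroom. Arguments: request in bytes, CommitLimit in kB, Committed_AS in kB.
fits_commit_headroom() {
    local request_bytes=$1 limit_kb=$2 committed_kb=$3
    local request_kb=$(( (request_bytes + 1023) / 1024 ))   # round up to whole kB
    [ "$request_kb" -le $(( limit_kb - committed_kb )) ]
}

# 2818572288 bytes is the size from the error message; 214000000 kB is an
# assumed Committed_AS value chosen only to illustrate a failing case.
fits_commit_headroom 2818572288 215728732 214000000 \
    || echo "request exceeds remaining commit headroom"
```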