Pivotal Knowledge Base

Linux Zombie or <defunct> processes consuming high CPU capacity in Elastic Runtime environment

Environment 

Product Version
Elastic Runtime 1.6.x to 1.6.15

Symptom

Zombie (<defunct>) processes running on DEA and/or Diego accumulate a large amount of CPU time, as the TIME column in the output below shows:

 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
...
    1 18956 18956 18956 ?           -1 S<s      0   0:00 wshd: 199u6qa64nq    
18956 19906 19857 19857 ?           -1 Z<l  20090 12786:18  \_ [java] <defunct>
...
    1 27915 27915 27915 ?           -1 S<s      0   0:00 [wshd]
27915 28279 28259 28259 ?           -1 Z<l  20079 22291:50  \_ [java] <defunct>
...

Note that these zombie processes cannot be killed directly. Sending them a signal has no visible effect.
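To pick such processes out of a long process listing, a small script can help. The following is a minimal sketch, not part of the original article. It assumes the listing has the ten columns shown above (the layout that `ps axjf` typically prints) and that TIME is in MMM:SS form. The function names and the one-minute threshold are illustrative choices.

```python
# Sample rows copied from the listing above.
SAMPLE = r"""
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
...
    1 18956 18956 18956 ?           -1 S<s      0   0:00 wshd: 199u6qa64nq
18956 19906 19857 19857 ?           -1 Z<l  20090 12786:18  \_ [java] <defunct>
...
    1 27915 27915 27915 ?           -1 S<s      0   0:00 [wshd]
27915 28279 28259 28259 ?           -1 Z<l  20079 22291:50  \_ [java] <defunct>
...
"""


def parse_cpu_minutes(time_field):
    """Convert a TIME field in MMM:SS form to minutes.

    Raises ValueError for any other form (assumption: BSD-style TIME).
    """
    minutes, seconds = time_field.split(":")
    return int(minutes) + int(seconds) / 60.0


def find_busy_zombies(ps_output, min_cpu_minutes=1.0):
    """Return (pid, ppid, cpu_minutes, command) for each zombie whose
    accumulated CPU time is at least min_cpu_minutes."""
    results = []
    for line in ps_output.splitlines():
        # Ten columns: PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND.
        fields = line.split(None, 9)
        # Skip the header, "..." separators, and incomplete rows.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        ppid, pid, stat = fields[0], fields[1], fields[6]
        time_field, command = fields[8], fields[9].strip()
        # A STAT value beginning with "Z" marks a zombie process.
        if not stat.startswith("Z"):
            continue
        cpu = parse_cpu_minutes(time_field)
        if cpu >= min_cpu_minutes:
            results.append((int(pid), int(ppid), cpu, command))
    return results


if __name__ == "__main__":
    for pid, ppid, cpu, command in find_busy_zombies(SAMPLE):
        print("pid=%d ppid=%d cpu_minutes=%.1f %s" % (pid, ppid, cpu, command))
```

The parent PID is reported alongside each zombie because, in the listing above, each affected process is a child of a wshd process.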

Cause

A bug in AUFS [https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533043] was introduced in version 3.19.0-40 of the Linux kernel. This bug can leave containers with unkillable zombie processes that show high CPU usage. It can occur whenever a container is due to be destroyed.
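As a rough aid, the kernel release string of a VM (for example, the output of `uname -r`) can be compared against the version named above. The sketch below is an illustration only and makes two assumptions: the release string follows the Ubuntu pattern `3.19.0-40-generic`, and only the 3.19.0 series is examined. The article does not state which kernel build fixes the bug, so a True result means "at or after the build that introduced the bug within the 3.19.0 series", and a False result does not prove the kernel is unaffected. The function names are hypothetical.

```python
import re

# From the article: the bug was introduced in kernel 3.19.0-40.
REGRESSION = (3, 19, 0, 40)


def parse_release(release):
    """Parse an Ubuntu-style release such as '3.19.0-40-generic'.

    Returns a tuple of four integers, or None if the string does not match.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def in_3_19_series_at_or_after_regression(release):
    """True if release is 3.19.0-N with N >= 40.

    False covers every other case, including unparsable strings and other
    kernel series, and therefore says nothing about whether they are safe.
    """
    parsed = parse_release(release)
    if parsed is None:
        return False
    return parsed[:3] == REGRESSION[:3] and parsed[3] >= REGRESSION[3]


if __name__ == "__main__":
    import platform
    current = platform.release()
    print(current, in_3_19_series_at_or_after_regression(current))
```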

Resolution

Upgrading Elastic Runtime to version 1.6.15 or later is the only fix for this issue. For more information, see the release notes:

http://docs.pivotal.io/pivotalcf/pcf-release-notes/p1-v1.6/runtime_rn_1_6.html

Additional Information

The Metron agent job in version 1.6.x leaves behind a pair of harmless zombie processes, which can be mistaken for the problem described above. As the example below shows, they use no CPU time. These zombie processes are expected to be gone in PCF 1.7.

root 4549 0.0 0.0 0 0 ? Z< Mar19 0:00 [metron_agent_ct] <defunct>
root 4550 0.0 0.0 0 0 ? Z< Mar19 0:00 [metron_agent_ct] <defunct>
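When scanning a VM for zombies, it can be useful to set the known Metron leftovers aside so that they do not raise a false alarm. The sketch below is an illustration under stated assumptions: the input has the eleven columns of the two lines above (the layout `ps aux` typically prints), and a harmless entry is recognized purely by its command starting with `[metron_agent`. The `[java]` row in the sample is hypothetical and was added only to show the contrast.

```python
# The two metron rows are copied from the article; the java row is a
# hypothetical example of a zombie that deserves a closer look.
SAMPLE_AUX = """
root 4549 0.0 0.0 0 0 ? Z< Mar19 0:00 [metron_agent_ct] <defunct>
root 4550 0.0 0.0 0 0 ? Z< Mar19 0:00 [metron_agent_ct] <defunct>
vcap 19906 95.0 0.0 0 0 ? Z<l Mar19 12786:18 [java] <defunct>
"""

# Command prefixes treated as harmless (assumption based on the article).
HARMLESS_PREFIXES = ("[metron_agent",)


def split_zombies(ps_aux_output):
    """Return (harmless_pids, suspicious_pids) for all zombie rows."""
    harmless, suspicious = [], []
    for line in ps_aux_output.splitlines():
        # Eleven columns:
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
        fields = line.split(None, 10)
        # Skip blank lines, headers, and incomplete rows.
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        stat, command = fields[7], fields[10].strip()
        # A STAT value beginning with "Z" marks a zombie process.
        if not stat.startswith("Z"):
            continue
        pid = int(fields[1])
        if command.startswith(HARMLESS_PREFIXES):
            harmless.append(pid)
        else:
            suspicious.append(pid)
    return harmless, suspicious


if __name__ == "__main__":
    harmless, suspicious = split_zombies(SAMPLE_AUX)
    print("harmless:", harmless)
    print("suspicious:", suspicious)
```

Matching on the command name alone is a deliberate simplification. A stricter check might also require the CPU time to be 0:00, as it is in the two lines above.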

 

 
