Pivotal Knowledge Base

Troubleshooting load average issues

Applies to

GemFire 6 and later

Description

Load average shows the average number of processes that are running or waiting in the run queue for a system resource (usually a processor). The higher the load average, the more processes are competing for that resource.

Troubleshooting

One way to determine whether a machine has a high load average is to use an operating system command such as uptime or top while the application is running.

uptime

The uptime output below shows load averages of 0.40, 0.46 and 0.43 over the last 1, 5 and 15 minutes, respectively.

$ uptime
15:37:27 up 107 days, 2:24, 32 users, load average: 0.40, 0.46, 0.43
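
On Linux, the same three values can also be read directly from /proc/loadavg. The fourth field reports runnable versus total scheduling entities and the fifth field is the most recently created process ID; the output below is illustrative only, reusing the load averages from the uptime example above.

$ cat /proc/loadavg
0.40 0.46 0.43 1/615 22523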
Another way to determine whether a machine has a high load average is to use either the gemfire stats command or VSD to display the load average values contained in a given GemFire statistics archive. The LinuxSystemStats category contains the load average statistics.

LinuxSystemStats

The gemfire stats command below shows the LinuxSystemStats loadAverage1 statistic in the stats.gfs archive.

$ gemfire stats :LinuxSystemStats.loadAverage1 -archive=stats.gfs
[info] Found 1 match for ":LinuxSystemStats.loadAverage1"
gfigridcachesw4p, 515600110, LinuxSystemStats: "2009/02/10 16:03:10.442 UTC" samples=2444
loadAverage1 threads: samples=2444 min=0.37 max=147.59 average=28.92 stddev=31.51

The gemfire stats command below shows the LinuxSystemStats loadAverage5 statistic in the stats.gfs archive.

$ gemfire stats :LinuxSystemStats.loadAverage5 -archive=stats.gfs
[info] Found 1 match for ":LinuxSystemStats.loadAverage5"
gfigridcachesw4p, 515600110, LinuxSystemStats: "2009/02/10 16:03:10.442 UTC" samples=2444
loadAverage5 threads: samples=2444 min=3.96 max=85.9 average=33.19 stddev=19.65

The gemfire stats command below shows the LinuxSystemStats loadAverage15 statistic in the stats.gfs archive.

$ gemfire stats :LinuxSystemStats.loadAverage15 -archive=stats.gfs
[info] Found 1 match for ":LinuxSystemStats.loadAverage15"
gfigridcachesw4p, 515600110, LinuxSystemStats: "2009/02/10 16:03:10.442 UTC" samples=2444
loadAverage15 threads: samples=2444 min=17.32 max=56.13 average=36.56 stddev=9.53

The VSD Tool

In VSD, the load averages can be found in the LinuxSystemStats loadAverage1, loadAverage5 and loadAverage15 statistics.
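
A minimal way to open the archive in VSD, assuming the vsd script is on the PATH and accepts the archive file name as an argument, is:

$ vsd stats.gfs

Once the archive is loaded, select the LinuxSystemStats type and chart the loadAverage1, loadAverage5 and loadAverage15 statistics to see how the load changed over the run.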

Solution

Determining that there is high load is one thing; finding the source of the load (whether CPU or I/O) is another. One operating system command that can help determine the cause of high load is top.

The top output shows, among other things, the load average, CPU usage percentages and the I/O wait (iowait) percentage. The iowait percentage is the percentage of time the CPU spends waiting for an I/O operation to complete. The output below shows a 1-minute load average (10.40) that is high relative to the number of CPUs. It also shows that the CPUs are mostly in use (idle=3.0%) and that the I/O wait percentage is low (0.4%).

12:49:24 up 113 days, 23:36, 35 users, load average: 10.40, 5.20, 2.30
615 processes: 587 sleeping, 27 running, 1 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
             total  61.7%   0.0%   31.4%   0.5%     2.5%    0.4%    3.0%

  PID USER     PRI  NI  SIZE   RSS  SHARE STAT %CPU %MEM   TIME CPU COMMAND
22523 user1     15   0 1102M  1.1G  18068 R     3.6 14.1   0:24   1 java
22778 user1     15   0 1102M  1.1G  18068 R     2.1 14.1   0:02   1 java
22682 user1     15   0 1102M  1.1G  18068 R     1.4 14.1   0:07   1 java
22698 user1     15   0 1102M  1.1G  18068 R     1.4 14.1   0:10   0 java
19286 user1     15   0 1100M  1.1G  18080 R     0.5 14.1   0:25   0 java

In this case, the CPU is clearly causing the high load.

If, instead, the I/O wait percentage were high, the high load might be related to disk I/O.
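
As a rough sketch of that check, the commands below compare the load average against the CPU count and watch per-device I/O statistics. nproc is part of GNU coreutils and iostat requires the sysstat package, so availability depends on the distribution.

$ nproc
$ iostat -x 1 5

If the 1-minute load average is well above the value printed by nproc while %iowait stays low, the load is CPU-bound. A high %iowait together with heavily utilized devices in the iostat output points to disk I/O instead.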
