- Browser connections to the server time out or are rejected.
- Requests to the server are unexpectedly slow.
- The server is using more RAM than is available, leading to swapping.
- In the server error log, this message appears:
[error] server reached MaxClients setting, consider raising the MaxClients setting
- Degrade gracefully when more traffic shows up than the server can handle.
- Avoid wasting resources or becoming unstable with excessive creation/destruction of processes.
Basic limits: RAM and CPU
The most basic requirement is that the server needs to keep running and serving requests. Using too much RAM leads to swapping (further slowing down operations) if enabled by the OS, or out-of-memory crashes if not. Over-utilizing the CPU can slow down operations unacceptably. In either case, if these conditions persist for longer than short spikes, they can lead to a downward spiral as excess requests back up and existing requests are served more slowly. Eventually the server or even the OS may crash or become completely unresponsive.
Server limits
The solution to this problem is to place limits on server operations. It would be nice if you could simply limit the amount of RAM or CPU your server uses, but there is no direct way to do this with the standard distribution. The basic method of limiting resource usage is to limit the number of connections the server handles concurrently. You can think of this as an emergency valve - connections in excess of the limit are queued (or eventually refused) until existing connections are closed. During times of high traffic, some requests may be delayed or refused, but that is usually preferable to complete collapse of the server. Of course, you also don't want to set limits too low and have your server start limiting requests while mostly idle.
Understanding resource usage in order to set connection limits requires understanding how concurrent connections are handled, which depends on the Multi-Processing Module (MPM) compiled into the server. Here we digress into a brief description of how common MPMs manage workers (either processes or threads, so we refer to them generically as "workers") for handling connections. A worker handles a connection from open to close.
Typically you should use a threaded MPM unless you have a specific reason not to (for example, non-thread-safe modules or libraries). If you are using an MPM not listed here (probably OS-specific), consult the Apache documentation for that MPM. With every MPM, a single master (or "parent") process is created first when the server starts, along with one or more child processes to handle connections. The directives discussed apply server-wide and should be placed in the main server configuration.
prefork
This is the original MPM, and is very simple and stable. There is one child process per worker - no threading. So, if MaxClients is X, there will be up to X processes (plus one master process). The master creates or destroys child processes according to traffic, up to the MaxClients limit. If MaxRequestsPerChild is set, once a process has handled that many connections, it exits.
mpm_winnt
Used only on Windows, this MPM uses one threaded child process. Each worker is a single thread in the child process. The single child handles the whole server load. MaxClients is not used. Worker threads are created at child creation according to ThreadsPerChild (also limited by ThreadLimit). If MaxRequestsPerChild is set, once the child has received that many connections, a new child is spawned and the old child exits (after completing its existing connections).
worker
This is a hybrid threaded MPM, using multiple threads under multiple child processes. Each child process has a static number of worker threads specified with ThreadsPerChild. Children are created and destroyed according to traffic, up to whichever limit is lower: ServerLimit (the maximum number of children) or MaxClients (the maximum total number of worker threads, that is, the number of children times ThreadsPerChild). If MaxRequestsPerChild is set, once a child process has received that many connections, it receives no new connections and exits after completing its existing connections.
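As a rough sketch, the limits described above for the original process-per-worker MPM (prefork), the Windows MPM (mpm_winnt), and the hybrid MPM (worker) might appear in the main server configuration as follows. The `<IfModule>` wrappers and all numeric values are assumptions chosen for illustration, not recommendations. ServerLimit and ThreadLimit are listed first in each block as a precaution, so that the directives after them are checked against the intended ceilings.

```apache
# Illustrative values only; determine real limits by load testing.

# prefork: one child process per worker, no threads.
<IfModule mpm_prefork_module>
    # At most 150 child processes, plus the master process.
    MaxClients            150
    # 0 means a child is never recycled.
    MaxRequestsPerChild   0
</IfModule>

# mpm_winnt: a single child process; each worker is a thread.
<IfModule mpm_winnt_module>
    # Ceiling for ThreadsPerChild.
    ThreadLimit           500
    # Worker threads created when the child starts. MaxClients is not used.
    ThreadsPerChild       500
    MaxRequestsPerChild   0
</IfModule>

# worker: several child processes, each with a fixed number of threads.
<IfModule mpm_worker_module>
    # Maximum number of child processes.
    ServerLimit           8
    # Fixed number of worker threads in each child.
    ThreadsPerChild       25
    # Maximum total worker threads: 8 children x 25 threads.
    MaxClients            200
    MaxRequestsPerChild   0
</IfModule>
```

Only the block for the MPM actually compiled into or loaded by the server takes effect; the others are skipped.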
Obviously, the choice of MPM determines how processes are created, and thus how limits are set. Typically the operating system only displays RAM and CPU usage per process, but processes may vary in size and load depending on ThreadsPerChild and ThreadStackSize (if threaded) and on workload.
Note: Remember that actual results depend heavily on traffic, configuration, and OS parameters; you must load-test your server under conditions similar to real-world traffic to determine real performance characteristics.
To determine a rough approximation for server RAM usage, multiply the number of child processes by the amount of resident RAM a process uses (tools like "top" report multiple forms of RAM use - resident is typically the relevant one). Restrict the number of processes (or their thread resources) such that available RAM is never exceeded. Traffic beyond the limit is queued or eventually refused. CPU percentage usage can similarly be approximated and limited.
Example: Suppose that, using worker, you have set ThreadsPerChild 25 and MaxClients 50, and the two resulting child processes each use 128MB of RAM and 5% of CPU under full load. If you raise MaxClients to 200, you can roughly expect 8 children (200 = 25 x 8) to use 1GB of RAM and 40% of CPU under full load.
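The arithmetic in this example can be written down next to the directives it applies to. This is a sketch only: the per-child figures (128MB, 5% CPU) are the example's measurements, and the `<IfModule>` wrapper and the explicit ServerLimit line are assumptions added for illustration.

```apache
<IfModule mpm_worker_module>
    # Must allow at least 8 children; listed first as a precaution.
    ServerLimit       8
    # Measured in the example: one child with 25 threads uses
    # about 128MB of resident RAM and 5% of CPU under full load.
    ThreadsPerChild   25
    # 200 / 25 = 8 children under full load:
    #   RAM: 8 x 128MB = 1024MB (about 1GB)
    #   CPU: 8 x 5%    = 40%
    MaxClients        200
</IfModule>
```

If the host cannot spare roughly 1GB of RAM for the server, lower MaxClients in multiples of ThreadsPerChild until the estimate fits.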
Under Windows, there is only one child process, so as a rough approximation, assume that RAM and CPU usage is evenly divided among the worker threads; restrict the number of worker threads such that available RAM and CPU will never be exceeded.
Example: Suppose that, using mpm_winnt, you have set ThreadsPerChild 500, and the child process uses 1GB of resident RAM and 5% of CPU under full load. You can roughly expect that increasing ThreadsPerChild to 1000 would increase RAM usage to 2GB and CPU usage to 10% if fully loaded.
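The same estimate, sketched as a configuration fragment. The figures are those of the example and assume that RAM and CPU usage scale evenly with the number of threads; the `<IfModule>` wrapper and the explicit ThreadLimit line are assumptions added for illustration.

```apache
<IfModule mpm_winnt_module>
    # ThreadsPerChild cannot exceed ThreadLimit, so it is listed first.
    ThreadLimit       1000
    # Measured in the example at 500 threads: about 1GB resident RAM, 5% CPU.
    # Doubling to 1000 threads, assuming even scaling per thread:
    #   RAM: 1GB x (1000 / 500) = 2GB
    #   CPU: 5%  x (1000 / 500) = 10%
    ThreadsPerChild   1000
</IfModule>
```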
Of course, the only way to determine how your system actually performs is to put it into actual usage, or at least test with a load representative of actual usage. Type of usage determines how much RAM and CPU a worker will use, for how long. For example, a server that mainly serves static local content uses resources depending on the size of that content. A server with many heavy-CPU or long-lasting CGI requests needs more severe limits. A server that mainly serves as a proxy consumes little RAM or CPU and in general can support many more workers. Under real world usage, where you may have a variety of requests, a reasonable approach is to start with conservative limits and expand them as needed as long as host resources are not exhausted under peak traffic.
Note: You can change most of these parameters during a graceful restart so that no interruption of service is necessary to test new settings or scale them back. However, ServerLimit and ThreadLimit cannot be modified during a graceful restart.
Note: The amount of free RAM reported by the OS may be less than what is actually available, as the OS may use available RAM for caching purposes and can release it as needed. You do not necessarily need to be concerned if free RAM is low. You should be concerned if the host is swapping, and if the resident memory used by all processes sums to more RAM than the host has.
Be cautious of configurations that result in lots of child process creation and destruction, as this overhead can become significant for a high-traffic site. Consider two common scenarios:
1. Low MaxRequestsPerChild
This setting forces child processes to exit after a number of requests. Ideally this should be set to 0, meaning that a process could persist forever. But with some use cases, processes may become unstable over long usage, and it is easier to just restart processes than to find and fix the source of instability. In that case you need to limit MaxRequestsPerChild, but it is critical not to set it too low, especially with mpm_winnt (which is designed specifically to avoid the overhead of process regeneration). Try a number like 1000000 (one million) to begin with and only lower it if the instability persists.
2. Reaping idle worker processes
Apache HTTP Server allows you to specify how many worker processes you want to start with, how many to grow to as a limit, and when to destroy them. Under highly variable traffic on a busy site, the defaults may lead to lots of processes being created, then quickly torn down again when idle, many times over in a short period. This sort of churn can be very detrimental to performance.
Fortunately, the solution is fairly simple: there's very little reason for an enterprise server to reclaim idle resources, so just configure the server to begin with the maximum number of workers and leave it there.
- With mpm_winnt this is the behavior regardless.
- If using prefork, set StartServers and MaxSpareServers equal to MaxClients.
- If using worker or event, set StartServers equal to ServerLimit and MaxSpareThreads equal to MaxClients.
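Both scenarios above can be addressed in one place. The following sketch starts the server at its maximum number of workers and keeps it there, following the settings listed for prefork and worker. The numeric values and the `<IfModule>` wrappers are assumptions for illustration; size MaxClients from your own RAM and CPU estimates.

```apache
# prefork: start at the ceiling and never reap idle children.
<IfModule mpm_prefork_module>
    MaxClients            150
    # Start with every allowed child process.
    StartServers          150
    # Idle children are never killed while at or below the ceiling.
    MaxSpareServers       150
    # 0 lets a child persist indefinitely. If children become unstable
    # over time, try 1000000 and lower it only if instability persists.
    MaxRequestsPerChild   0
</IfModule>

# worker: 8 children x 25 threads = 200 workers, all started up front.
<IfModule mpm_worker_module>
    ServerLimit           8
    ThreadsPerChild       25
    MaxClients            200
    # StartServers equals ServerLimit.
    StartServers          8
    # MaxSpareThreads equals MaxClients, so idle threads are not reaped.
    MaxSpareThreads       200
    MaxRequestsPerChild   0
</IfModule>
```

The trade-off is that the server holds its full RAM footprint even when idle, which is usually acceptable on a dedicated host.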
Eliminate other drags on performance
Hopefully the discussion above has given you a basic understanding of how the server handles traffic and you can use this to determine fundamental settings. For common ways to maximize performance, you may want to read and understand the Apache Performance Tuning page, particularly the discussion of AllowOverride, HostnameLookups/DNS, and removing modules. There is no simple trick to just speed up your server, but depending on usage you may find many optimizations.
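As a hedged illustration of the three items named above, the fragment below disables per-directory overrides, turns off reverse DNS lookups, and comments out modules. Which modules are safe to remove depends entirely on your site; the two shown are hypothetical examples of unused modules, and the `<Directory />` scope is an assumption.

```apache
# Skip the search for .htaccess files on every request.
<Directory />
    AllowOverride None
</Directory>

# Do not resolve client IP addresses to hostnames.
HostnameLookups Off

# Comment out LoadModule lines for modules the site does not use.
# These two are hypothetical examples; check your own configuration.
#LoadModule userdir_module modules/mod_userdir.so
#LoadModule autoindex_module modules/mod_autoindex.so
```

If a specific directory does need .htaccess processing, re-enable AllowOverride for that directory only rather than for the whole tree.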