Apache’s processes

Written by Robert on April 8, 2008 – 10:46 am

The standard version of Apache for Unix/Linux (Version 1.3.27) is a sophisticated variant of the forking server ‘http daemon’ process. Apache uses a collection (pool) of ‘pre-forked’ processes to reduce the delays and costs associated with creating new processes. There is a principal process (the ‘chief’) that monitors the port/socket combination where TCP/IP connection requests are received from clients. This ‘chief’ process never handles any HTTP requests from the clients; instead it distributes this work to subordinate processes (the ‘tribesmen’). Each Apache ‘tribesman’ acts as a serial server, dealing with one client at a time. When a tribesman process finishes with a client, it returns to the pool managed by the chief. As well as being responsible for the distribution of work, the chief process is also responsible for adjusting the number of child (tribesmen) processes. If there are too few tribesmen, clients’ requests will be delayed; if there are too many tribesmen, system resources are ‘wasted’ (the computer may have other work it could do, and such work may be slowed if most of the main memory is allocated to Apache processes).
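The pre-forking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Apache's code: the function name `prefork_demo`, the loopback address and the worker count are all hypothetical, and for brevity the workers here block directly in `accept` on the inherited listening socket instead of waiting on locks released by the chief. It needs a Unix-like system, since it relies on `os.fork`.

```python
import os
import signal
import socket


def prefork_demo(num_workers=3, num_requests=6):
    """Fork a small pool up front, send it a few requests, then shut it down.

    Returns (set of worker PIDs, list of PIDs that answered each request).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    workers = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child ('tribesman'): a serial server, one client at a time.
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            try:
                while True:
                    conn, _addr = listener.accept()
                    with conn:
                        conn.sendall(str(os.getpid()).encode())
            finally:
                os._exit(0)              # never fall back into the parent's code
        workers.append(pid)

    answered = []
    try:
        for _ in range(num_requests):
            with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
                data = b""
                while True:
                    chunk = client.recv(64)
                    if not chunk:        # worker closed the connection
                        break
                    data += chunk
                answered.append(int(data.decode()))
    finally:
        # Parent ('chief') retires the whole pool.
        for pid in workers:
            os.kill(pid, signal.SIGTERM)
            os.waitpid(pid, 0)
        listener.close()
    return set(workers), answered
```

Calling `prefork_demo()` should show every request answered by one of the processes that already existed before the first client connected, which is the point of pre-forking: no process is created on the request path.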

The Apache process group is started and stopped using scripts supplied as part of the package (the Windows version of Apache is installed with ‘start’ and ‘stop’ shortcuts in the Start menu). The first Apache process that is created becomes the chief; it reads the configuration files and forks a number of child processes. These child processes all immediately block at locks controlled by the chief. The chief process and its children share some memory (this is implementation-dependent: it may be a shared file rather than a shared memory segment). This shared memory ‘scoreboard’ structure holds data that the chief uses to monitor its tribesmen and the lock structures that the chief uses to control operations by tribesmen.
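A shared ‘scoreboard’ of this kind can be imitated with an anonymous memory map that is created before forking. The sketch below is an assumption-laden toy, not Apache's actual layout: the one-byte status codes and the name `scoreboard_demo` are invented for illustration, and real scoreboards hold far more per-process data. It requires a Unix-like system.

```python
import mmap
import os

# Hypothetical one-byte status codes, one slot per child.
IDLE, BUSY, DEAD = b"I", b"B", b"D"


def scoreboard_demo(num_children=4):
    """Let each forked child record its state in shared memory.

    Returns a snapshot of the scoreboard as seen by the parent.
    """
    # Anonymous map; on Unix the default is MAP_SHARED, so the pages
    # stay shared between the parent and the children it forks.
    board = mmap.mmap(-1, num_children)
    board[:] = DEAD * num_children

    pids = []
    for slot in range(num_children):
        pid = os.fork()
        if pid == 0:
            # Child: even slots report idle, odd slots report busy.
            board[slot:slot + 1] = IDLE if slot % 2 == 0 else BUSY
            os._exit(0)
        pids.append(pid)

    for pid in pids:
        os.waitpid(pid, 0)               # make sure every child has written

    snapshot = board[:]
    board.close()
    return snapshot


def count_idle(snapshot):
    """What the chief would look at when deciding whether to grow the pool."""
    return snapshot.count(IDLE)
```

The parent never receives a message from its children here; it simply reads the bytes they wrote, which is how a scoreboard lets the chief inspect its tribe cheaply.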

When the chief has created its initial pool of tribesmen, it starts to monitor its socket for the HTTP port (usually port 80), blocking until there is input at this socket. When a client attempts a TCP/IP connection, the socket is activated and the chief process resumes. The chief finds an idle tribesman and changes the lock status for that tribesman, allowing it to resume execution. The chief can then check on its tribe’s state. If there are too few idle tribesmen waiting for work, the chief can fork a few more processes; if there are too many idle processes, some can be terminated.
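The chief's sizing decision amounts to a small piece of arithmetic, sketched below. The parameter names echo Apache's `MinSpareServers`, `MaxSpareServers` and `MaxClients` directives, but the function `pool_adjustment` itself and the default numbers are illustrative assumptions, not Apache's implementation.

```python
def pool_adjustment(idle, total, min_spare=5, max_spare=10, max_clients=150):
    """Return how many workers to fork (positive) or retire (negative).

    idle  -- workers currently waiting for a client
    total -- all workers, idle or busy
    """
    if idle < min_spare:
        wanted = min_spare - idle
        room = max(0, max_clients - total)   # never exceed the hard ceiling
        return min(wanted, room)
    if idle > max_spare:
        return -(idle - max_spare)           # trim the surplus idle workers
    return 0                                 # pool is within bounds
```

For example, with the assumed defaults, `pool_adjustment(2, 20)` asks for three more workers, `pool_adjustment(14, 20)` asks for four to be retired, and `pool_adjustment(0, 148)` is capped at two because only two slots remain below the ceiling.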

When its lock is released, a tribesman process does an ‘accept’ on the server socket; this gives it a data socket that can be used to read data sent by a client, and to write data back to that client. The tribesman then reads the HTTP ‘GET’ or ‘POST’ request submitted by the client. The tribesman process itself handles a request for a simple static page, or for a page with dynamic content that will be produced by an internal Apache module (‘server-side includes’, PHP script etc.). If a request is for a dynamically generated page that has to be produced by a CGI program, the tribesman will have to fork a new process that will run this CGI program. The tribesman will communicate with its CGI process via a ‘pipe’ (and also via environment variables set prior to the fork operation); data relating to the request are stored in environment variables or are written to the pipe. The response from the CGI program is read from this pipe; this response must start with at least the Content-Type HTTP header information. The tribesman process adds a complete HTTP header to this response, and then writes the response on the data socket that connects back to the client. If the client is using the HTTP/1.0 protocol, the tribesman closes its data socket immediately after writing the response; then it returns itself to the pool of idle processes (by updating the shared scoreboard structure and blocking itself at a lock controlled by the chief). If a request is made using HTTP/1.1, the tribesman will keep the connection open and do a blocking read operation on the data socket. If this attempted read operation times out, the process closes the socket and then rejoins the idle pool. If the client does submit another request via the open connection, it is handled in the same way, and the cycle can repeat up to a configured maximum number of requests per connection.
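The CGI hand-off described above (request data passed through environment variables and a pipe, a response that begins with a Content-Type header, and a server that wraps it in a full HTTP header) can be sketched as follows. Everything here is a stand-in chosen for illustration: `CGI_SCRIPT` is a tiny hypothetical CGI program, `run_cgi` is an invented helper, and the status line and added Content-Length header are simplifying assumptions. The variable names `REQUEST_METHOD`, `QUERY_STRING`, `CONTENT_LENGTH` and `GATEWAY_INTERFACE` come from the CGI convention.

```python
import os
import subprocess
import sys

# Hypothetical CGI program: reads its input from environment variables and
# writes a Content-Type header, a blank line, then the body to stdout.
CGI_SCRIPT = r'''
import os, sys
sys.stdout.write("Content-Type: text/plain\r\n\r\n")
sys.stdout.write("method=" + os.environ["REQUEST_METHOD"])
sys.stdout.write(" query=" + os.environ.get("QUERY_STRING", ""))
'''


def run_cgi(method, query, body=b""):
    """Run the CGI stand-in in a child process and build an HTTP response."""
    env = dict(os.environ)                   # keep the parent's environment
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": method,
        "QUERY_STRING": query,
        "CONTENT_LENGTH": str(len(body)),
    })
    # subprocess creates the child and connects pipes to its stdin/stdout.
    result = subprocess.run(
        [sys.executable, "-c", CGI_SCRIPT],
        input=body, stdout=subprocess.PIPE, env=env, check=True,
    )
    # Split the CGI output into its own headers and the payload.
    cgi_headers, _sep, payload = result.stdout.partition(b"\r\n\r\n")
    # The server side completes the header before replying to the client.
    return (
        b"HTTP/1.0 200 OK\r\n"
        + cgi_headers + b"\r\n"
        + b"Content-Length: " + str(len(payload)).encode() + b"\r\n"
        + b"\r\n"
        + payload
    )
```

A call such as `run_cgi("GET", "x=1")` returns the bytes that would be written to the client's data socket: a status line, the CGI program's Content-Type header, the header added by the server side, and finally the body the CGI program produced.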

It is fairly common for large C/C++ programs to leak memory a little. Leaks occur when temporary structures, created in the heap, are forgotten and never get deleted. The memory footprint of a process grows slowly when running a leaky program. Apache servers can contain modules from many third-party suppliers, and problems caused by memory leaks had been observed (some operating systems have C libraries that contain leaks). Leaks can now be dealt with automatically. The tribesman processes can be configured so that they will ‘commit suicide’ after handling a specified number of client connections. The process simply removes its entries from the shared scoreboard and then exits. The chief process can create a fresh process to replace the one that terminated.
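The recycling scheme can be imitated with a toy loop in which each child exits after a fixed quota of connections and the parent forks a replacement. This is a deliberately simplified sketch: `serve_with_recycling` is a hypothetical name, the ‘connections’ are just counted rather than served, only one worker runs at a time, and the handled count is passed back through the exit status (which is why the quota is limited to 255). In Apache the corresponding setting is the `MaxRequestsPerChild` directive. A Unix-like system is assumed.

```python
import os


def serve_with_recycling(total_connections, max_per_child):
    """Handle a workload with workers that exit after a fixed quota.

    Returns the number of worker processes that had to be forked.
    """
    if not 1 <= max_per_child <= 255:
        raise ValueError("max_per_child must fit in an exit status (1..255)")

    remaining = total_connections
    children_forked = 0
    while remaining > 0:
        quota = min(max_per_child, remaining)
        pid = os.fork()
        if pid == 0:
            handled = 0
            while handled < quota:
                handled += 1             # stand-in for serving one connection
            # Quota reached: exit so any leaked memory goes back to the OS.
            os._exit(handled)
        # Parent: wait for the worker to retire, then loop to replace it.
        _pid, status = os.waitpid(pid, 0)
        remaining -= os.WEXITSTATUS(status)
        children_forked += 1
    return children_forked
```

With ten connections and a quota of four, three workers are used in turn (four, four, then two connections), so no single process lives long enough for a slow leak to matter.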

About The Author: Robert

Robert, founder of Stylishdesign.com, has worked in the art and advertising industry since 2000. Along with his team of well experienced writers, he shares insight into the world of art, culture, and design.
