Apache: Server Software With a Process-Based Architecture

You may or may not think about it much, or even at all, but your choice of server software has an impact on your site’s performance. It isn’t that one application is always better than the others; rather, each has its own strengths and weaknesses, and depending on the particulars of your site, one might work better for you than another.

About a month ago I revisited an ongoing series about website performance. I spent a week talking about hosting plans and another talking about server hardware. Over the next few weeks I want to talk about server software, particularly Apache, NGINX, IIS, and LiteSpeed. I’ll talk about Apache today and continue with the others starting next week.

There’s a good chance the software running your website is Apache, which still runs on close to 50% of all the servers across the web. It’s been the most commonly used server software for as long as I can remember, and it’s all I’ve ever used. It’s roughly 50/50 that the same is true for you.

There Are a Lot of Server Software Options

Apache, while popular, has never been the only application you could use to run a server. Microsoft’s Internet Information Services (IIS) has also been popular throughout much of the life of the internet. For years, your choice probably came down to which combination of server-side programming language and database you wanted to use. If those were .NET and MSSQL, you went with IIS; otherwise, you chose Apache and worked with open source languages and databases.

NGINX and LiteSpeed are additional applications that have grown in popularity in recent years and both offer performance benefits Apache and IIS don’t.

It doesn’t stop there. Wikipedia lists 36 different options for server software, assuming I counted correctly. I’m not going to talk about, or even mention, any besides the four I’ve already called out, but know there are a lot more options.

To understand why you might choose one over another, let’s talk about each of the four in turn, specifically how they handle HTTP requests and connections. We’ll start with Apache.

How Apache Handles HTTP Requests and Connections

Apache was developed with a process-based architecture, in which each connection is handled by a single process. Generally, a parent process receives connection requests and, for each one, creates (spawns) a child process to handle it. When another request comes in, the parent spawns a new child process to handle it, and so on.

Unfortunately, each child (worker) process is heavy on server resources, particularly RAM, so the more requests and open connections there are, the more the server is taxed. To help, Apache developers created the prefork model (the mpm_prefork module), which spawns a pool of worker processes in advance so they’re ready to take connections. When the processes in the pool become busy, Apache creates additional worker processes to be ready for future requests.

On the positive side, this setup makes it easy to add code to the overall process in the form of modules, and plenty of third-party developers took advantage of that to write all sorts of modules that extend Apache’s basic functionality.

Aside: If your server runs Apache with PHP installed as an Apache module (mod_php), you can create a very simple PHP file to see which modules your server is currently running. Here’s all the code you need.

<?php phpinfo(); ?>

Save the code in a file with any name you want (as long as it has a .php extension), then visit its URL in a browser and you’ll get all sorts of information; the Apache modules appear in the Loaded Modules row of the apache2handler section. When you’re done, remove the file, since you don’t want anyone with malicious intent to see that information.
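If you only want the module list, PHP’s apache_get_modules() function returns it directly, though it only exists when PHP runs as an Apache module (mod_php). Here’s a minimal sketch; loadedApacheModules() is a hypothetical helper name of my own, and the same advice about deleting the file afterward applies.

```php
<?php
// Hypothetical helper: returns the Apache modules PHP can see, or null when
// PHP isn't running as an Apache module (e.g. under PHP-FPM or the CLI).
function loadedApacheModules(): ?array
{
    // apache_get_modules() is only defined under mod_php.
    if (!function_exists('apache_get_modules')) {
        return null;
    }
    $modules = apache_get_modules();
    sort($modules); // alphabetical order makes the list easier to scan
    return $modules;
}

$modules = loadedApacheModules();
if ($modules === null) {
    echo "apache_get_modules() is unavailable; PHP SAPI is " . php_sapi_name() . "\n";
} else {
    // Send plain text so the browser shows one module per line.
    header('Content-Type: text/plain');
    echo implode("\n", $modules), "\n";
}
```

Under PHP-FPM this prints the fallback message instead, which is itself a useful hint about how PHP is connected to Apache.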

Again, worker processes use a lot of RAM, and even mpm_prefork only helps so much. When the number of requests is less than the number of available processes, Apache is very fast, but when the reverse is true, Apache’s performance degrades, and the less powerful the hardware, the faster it degrades.

As traffic increases, particularly during a spike, all the spawned processes eat into the available RAM until there’s little or none left. The server can come to a complete halt until existing connections are closed and resources are freed for new ones. Cloud servers that can share resources can combat this because there are more resources to share temporarily when one site has a traffic spike.

Part of the problem is that Apache’s worker processes are blocking. Each process handles one connection at a time, and requests beyond the number of available processes wait in a queue. A process has to finish with one connection before it can start on the next, even if it spends much of that time waiting, which makes inefficient use of resources.
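To see why one extra request past the pool size hurts, here’s a deliberately simplified model. It assumes every request takes the same amount of time and that queued requests are served first come, first served; Apache’s real behavior is messier, and the worker count and timing below are made-up numbers for illustration.

```php
<?php
// Simplified model (an assumption, not Apache's actual scheduler): each
// worker handles one request at a time, every request takes the same time,
// and waiting requests are served first come, first served.
function secondsUntilLastResponse(int $workers, int $requests, float $secondsPerRequest): float
{
    if ($workers < 1) {
        throw new InvalidArgumentException('Need at least one worker.');
    }
    // Requests are served in "waves" of $workers at a time; each wave has
    // to finish before the requests queued behind it can start.
    $waves = (int) ceil($requests / $workers);
    return $waves * $secondsPerRequest;
}

// Hypothetical pool of 150 workers, each request taking 0.2 seconds.
foreach ([100, 150, 151, 600] as $requests) {
    printf("%4d requests: last response after %.1f s\n",
        $requests, secondsUntilLastResponse(150, $requests, 0.2));
}
```

In this model, going from 150 to 151 simultaneous requests doubles the wait for the last visitor, because that one request has to sit in the queue until a worker frees up.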

Heavy traffic wasn’t a problem in the early days of the web, which helps account for Apache being process-driven. But as web traffic kept increasing, Apache developers added two more multi-processing modules (MPMs) alongside mpm_prefork: mpm_worker and mpm_event.

  • With mpm_worker, each child process runs multiple worker threads, each able to handle a connection. Threads need fewer resources than processes, which means fewer processes can handle the same number of connections.
  • The mpm_event module extends mpm_worker by adding dedicated listener threads that manage keep-alive connections, so worker threads aren’t tied up waiting on idle connections.
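The difference between the MPMs comes down to simple arithmetic. This sketch counts how many child processes each one needs to hold a given number of simultaneous connections. It assumes one connection per process for prefork and one per thread for worker and event, and uses 25 threads per child, the value in Apache’s sample MPM configuration; treat both as assumptions. The event MPM can actually do better than shown, since idle keep-alive connections don’t occupy a worker thread.

```php
<?php
// Hypothetical comparison: child processes needed to hold $connections
// simultaneous connections under each MPM. Assumes prefork uses one
// single-threaded process per connection and worker/event use one thread
// per connection.
function processesNeeded(string $mpm, int $connections, int $threadsPerChild = 25): int
{
    switch ($mpm) {
        case 'prefork':
            return $connections;
        case 'worker':
        case 'event':
            // Round up: a partly used process still has to exist.
            return (int) ceil($connections / $threadsPerChild);
        default:
            throw new InvalidArgumentException("Unknown MPM: $mpm");
    }
}

foreach (['prefork', 'worker', 'event'] as $mpm) {
    printf("%-8s %d processes for 400 connections\n", $mpm, processesNeeded($mpm, 400));
}
```

Since each process carries its own memory overhead, needing 16 processes instead of 400 is where the threaded MPMs save most of their RAM.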

How well Apache serves static content ultimately depends on which MPM it uses. However, mpm_prefork is the only one of the three that works with mod_php (the module Apache uses to interpret PHP code) as it’s usually built, because most PHP builds aren’t thread-safe.

Given that limitation, if you’re running PHP on Apache, you probably have mod_php running and so are using mpm_prefork as your multi-processing module. If that’s the case, your site likely runs well enough most of the time, until a traffic spike hangs up your server.
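If you’re not sure whether that describes your setup, PHP can tell you how it’s connected to the web server. php_sapi_name() returns "apache2handler" under mod_php (and values such as "fpm-fcgi" under PHP-FPM or "cli" on the command line), and the PHP_ZTS constant reports whether the build is thread-safe. describePhpSetup() is a hypothetical helper, and the MPM it suggests is an inference, not something PHP reports directly.

```php
<?php
// Hypothetical helper: describes how PHP is attached to the web server.
function describePhpSetup(): string
{
    $sapi = php_sapi_name();
    if ($sapi !== 'apache2handler') {
        return "PHP is running via the '$sapi' SAPI, not mod_php";
    }
    // PHP_ZTS is 1 for thread-safe builds and 0 otherwise. A non-thread-safe
    // mod_php effectively requires mpm_prefork.
    return PHP_ZTS
        ? 'mod_php, thread-safe build (a threaded MPM is possible)'
        : 'mod_php, non-thread-safe build (expect mpm_prefork)';
}

echo describePhpSetup(), "\n";
```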

You can make some tweaks to Apache, such as increasing the value of MaxClients (renamed MaxRequestWorkers in Apache 2.4) so Apache can handle more simultaneous connections, but under heavy traffic it will run out of RAM sooner or later.
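A common rule of thumb for how high you can safely push that value is to divide the RAM left over for Apache by the average size of one child process. This is an estimate, not an official Apache formula, and the server figures below are hypothetical; you’d measure your own process sizes with a tool like top.

```php
<?php
// Rule-of-thumb estimate (an assumption, not an official Apache formula):
// the largest MaxRequestWorkers / MaxClients value that fits in memory is the
// RAM left for Apache divided by the average size of one child process.
function maxWorkersForRam(int $totalRamMb, int $reservedRamMb, int $avgProcessMb): int
{
    if ($avgProcessMb <= 0) {
        throw new InvalidArgumentException('Process size must be positive.');
    }
    // RAM reserved for the OS, database, etc. is off limits to Apache.
    $availableMb = max(0, $totalRamMb - $reservedRamMb);
    return intdiv($availableMb, $avgProcessMb);
}

// Hypothetical server: 4 GB of RAM, 1 GB kept for the OS and database,
// and mod_php child processes averaging 50 MB each.
echo maxWorkersForRam(4096, 1024, 50), "\n";
```

Set the directive well above that number and a traffic spike pushes the server into swap, which is the slowdown described above.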

Overall, Apache hasn’t scaled as well as the web itself, and heavy traffic can eat up server resources faster than the server can work through the requests. I think this is what happens to this very site from time to time.

What Apache Does Well

Based on what I’ve said so far, you might conclude that you wouldn’t want to use Apache at all, but there’s a reason it’s been the most popular software behind web servers for a long time.

Apache has great documentation, and because it’s open source, it integrates with plenty of other software. Its module system makes it extensible and configurable by third-party developers, and people choose Apache for that flexibility and for the large amount of support available.

Apache is also good at handling dynamic content, because modules for interpreting PHP (and other languages) let it process that code itself, without the need for external components.

You also get benefits like .htaccess files, which allow additional per-directory configuration. I assume you’ve worked with .htaccess files before and appreciate the ability to make certain tweaks through them. Unfortunately, they’re a performance hog: because an .htaccess file can be placed in any directory, Apache has to look for one in every directory along the path to a requested file, on every request (wherever AllowOverride permits them).
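To make that concrete, this sketch lists the .htaccess files Apache may check for a single request, following the example in Apache’s own .htaccess documentation: every directory from the filesystem root down to the requested file’s directory. It assumes AllowOverride permits .htaccess files at every level; htaccessCandidates() and the path used are hypothetical.

```php
<?php
// Lists the .htaccess files Apache may look for when serving $requestedFile,
// assuming AllowOverride allows them in every directory on the path.
function htaccessCandidates(string $requestedFile): array
{
    $dir = dirname($requestedFile);
    // Split the directory into segments, dropping empty ones from "/".
    $parts = array_filter(explode('/', $dir), function ($part) {
        return $part !== '';
    });
    $candidates = ['/.htaccess'];
    $path = '';
    foreach ($parts as $part) {
        $path .= '/' . $part;
        $candidates[] = $path . '/.htaccess';
    }
    return $candidates;
}

print_r(htaccessCandidates('/www/htdocs/example/page.html'));
```

That’s four filesystem checks for one page, repeated on every request. Apache’s documentation recommends putting such directives in the main server configuration and setting AllowOverride None when you have access to it.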

Closing Thoughts

Apache has been the most popular software to run a web server for as long as I can remember. It’s hardly the only software you can use, but there are reasons why it’s been so popular.

Apache is open source, making it both free to use and free to extend. Its simple process-driven architecture has led to a lot of third-party developers improving and adding to the software.

If anything, Apache’s biggest downside is that it was designed, and became successful, early on, before the web grew to the size it is now. Apache’s process-based model for handling connections performs very well until it receives a lot of HTTP requests, at which point it can temporarily grind to a halt until open connections are closed and resources are freed to handle new requests.

NGINX, on the other hand, was developed years after Apache, once it had become clear that Apache was going to have difficulty scaling along with the web. I’ll pick things up next week by talking about the differences in NGINX’s architecture and how they allow it to scale better than Apache.


