About a month or so ago, I started a series about web servers as part of an ongoing series about website performance. I talked a little about different hosting packages and followed that up with some thoughts about server hardware. I then rudely interrupted the series with a couple of posts about productivity.
Last week I picked up the series again and turned my attention to software, specifically Apache server software. I talked about its process-based architecture for handling HTTP requests and connections and how the model doesn’t scale to meet the demands of heavy web traffic.
Today I want to continue and talk about another popular application for running a server, NGINX, which was built with Apache’s limitations in mind and with the goal of solving some of them.
NGINX and Event-Driven Architecture
Apache may be the most popular application for running a web server, but NGINX is no slouch. Websites such as Netflix, Pinterest, WordPress, and Airbnb run on NGINX software and it currently sits at #3 in popularity behind Apache and IIS.
NGINX uses an event-driven architecture (different from Apache’s mpm_event module) as opposed to Apache’s process-based model. It uses this architecture specifically to address Apache’s performance and scalability issues.
At the top is a master process that spawns worker processes, similar to Apache. However, each of NGINX’s worker processes can handle thousands of HTTP requests and connections simultaneously, where each Apache worker process can only handle one. This makes NGINX far more scalable when it comes to serving static pages under heavy load.
The system is event-driven, hence the name. Each worker process listens for events that are initiated by HTTP connection requests. When an event is found it’s processed asynchronously in a way that doesn’t block other events.
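To make the idea a little more concrete, here’s a minimal sketch in Python of a single worker handling many connections through one event loop. This isn’t how NGINX is implemented (NGINX is written in C); the names handle_connection, worker, and run_worker are my own, and asyncio.sleep only stands in for time spent waiting on the network.

```python
import asyncio


async def handle_connection(conn_id, stats):
    """Simulate one connection; the await is where the worker is free to do other work."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    # Stand-in for waiting on network I/O. While this connection waits,
    # the event loop moves on to other connections instead of blocking.
    await asyncio.sleep(0.01)
    stats["in_flight"] -= 1
    return f"response for connection {conn_id}"


async def worker(num_connections):
    """A single worker (one process, one thread) serving every connection."""
    stats = {"in_flight": 0, "peak": 0}
    responses = await asyncio.gather(
        *(handle_connection(i, stats) for i in range(num_connections))
    )
    return responses, stats["peak"]


def run_worker(num_connections):
    """Run the worker's event loop until every connection has been answered."""
    return asyncio.run(worker(num_connections))
```

Calling run_worker(1000) returns all 1,000 responses along with the peak number of connections that were open at the same moment inside that one worker. In a one-connection-per-process model that peak would never be higher than one per process.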
The connection is assigned a state machine (a set of instructions for how to process the request), most commonly the HTTP state machine, but also those for SMTP, IMAP, etc. When the connection is closed the system knows to stop listening for events and resources are freed for other processes.
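A state machine is easier to picture with a small example. The sketch below is purely illustrative: the state names, the event names, and the Connection class are my own assumptions and don’t reflect NGINX’s actual internal states.

```python
from enum import Enum, auto


class State(Enum):
    READING_REQUEST = auto()
    SENDING_RESPONSE = auto()
    CLOSED = auto()


# (current state, event) -> next state. All names here are hypothetical.
TRANSITIONS = {
    (State.READING_REQUEST, "request_complete"): State.SENDING_RESPONSE,
    # With keep-alive, the connection goes back to waiting for another request.
    (State.SENDING_RESPONSE, "response_sent"): State.READING_REQUEST,
    (State.READING_REQUEST, "connection_closed"): State.CLOSED,
    (State.SENDING_RESPONSE, "connection_closed"): State.CLOSED,
}


class Connection:
    def __init__(self):
        self.state = State.READING_REQUEST

    def on_event(self, event):
        """Advance the connection; events that don't apply to the current state are ignored."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```

Once a connection reaches CLOSED, no further event moves it anywhere, which mirrors the point where the worker stops listening for events on that connection.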
Because the worker processes can wait for a request instead of having to be ready for one in advance, they make more efficient use of server resources as compared to Apache. Resources don’t need to be dedicated before they’re needed which makes NGINX non-blocking.
The downside is it’s harder to develop for this kind of architecture and consequently it’s mostly the NGINX team that develops for the software.
This makes NGINX less flexible than Apache, as it’s harder to extend through third-party modules. NGINX also doesn’t interpret .htaccess files, so you lose the ability to add per-directory configuration through them. Of course, since NGINX doesn’t have to check every directory for .htaccess files, it doesn’t suffer the performance cost of looking for them.
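To give a rough sense of that cost, here’s a small Python sketch that lists the .htaccess files a server would have to look for on a single request. It’s a simplification that assumes overrides only apply from the document root down, and the function name htaccess_lookups and the example paths are hypothetical.

```python
import posixpath


def htaccess_lookups(document_root, request_path):
    """List every .htaccess path that would be checked to serve request_path."""
    # Only the directories leading to the file matter, not the file itself.
    directory = posixpath.dirname(request_path.lstrip("/"))
    parts = [part for part in directory.split("/") if part]

    lookups = [posixpath.join(document_root, ".htaccess")]
    current = document_root
    for part in parts:
        current = posixpath.join(current, part)
        lookups.append(posixpath.join(current, ".htaccess"))
    return lookups
```

For a request to /blog/2017/post.html under /var/www/html, that’s three separate file checks before any content is served, and the checks repeat on every request. The deeper the directory structure, the more checks there are.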
This also means that something you might expect NGINX to do, such as interpret PHP (and other server-side language) code, it can’t do natively. In other words, NGINX can’t process dynamic content on its own. Your PHP (or other) code first needs to be sent to an external interpreter, and the result received back, before NGINX can pass it on to the requesting browser.
NGINX as a Reverse Proxy Server for Apache
You might have noticed that NGINX and Apache have something of opposite strengths and weaknesses.
NGINX is really good with HTTP related requests. It can quickly serve static (including cached) content at scale. What it doesn’t do is process dynamic content. Fortunately, Apache does handle dynamic content well. What it struggles with is all those HTTP requests during spikes in traffic.
They complement each other well and would seem to offer an ideal solution where NGINX handles all the HTTP requests and serves what static content it can, while passing on requests for dynamic content to Apache.
The common setup is to use NGINX as a reverse proxy server. A proxy server sits between your backend web server and the general internet. It accepts HTTP requests, processes what it can, and forwards the request to external services when it can’t handle them. You can think of it as a gatekeeper of sorts.
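The gatekeeper’s decision can be sketched in a few lines of Python. This is only an illustration of the idea, not NGINX configuration. The list of static extensions, the function name route, and the two return labels are all assumptions of mine.

```python
import posixpath

# Hypothetical list of file types the proxy serves itself.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif", ".svg", ".ico", ".html"}


def route(request_path):
    """Decide whether the proxy answers a request itself or forwards it to the backend."""
    path = request_path.split("?", 1)[0]  # ignore any query string
    extension = posixpath.splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        return "serve_static"
    return "forward_to_backend"
```

With these rules a request for /css/site.css never reaches the backend, while /index.php?page=2 is handed off to be processed there.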
There’s also a forward proxy server, or just proxy server, which for the most part does the same thing as the reverse proxy. The difference is a matter of the direction information is flowing and which computers are aware of which other computers are part of the connection. Here’s a better and more detailed explanation.
A reverse proxy setup has a number of benefits, which you can read about in this article by Mike Hadlow. Allow me to call out a few benefits that are specifically related to website performance.
- Load Balancing: HTTP requests can be routed to different, but identical, servers to keep any one of them from being overloaded with requests. Naturally this requires you have multiple backend servers.
- Caching: A reverse proxy can act as a cache for dynamic content.
- Compression/Decompression: The reverse proxy can handle compressing and decompressing requests reducing the load on the backend server.
- URL Rewriting: Another task typically performed by the back end server that can be handled by the proxy.
- SSL: The proxy can remove the need for many, though not all, certificates, once again reducing load on the backend server.
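The first two benefits, load balancing and caching, are simple enough to sketch together in Python. This is a toy model under my own assumptions: the class name ReverseProxySketch, the make_backend helper, and the backend names are hypothetical, round-robin is just one possible balancing strategy, and a real cache would also expire entries.

```python
from itertools import cycle


class ReverseProxySketch:
    """Toy reverse proxy: caches responses and rotates cache misses across backends."""

    def __init__(self, backends):
        # backends maps a name to a callable that takes a path and returns a response body.
        self._backends = dict(backends)
        self._rotation = cycle(list(self._backends))  # round-robin order
        self._cache = {}
        self.log = []  # which backend handled each cache miss

    def handle(self, path):
        # Caching: a repeated request is answered without touching a backend.
        if path in self._cache:
            return self._cache[path]
        # Load balancing: each cache miss goes to the next backend in the rotation.
        name = next(self._rotation)
        body = self._backends[name](path)
        self._cache[path] = body
        self.log.append(name)
        return body


def make_backend(name):
    """Build a stand-in backend that 'renders' a page for the given path."""
    return lambda path: f"{name} rendered {path}"
```

If you create a proxy with two backends and request /a, /b, /a, and /c in that order, the second request for /a comes straight from the cache, so the backends only do work three times between them.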
When used as a reverse proxy for Apache, NGINX accepts HTTP requests, processes what it can, and hands off the rest to Apache. NGINX does what it does well, scaling to handle more requests and serving static content, and Apache does what it does well, turning dynamic content into static files, which are passed back to NGINX.
Fewer requests reaching Apache means it’s less likely that Apache will get hung up and less likely your site will be temporarily unreachable.
Using NGINX as a reverse proxy server for Apache is the setup I want to implement here in the hopefully not too distant future. It’s toward the top of my list of server changes. Much of the time this site is pretty fast, but it can come to a complete halt during traffic spikes. For a long time I didn’t understand why it was happening, but now I think I know.
I also think my hosting company will set this up for me, but it’s something I need to check to make sure.
That won’t help you set it up though and since I’m probably not the best person to ask, I’ll point you to a few resources that will show you how. You probably want to check with your host to see if they’ll do it for you first.
- Setting up an Nginx Reverse Proxy
- How To Configure Nginx as a Reverse Proxy for Apache
- How To Configure Nginx as a Web Server and Reverse Proxy for Apache on One Ubuntu 16.04 Server
NGINX came after Apache and so had the luxury of knowing Apache’s scalability issues when it was being written. As a result NGINX was designed to excel in places where Apache struggles.
It was developed to handle thousands of simultaneous HTTP requests and connections and to serve static content efficiently and quickly. CPU and RAM tend to remain more consistent with NGINX due to its event-driven architecture that doesn’t need to spawn a new process each time a request is received.
Ultimately NGINX is faster when it comes to delivering static content (including cached pages) and Apache is faster when it comes to delivering dynamic content.
Fortunately you can use both together doing what each does best by having NGINX run as a reverse proxy server in front of Apache.
There are two more server software options I want to talk about, Internet Information Services (IIS) and LiteSpeed. I’ll talk about IIS next week and then I’ll close out this mini-series about server software the following week with some thoughts about LiteSpeed.