HTTP Caching And Cache Validation Over HTTP/1.1

Most of the resources of an HTML page don’t change from day to day. Yesterday’s CSS files are probably the same ones in place today. The same is true for any included JavaScript files. I’m also guessing you don’t swap out images on web pages all that often either. So why make browsers download all these resources again the next time they visit the page?

You’ll notice that caching comes up a lot in this series about time to first byte performance, and in any discussion of web performance. Earlier in the series I talked about DNS caching to improve DNS lookup times, and in the next series I’m going to bring up caching for database-driven pages.

HTTP caching occurs when a browser stores local copies of resources for later use. If a browser can use its local cache for a resource, it can avoid an HTTP request for that resource. As I mentioned last week, making fewer requests is one of the primary performance optimizations we can make under HTTP/1.1.

Today I want to talk about HTTP caching and show you a few things that can help avoid additional requests to the server and make use of resources that have already been downloaded.

HTTP Cache Headers

Whether or not a browser caches different resources, and how long it holds onto them, depends on the browser and its user-controlled settings. If a user has set his or her browser to not store anything in cache, there’s not much you or I can do to force it to.

Fortunately, few people opt to turn off browser caching, and while we’ll never have complete control, we can make recommendations to browsers about what to cache and how long to hold onto it.

Cache-Control Headers

One of the benefits of HTTP/1.1 is that it provides caching headers. One of these is the Cache-Control header. Cache-Control gives you a way to define what resources should be cached, in what manner, and for how long.

Cache-Control isn’t something you simply enable and disable. To use it properly, you set directives associated with it, which means we should first talk about some of those directives, starting with max-age.

As you might guess, max-age sets the maximum time, in seconds, that a cached resource may be used, counted from the time the response was generated. In other words, it sets how long the resource should be cached before it’s considered stale.

  • One minute: max-age=60
  • One hour: max-age=3600
  • One day: max-age=86400
  • One week: max-age=604800
  • 30 days: max-age=2592000
  • One year: max-age=31536000
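The values in the list above are simply the duration multiplied out into seconds. Here’s a minimal sketch of that arithmetic in Python, plus the basic freshness test a cache applies: a stored response can be reused without contacting the server while its age is below max-age. The helper names max_age and is_fresh are hypothetical, not part of any library, and the freshness check ignores refinements such as the Age header and clock skew that real caches account for.

```python
# Seconds in each unit of time used to build max-age values.
MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR


def max_age(days=0, hours=0, minutes=0, seconds=0):
    """Build a max-age directive (hypothetical helper) from a duration."""
    total = days * DAY + hours * HOUR + minutes * MINUTE + seconds
    return f"max-age={total}"


def is_fresh(age_seconds, max_age_seconds):
    """A cached response is fresh while its age is below max-age."""
    return age_seconds < max_age_seconds


print(max_age(minutes=1))  # max-age=60
print(max_age(days=30))    # max-age=2592000
print(max_age(days=365))   # max-age=31536000

# A response cached 29 days ago with max-age=2592000 is still fresh...
print(is_fresh(29 * DAY, 30 * DAY))  # True
# ...but one cached 31 days ago is stale and generally needs revalidating.
print(is_fresh(31 * DAY, 30 * DAY))  # False
```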

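Here’s how you would set Cache-Control with a max-age using an .htaccess file on an Apache server. The Header directive is provided by Apache’s mod_headers module, so that module needs to be enabled.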

[code type=html]
Header set Cache-Control "max-age=2592000"
[/code]

The above code would set every file type to be cached for 30 days. You could, if you prefer, set the specific types of files to cache.

[code type=html]
<FilesMatch "\.(css|jpg|jpeg|png|gif|js)$">
Header set Cache-Control "max-age=2592000"
</FilesMatch>
[/code]

The code above says to cache each of the file types listed (.css, .jpg, .jpeg, etc.) for 30 days. You can also set different types of files to remain cached for different lengths of time.

[code type=html]
<FilesMatch "\.(jpg|jpeg|png|gif)$">
Header set Cache-Control "max-age=31536000"
</FilesMatch>
<FilesMatch "\.(css|js)$">
Header set Cache-Control "max-age=2592000"
</FilesMatch>
[/code]

In the code above several types of images are set to be cached for a year, while .css and .js files should remain cached for 30 days.
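Per-type rules like these boil down to a lookup from file extension to a max-age. If you set headers from application code rather than .htaccess, the same policy might look something like this sketch. The function name and the extension groups are assumptions chosen to match the text, and extensions are lowercased so that .PNG and .png get the same rule.

```python
import os

# Cache lifetimes, in seconds, matching the example in the text.
ONE_YEAR = 31536000
THIRTY_DAYS = 2592000

# Extension groups: images get a year, stylesheets and scripts get 30 days.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}
ASSET_EXTENSIONS = {".css", ".js"}


def cache_control_for(path):
    """Return a Cache-Control value for a file path, or None for no header."""
    ext = os.path.splitext(path)[1].lower()  # ".PNG" and ".png" match alike
    if ext in IMAGE_EXTENSIONS:
        return f"max-age={ONE_YEAR}"
    if ext in ASSET_EXTENSIONS:
        return f"max-age={THIRTY_DAYS}"
    return None  # other file types get no Cache-Control header from this rule


print(cache_control_for("/img/logo.PNG"))  # max-age=31536000
print(cache_control_for("/css/main.css"))  # max-age=2592000
print(cache_control_for("/index.html"))    # None
```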

The max-age directive is probably the most important one to set, but it’s not the only one you can or likely will set.

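The public directive says that the response can be cached by any cache (browsers, proxies, etc.), even if it wouldn’t ordinarily be cacheable. Technically you usually don’t need to specify public, since a response with a max-age is already cacheable by shared caches; public mainly matters for responses those caches otherwise couldn’t store, such as responses to requests that include an Authorization header. Still, it’s common to set it.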

[code type=html]
Header set Cache-Control "max-age=2592000, public"
[/code]

A corresponding private directive means the response is intended for a single user and not the general public. The user’s browser may cache it, but shared caches such as intermediate proxies must not.

[code type=html]
Header set Cache-Control "max-age=2592000, private"
[/code]

The no-cache directive indicates that the client needs to check with the server before using a cached resource. It doesn’t mean the resource isn’t cached at all, but that the client must check with the server first to make sure it hasn’t changed.

[code type=html]
Header set Cache-Control "no-cache"
[/code]

If you don’t want something cached at all, you want to use the no-store directive, which explicitly disallows any cache from storing the resource.

[code type=html]
Header set Cache-Control "no-store"
[/code]
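To make the differences between these directives concrete, here’s a simplified Python model of how a cache might interpret a Cache-Control value. The function names are hypothetical, the parser doesn’t handle quoted values that contain commas, and it covers only the directives discussed here, so treat it as an illustration rather than a full implementation of the HTTP caching spec. A browser is a private cache; a proxy is a shared one.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument or True}."""
    directives = {}
    for part in value.split(","):  # simplified: ignores commas inside quotes
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else True
    return directives


def cache_behavior(value, shared_cache=False):
    """Describe what a browser (private) or proxy (shared) cache may do."""
    d = parse_cache_control(value)
    if "no-store" in d:
        return "do not store"
    if shared_cache and "private" in d:
        return "do not store (private)"
    if "no-cache" in d:
        return "store, but revalidate before every use"
    max_age = d.get("max-age")
    if isinstance(max_age, str) and max_age.isdigit():
        return f"reuse without asking for {int(max_age)} seconds"
    return "no explicit policy"


print(cache_behavior("max-age=2592000, private"))
# reuse without asking for 2592000 seconds
print(cache_behavior("max-age=2592000, private", shared_cache=True))
# do not store (private)
print(cache_behavior("no-cache"))
# store, but revalidate before every use
```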

Those are probably the directives you’ll use most often, but there are more. Instead of listing them all here, I’ll point you to Mozilla’s MDN page on Cache-Control.

An .htaccess file isn’t the only way to set Cache-Control headers. You could also set them in a meta tag, though it’s not recommended.

[code type=html]
<meta http-equiv="Cache-Control" content="no-cache">
[/code]

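Your options for content in the meta tag are public, private, no-cache, and no-store. You can’t set a max-age inside the meta tag, which is one reason it’s not a recommended way to set Cache-Control. Another is that proxies and other intermediate caches never read your HTML, and browsers generally ignore this meta tag when deciding what to cache.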

This article will show you how to set Cache-Control on Nginx servers, as well as how to set it with PHP’s header() function.
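The header can be sent from any server-side language, so as one more illustration, here’s a minimal sketch using Python’s standard WSGI interface. The app and demo function names are hypothetical and the stylesheet body is placeholder content. You could serve the app locally with wsgiref.simple_server.make_server("127.0.0.1", 8000, app) and its serve_forever() method; demo() simply calls the app directly, the way a WSGI server would, so you can inspect the headers.

```python
# Placeholder stylesheet body for the example.
BODY = b"body { color: #333; }"


def app(environ, start_response):
    """A minimal WSGI app that sends Cache-Control from application code."""
    headers = [
        ("Content-Type", "text/css"),
        ("Content-Length", str(len(BODY))),
        # Same policy as: Header set Cache-Control "max-age=2592000, public"
        ("Cache-Control", "max-age=2592000, public"),
    ]
    start_response("200 OK", headers)
    return [BODY]


def demo():
    """Call the app as a WSGI server would; return (status, headers, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({}, start_response))
    return captured["status"], captured["headers"], body


print(demo()[1]["Cache-Control"])  # max-age=2592000, public
```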

Note: If you’ve been developing websites for a while, you may remember the Pragma header. It did a similar thing to Cache-Control, but it was an HTTP/1.0 header and is now deprecated.

The Expires Header

If for some reason you decide not to set a max-age directive, you can still set an end date on how long something should be cached.

The Expires header sets a date after which the cached resource should no longer be considered fresh.

[code type=html]
Expires: <http-date>
Expires: <day-name>, <day> <month> <year> <hour>:<minute>:<second> GMT
[/code]

The value is an absolute date, not a number of seconds counted from when the resource was downloaded.

[code type=html]
Expires: Wed, 12 Jul 2017 11:30:00 GMT
[/code]

If you set Cache-Control with a max-age, Expires will be ignored, so you don’t need to set both. It’s still not uncommon to see both included, usually for the benefit of older HTTP/1.0 caches that don’t understand Cache-Control.
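Here’s a small Python sketch of both ideas: turning a max-age into an equivalent Expires date, and working out how long a response stays fresh when max-age takes precedence over Expires. The function names are hypothetical, the now parameter exists only so the example is repeatable, and the precedence logic deliberately ignores other directives such as s-maxage.

```python
import email.utils
import time

THIRTY_DAYS = 2592000


def expires_header(max_age_seconds, now=None):
    """Build an Expires value equivalent to a max-age, in HTTP-date format."""
    now = time.time() if now is None else now
    return email.utils.formatdate(now + max_age_seconds, usegmt=True)


def freshness_lifetime(headers):
    """Return the freshness lifetime in seconds; max-age wins over Expires."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    if "Expires" in headers and "Date" in headers:
        # Without max-age, the lifetime is Expires minus the response's Date.
        expires = email.utils.parsedate_to_datetime(headers["Expires"])
        date = email.utils.parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None  # no explicit lifetime


print(expires_header(THIRTY_DAYS, now=1499859000 - THIRTY_DAYS))
# Wed, 12 Jul 2017 11:30:00 GMT

headers = {"Date": "Mon, 12 Jun 2017 11:30:00 GMT",
           "Expires": "Wed, 12 Jul 2017 11:30:00 GMT"}
print(freshness_lifetime(headers))  # 2592000
headers["Cache-Control"] = "public, max-age=3600"
print(freshness_lifetime(headers))  # 3600, Expires is ignored
```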

Cache Validation

Imagine you visit a web page and your browser caches some of its resources. At a later date you visit the same page. The cache has expired according to Cache-Control: max-age or an Expires header, but the resource itself hasn’t actually changed, so the copy in your local cache is still correct.

You wouldn’t want the browser to download the same image or CSS file again, but it has to, right? Not if you use cache validation.

Cache Validation with ETags

With ETags enabled, a server generates and delivers a token, usually a hash or other type of fingerprint, in an ETag header. The browser stores the token and sends it back to the server, in an If-None-Match header, on the next request for the same resource. If the tokens match, the resource hasn’t changed, and the server replies with a short 304 Not Modified response instead of sending the resource again. If the tokens are different, the new version of the resource is delivered over the network.

Here’s an example of an ETag header that might be received by a browser. You normally wouldn’t write an ETag like this by hand; the server generates it for you.

[code type=html]
ETag: "686897696a7c876b7e"
[/code]
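Here’s a minimal Python sketch of the exchange described above. The ETag is a truncated SHA-256 hash of the content, which is one common kind of fingerprint but not the scheme Apache uses, and make_etag and respond are hypothetical names.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Strong ETag from a hash of the content (one possible scheme)."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(content: bytes, if_none_match=None):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(content)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # If-None-Match may list several tags (simplified: split on commas)
        # and uses weak comparison, so a W/ prefix doesn't prevent a match.
        candidates = [t.strip() for t in if_none_match.split(",")]
        candidates = [t[2:] if t.startswith("W/") else t for t in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # cached copy is still good; no body
    return 200, headers, content


first = respond(b"body { color: #333; }")
print(first[0])  # 200
print(respond(b"body { color: #333; }", if_none_match=first[1]["ETag"])[0])  # 304
print(respond(b"body { color: #000; }", if_none_match=first[1]["ETag"])[0])  # 200
```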

On Apache servers, file ETags can be built from three components: INode, MTime, and Size. You can enable ETags in either your .htaccess file or your Apache config file. Here’s code for enabling them in the former.

[code type=html]

FileETag INode MTime Size

[/code]

or

[code type=html]

FileETag MTime Size

[/code]

Let me explain why there are two different lines of code. ETags that include the INode component are meant for sites that deliver content from a single server. The inode number of the same file differs from one server to another, so INode-based ETags can cause issues on sites that host content on multiple servers.

In general, if you use multiple servers you probably want to disable ETags, though removing the INode component might correct any issues. Here’s how you can disable ETags, if you prefer.

[code type=html]

Header unset ETag

FileETag None
[/code]
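To see why the INode component causes trouble, here’s an illustrative Python sketch with made-up metadata for one file deployed to two servers behind a load balancer. The ETag format is simplified (hex values joined by hyphens) and isn’t byte-for-byte what Apache emits, and FileInfo and etag_from are hypothetical names. It also assumes your deployment preserves modification times across servers; if it doesn’t, MTime-based ETags will disagree too.

```python
from collections import namedtuple

# Hypothetical metadata two servers might report for the *same* file:
# identical size and modification time, but different inode numbers.
FileInfo = namedtuple("FileInfo", "inode mtime size")


def etag_from(info, components):
    """Build an Apache-style ETag from the chosen components (simplified)."""
    parts = [format(getattr(info, name), "x") for name in components]
    return '"' + "-".join(parts) + '"'


server_a = FileInfo(inode=1048577, mtime=1499859000, size=18342)
server_b = FileInfo(inode=2097153, mtime=1499859000, size=18342)

with_inode = ("inode", "mtime", "size")
without_inode = ("mtime", "size")

# With INode, each server hands out a different ETag for identical content,
# so a browser bouncing between them keeps getting full 200 responses.
print(etag_from(server_a, with_inode) == etag_from(server_b, with_inode))  # False
# Dropping INode makes the tags agree, so revalidation can return 304.
print(etag_from(server_a, without_inode) == etag_from(server_b, without_inode))  # True
```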

I’ll again refer you elsewhere for enabling and disabling ETags in Apache config files and on Nginx servers.

Cache Validation with Last-Modified

Another header that can be used for cache validation is the Last-Modified header. It’s not as precise as an ETag header, since its dates only have one-second resolution, and it’s typically used as a fallback.

It uses the same date format as the Expires header.

[code type=html]
Last-Modified: <day-name>, <day> <month> <year> <hour>:<minute>:<second> GMT
[/code]

Which might look like the following with real data.

[code type=html]
Last-Modified: Wed, 12 Jul 2017 11:30:00 GMT
[/code]

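With Last-Modified set, a browser can check whether it has the most recent version of a resource much as it does with ETags: it sends the date back in an If-Modified-Since header, and if the resource hasn’t changed since then, the server replies with 304 Not Modified.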
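Here’s a short Python sketch of the server side of that check. The function names are hypothetical; the standard library’s email.utils module handles the HTTP date format. Note the truncation to whole seconds: two edits within the same second can’t be told apart by date, which is one reason Last-Modified is less precise than a content-based ETag.

```python
import email.utils
from datetime import datetime, timezone


def last_modified_header(modified: datetime) -> str:
    """Format a UTC modification time as an HTTP-date for Last-Modified."""
    return email.utils.format_datetime(modified, usegmt=True)


def not_modified_since(modified: datetime, if_modified_since: str) -> bool:
    """True if the resource hasn't changed since the browser's cached copy."""
    since = email.utils.parsedate_to_datetime(if_modified_since)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    return modified.replace(microsecond=0) <= since


MODIFIED = datetime(2017, 7, 12, 11, 30, 0, tzinfo=timezone.utc)
print(last_modified_header(MODIFIED))  # Wed, 12 Jul 2017 11:30:00 GMT
# The browser echoes that date back; unchanged, so the server can send 304.
print(not_modified_since(MODIFIED, "Wed, 12 Jul 2017 11:30:00 GMT"))  # True
```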

If you’re curious, the main difference between ETags and Last-Modified is that the former is content-based while the latter is time-based.

HTTP Caching Best Practices

With all these different headers and options, you may be wondering what the best practice is for using them. It depends on the particulars of your site and setup.

For example, in the last section I mentioned you probably want to disable ETags if you serve content from multiple servers, but possibly want them enabled if you serve everything from one server.

How long you set something to remain cached depends on how often the resource might change. How much traffic your site receives and the type of data it serves also play roles. There’s no one-size-fits-all solution.

That said, I’ll point you to an article by Jake Archibald with some best practices for different scenarios. He describes the two most common patterns and offers best cache settings for each.

The first pattern deals with immutable content. Things like stylesheets, script files, and images likely don’t change much or at all. In this case it makes sense to set a Cache-Control header with a large max-age.

[code type=html]
Cache-Control: max-age=31536000
[/code]

Usually in this pattern, files are given version numbers in their file names, so if a file changes, you change the version number and create a new resource at a new URL.
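A common variation on version numbers is to put a hash of the file’s contents in its name, so the name changes automatically whenever the content does. Here’s a sketch of that naming step in Python; the main.<hash>.css scheme and the fingerprinted_name function are assumptions for illustration. Whatever scheme you use, your HTML has to reference the new file name, or browsers will keep using their cached copy of the old one.

```python
import hashlib
import os


def fingerprinted_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the extension: main.css -> main.<hash>.css."""
    stem, ext = os.path.splitext(path)
    digest = hashlib.sha256(content).hexdigest()[:length]
    return f"{stem}.{digest}{ext}"


old = fingerprinted_name("css/main.css", b"body { color: #333; }")
new = fingerprinted_name("css/main.css", b"body { color: #000; }")
# Different content gives a different URL, so a one-year max-age is safe:
# the edited file is a brand new resource as far as caches are concerned.
print(old != new)  # True
```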

The second pattern deals with mutable content, blog posts that are edited and updated, for example. Here, the best practice is to revalidate the content.

[code type=html]
Cache-Control: no-cache
[/code]

Remember that no-cache doesn’t mean don’t cache the resource. It means the resource needs to be checked. In this pattern you would also add either an ETag or a Last-Modified header in order to validate the cache.
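Here’s a toy Python simulation of this second pattern, to show the round trips involved. Server and BrowserCache are hypothetical classes standing in for a real origin server and browser, the ETag is a truncated SHA-256 hash, and the page content is placeholder text.

```python
import hashlib


def etag_for(content: bytes) -> str:
    """Content-based validator (one possible scheme)."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


class Server:
    """Hypothetical origin serving one mutable page with no-cache."""

    def __init__(self, content: bytes):
        self.content = content

    def get(self, if_none_match=None):
        etag = etag_for(self.content)
        headers = {"Cache-Control": "no-cache", "ETag": etag}
        if if_none_match == etag:
            return 304, headers, b""  # unchanged: no body sent
        return 200, headers, self.content


class BrowserCache:
    """Stores the page but, because of no-cache, revalidates on every use."""

    def __init__(self):
        self.body = None
        self.etag = None

    def load(self, server):
        status, headers, body = server.get(if_none_match=self.etag)
        if status == 304:
            return "from cache (revalidated)", self.body
        self.body, self.etag = body, headers["ETag"]
        return "downloaded", body


server = Server(b"<h1>First draft</h1>")
cache = BrowserCache()
print(cache.load(server)[0])  # downloaded
print(cache.load(server)[0])  # from cache (revalidated)
server.content = b"<h1>Edited post</h1>"  # the post is edited
print(cache.load(server)[0])  # downloaded
```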

Again I recommend reading the entirety of Jake Archibald’s article as he goes into this in more depth than I do here. Google’s developer site also offers tips and techniques for HTTP/1.1 caching.

Closing Thoughts

Caching comes up often when talking about website performance and it’s good practice if your site is delivered over HTTP/1.1. Cache headers provide several ways to configure caching recommendations.

Best practices depend on the specifics of your site, but odds are you’ll either set a long time for user-agents to cache resources or you’ll suggest they always check to see if their local cache of a resource is still valid.

Next week I want to look at HTTP/2, which attempts to solve some of the issues of HTTP/1.1. We’ll see that with HTTP/2 there are different performance strategies and techniques and some of what we do for HTTP/1.1 no longer applies. I’ll also briefly mention HTTPS and how it affects performance.


