HTTP Caching Strategies for Static Sites
There’s an old Phil Karlton quote that goes something like “There are only two hard things in Computer Science: cache invalidation and naming things.” This article touches on the easier of the two (cache invalidation), or rather, strategies to avoid having to deal with it at all. A good cache strategy is one of the most effective ways to speed up a website.
HTTP Caching 101
Browsers and web servers handle caching through a few HTTP headers. If a browser makes a request for a URL it has never seen before, a server can respond with headers that let browsers know when they will need to request a new copy, or when to validate that their copy is the most recent.
There are two ways a browser can re-use a resource in its HTTP cache. If the cached copy is “fresh” (i.e. it knows the document shouldn’t have changed), it can grab it directly from the cache without making any network requests at all. If the cached copy is “stale” (it’s not sure if it can re-use a resource), it can make what is called a conditional request to the server.
Conditional Requests
In a nutshell, conditional requests are a way for a browser to say “is my copy of this still the most recent?” and for a server to respond with either “yep” or “nope, but here’s the new version”. Those “yep” responses are much smaller, since they don’t have to include the document itself.
To enable conditional requests, you need to add ETag or Last-Modified headers to your responses (or both). ETag headers are identifiers for specific versions of content (often implemented as a hash of the content, such as MD5), whereas Last-Modified is a pretty self-explanatory date. I personally like to use ETags, as they are less likely to change based on something like a file being re-created during a site redeployment or migration or whatnot.
If a browser makes a request for a document that it has previously seen, it will send along an If-None-Match and/or If-Modified-Since header (which correspond to the ETag and Last-Modified headers, respectively). If the If-None-Match header from the request matches the current ETag of the resource (or if the If-Modified-Since date from the request is the same as or newer than the resource’s Last-Modified date), the server can respond with the special 304 Not Modified status code and an empty response body.
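The server-side check described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular web server implements it: the function name `is_not_modified` and its parameters are made up for this example, and it assumes the server already knows the resource's current ETag and last-modified time.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _strip_weak(tag: str) -> str:
    # Conditional GETs use weak comparison, so drop an optional W/ prefix.
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers: dict, etag: str, last_modified: datetime) -> bool:
    """Return True when the server may answer 304 Not Modified.

    etag is the resource's current ETag (including its quotes) and
    last_modified is a timezone-aware datetime. Both are assumed inputs.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        # Naive comma split: good enough for a sketch.
        candidates = [_strip_weak(t.strip()) for t in if_none_match.split(",")]
        return "*" in candidates or _strip_weak(etag) in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is None:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # Unparseable date: fall back to a full response.
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so ignore microseconds.
    return last_modified.replace(microsecond=0) <= since
```

When the function returns True, the server would send a 304 with no body; otherwise it sends the full document with a 200.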
Cache-Control
The Cache-Control header is probably the most important header to get right. It has a lot of options, which the specification calls “directives”, but for static sites you should only need to use two: max-age=<seconds> and immutable (at least until you add an edge cache into the mix, but we’ll talk about that later). The max-age directive lets you tell browsers how many seconds they can hold onto a resource before they need to request a new copy, and the immutable directive tells browsers that the resource will not change while it is fresh, so they never need to revalidate it. The immutable directive isn’t part of the core HTTP caching spec (it is defined separately in RFC 8246), and it isn’t supported across the board, so it’s best to use it alongside a max-age with a really large number of seconds, e.g. Cache-Control: immutable, max-age=31557600 (about one year).
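To make the two directives concrete, here is a small Python sketch that builds a Cache-Control value and checks whether a cached copy of a given age still counts as fresh. The helper names (`cache_control`, `is_fresh`) are hypothetical, and the freshness check only looks at max-age; real browsers also consider other directives and headers.

```python
ONE_YEAR = 31557600  # seconds: 365.25 days, the value used in this article


def cache_control(max_age: int, immutable: bool = False) -> str:
    """Build a Cache-Control header value from the two directives."""
    directives = []
    if immutable:
        directives.append("immutable")
    directives.append(f"max-age={max_age}")
    return ", ".join(directives)


def is_fresh(age_seconds: int, header_value: str) -> bool:
    """Return True if a cached copy of this age is still within max-age."""
    for directive in header_value.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            # Fresh while the copy is younger than its freshness lifetime.
            return age_seconds < int(value)
    return False  # No max-age found: treat the copy as stale in this sketch.
```

Note that with max-age=0 a copy is never fresh, which means the browser has to check with the server every time before re-using it.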
Caching Static Sites
There are typically two types of resources on static sites: root resources and sub-resources. Root resources are typically the HTML responses that correspond to the URLs in the browser’s address bar, and sub-resources are all the resources linked to by that main document (e.g. stylesheets, images, JavaScript, etc). These two types of resources need two different caching strategies.
Caching Sub-Resources
The best way to cache sub-resources is called “Asset Fingerprinting” (also known as Asset Hashing). Asset Fingerprinting is the process of adding a hash of a file’s content to its filename. So if you had a file named /js/main.js, and the MD5 hash of its content was adf06cf637aff7c06810711225d7eec6, you could rename the file to /js/main.adf06cf6.js. By doing this, you can ensure that requests to fingerprinted files always return the same response. If their content changes, the hash changes, and so does the URL. It’s best to automate the process somehow (most build tools have some way to configure something like this). If automating the process is overkill, you can also get similar results just by manually adding a version number to the filename instead, and updating that version number each time you make a change.
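If you want to see what a build tool is doing under the hood, the renaming step can be sketched like this in Python. The function name `fingerprint_name` and the 8-character digest length are assumptions chosen to match the example filename above; actual build tools pick their own hash function and length.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprint_name(path: str, content: bytes, digest_length: int = 8) -> str:
    """Insert a short content hash before the file extension.

    "/js/main.js" becomes "/js/main.<hash>.js", where <hash> is the
    first digest_length hex characters of the MD5 of the content.
    """
    digest = hashlib.md5(content).hexdigest()[:digest_length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

MD5 is fine here because the hash only needs to change when the content changes; it isn't being used for security.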
Because the content of a fingerprinted file will never change, we can set a Cache-Control: immutable, max-age=31557600 header on them.
One “gotcha” here is that you need to make sure that you not only update the filename whenever a file’s content changes, but also every place that links to that file.
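As a rough sketch of that second step, the Python below rewrites src and href attributes in an HTML string using a mapping from original paths to fingerprinted ones. Both `rewrite_references` and the `manifest` dictionary are hypothetical names for this example, and the regular expression only handles double-quoted attributes; a real build tool would usually work from a proper HTML parser or template helper.

```python
import re

# Matches src="..." and href="..." with double-quoted values only.
ATTRIBUTE = re.compile(r'((?:src|href)=")([^"]+)(")')


def rewrite_references(html: str, manifest: dict) -> str:
    """Swap each linked path for its fingerprinted name, if it has one."""

    def swap(match: re.Match) -> str:
        prefix, url, suffix = match.groups()
        # Paths missing from the manifest (e.g. page links) are left alone.
        return prefix + manifest.get(url, url) + suffix

    return ATTRIBUTE.sub(swap, html)
```

For example, with a manifest of {"/js/main.js": "/js/main.adf06cf6.js"}, a script tag pointing at /js/main.js would end up pointing at /js/main.adf06cf6.js, while a link to /about/ would be untouched.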
Caching Root Resources
Because you don’t want the URLs in the address bar to constantly be changing each time you update a site (how annoying would it be to manage all those redirects?), root resources shouldn’t be fingerprinted. Instead, they should be served with a Cache-Control: max-age=0 header, and an appropriate ETag header.
If done right, these two strategies combined will result in a browser with a primed cache only making a single conditional request for the root document, and receiving a single, tiny, 304 response. If any of the sub-resources are updated, only the root document and whichever sub-resources have been updated will be sent over the network.
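Putting the two strategies together, a server (or a deploy script that sets headers) could pick the caching headers based on whether a path looks fingerprinted. The Python below is only a sketch under a few assumptions: the name `response_headers` is made up, fingerprints are taken to be 8 hex characters before the extension (as in the /js/main.adf06cf6.js example), and the ETag is an MD5 of the body.

```python
import hashlib
import re

# Assumed fingerprint shape: ".<8 hex chars>.<extension>" at the end of the path.
FINGERPRINTED = re.compile(r"\.[0-9a-f]{8}\.[a-z0-9]+$")


def response_headers(path: str, body: bytes) -> dict:
    """Choose caching headers for a sub-resource or a root resource."""
    if FINGERPRINTED.search(path):
        # Sub-resource: the content behind this URL never changes.
        return {"Cache-Control": "immutable, max-age=31557600"}
    # Root resource: always revalidate, and give the browser an ETag
    # so that revalidation can end in a 304.
    etag = '"' + hashlib.md5(body).hexdigest() + '"'
    return {"Cache-Control": "max-age=0", "ETag": etag}
```

Most static hosts let you express the same rule as configuration instead of code, so treat this as a description of the policy rather than something you need to run yourself.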