There’s an old Phil Karlton quote that goes something like “There are only two hard things in Computer Science: cache invalidation and naming things.” This article touches on the easier of the two (cache invalidation), or rather, strategies to avoid having to deal with it at all. A good cache strategy is one of the most effective ways to speed up a website.
HTTP Caching 101
Browsers and web servers handle caching through a few HTTP headers. If a browser makes a request for a URL it has never seen before, a server can respond with headers that let the browser know when it will need to request a new copy, or when to validate that its copy is the most recent.
There are two ways a browser can re-use a resource in its HTTP cache. If the cached copy is “fresh” (i.e. the browser knows the document shouldn’t have changed), it can grab it directly from the cache without making any network requests at all. If the cached copy is “stale” (the browser isn’t sure whether it can re-use the resource), it can make what is called a conditional request to the server.
In a nutshell, conditional requests are a way for a browser to say “is my copy of this still the most recent?” and for a server to respond with either “yep” or “nope, but here’s the new version”. Those “yep” responses are much, much smaller, since they don’t have to include the document itself.
To enable conditional requests, you need to add either Etag or Last-Modified headers to your responses (or both).
Etag headers are identifiers for specific versions of content (often implemented as a hash of the content, such as MD5), whereas Last-Modified is a pretty self-explanatory date. I personally like to use Etags, as they are less likely to change based on something like a file being re-created during a site redeployment or migration or whatnot.
If a browser makes a request for a document that it has previously seen, it will send along an If-None-Match or If-Modified-Since header (which correspond to the Etag and Last-Modified headers, respectively). If the If-None-Match header from the request is the same as the Etag header of the response (or if the If-Modified-Since header from the request is the same as or newer than the Last-Modified header of the response), the server can respond with the special 304 - Not Modified status code, and an empty response body.
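To make that exchange a bit more concrete, here is a minimal sketch of the decision a server has to make. The function names (`make_etag`, `respond`) and the plain-dict representation of headers are assumptions made up for this example, not any particular server's API, and it ignores details such as weak validators.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def make_etag(body: bytes) -> str:
    """Return a quoted entity tag derived from an MD5 hash of the body."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def respond(request_headers: dict, body: bytes, modified_ts: int):
    """Return (status, headers, body), honoring conditional request headers.

    request_headers is assumed to be keyed by canonical header names, and
    modified_ts is the resource's modification time in whole epoch seconds.
    """
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(modified_ts, usegmt=True),
    }

    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    if if_none_match is not None:
        # When both headers are sent, If-None-Match takes precedence.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        not_modified = etag in candidates or "*" in candidates
    elif if_modified_since is not None:
        since_ts = parsedate_to_datetime(if_modified_since).timestamp()
        # Unchanged if the resource was not modified after the browser's date.
        not_modified = modified_ts <= since_ts
    else:
        not_modified = False

    if not_modified:
        return 304, headers, b""  # the "yep" answer: no body at all
    return 200, headers, body
```

A first request with no conditional headers gets the full body back; repeating it with the returned validator (for example `respond({"If-None-Match": headers["ETag"]}, body, modified_ts)`) gets the empty 304 instead.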
The Cache-Control header is probably the most important header to get right. It has a lot of options, which the specification calls “directives”, but for static sites you should only need to use two: max-age and immutable (at least until you add an edge cache into the mix, but we’ll talk about that later). The max-age directive lets you tell browsers how many seconds they can hold onto a resource before they need to request a new copy, and the immutable directive tells browsers that they should never need to request a new copy. The immutable directive isn’t part of the core HTTP caching spec, and it isn’t supported across the board, so it’s best to use it alongside max-age with a really large number of seconds, e.g. Cache-Control: immutable, max-age=31557600.
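As a rough illustration of how those two directives fit together, the sketch below builds a Cache-Control value and checks whether a cached copy of a given age would still count as fresh. The helper names (`cache_control`, `is_fresh`) are invented for this example, and the freshness check is a deliberately simplified model that only looks at max-age.

```python
# 365.25 days expressed in seconds, which works out to 31557600.
SECONDS_PER_YEAR = int(365.25 * 24 * 60 * 60)


def cache_control(max_age: int, immutable: bool = False) -> str:
    """Build a Cache-Control header value from the two directives."""
    directives = []
    if immutable:
        directives.append("immutable")
    directives.append(f"max-age={max_age}")
    return ", ".join(directives)


def is_fresh(age_seconds: float, header_value: str) -> bool:
    """Return True if a cached copy of this age is still within max-age."""
    for directive in header_value.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return age_seconds < int(value)
    return False  # no max-age found: treat as stale in this simplified model
```

Calling `cache_control(SECONDS_PER_YEAR, immutable=True)` produces the header value shown above, and a copy served with `max-age=0` is never fresh, which is what forces a conditional request every time.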
Caching Static Sites
The best way to cache sub-resources is called “Asset Fingerprinting” (also known as Asset Hashing). Asset Fingerprinting is the process of adding a hash of a file’s content to its filename. So if you had a file named /js/main.js, and the MD5 hash of its content was adf06cf637aff7c06810711225d7eec6, you could rename the file so that its name includes that hash. By doing this, you can ensure that requests to fingerprinted files will always return the same response. If their content changes, the hash changes, and so does the filename. It’s best to automate the process somehow (most build tools have some way to configure something like this). If automating the process is overkill, you can also get similar results just by manually adding a version number to the filename instead, and updating that version number each time you make a change.
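If you wanted to script the fingerprinting step yourself, it could look something like the sketch below. The `fingerprint` function and the "name.hash.ext" layout are assumptions chosen for this example; build tools each have their own naming conventions.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprint(path: str, content: bytes) -> str:
    """Return the path with an MD5 hash of the content before the extension.

    The "name.hash.ext" layout is an assumption made for this sketch.
    """
    p = PurePosixPath(path)
    digest = hashlib.md5(content).hexdigest()
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

The same content always maps to the same name, and any change to the content produces a different one, which is the property that makes these files safe to cache for a long time.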
Because the content of a fingerprinted file will never change, we can set a Cache-Control: immutable, max-age=31557600 header on them.
One “gotcha” here is that whenever a file’s content changes, you need to update not only its filename but also every place that links to that file.
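One way to handle that is to keep a mapping from original paths to fingerprinted ones and rewrite your documents with it. The sketch below assumes a hypothetical `manifest` dictionary of that shape, and it does naive text substitution, so treat it as an illustration of the idea and not a replacement for a real build tool.

```python
import re


def rewrite_references(html: str, manifest: dict) -> str:
    """Replace each original asset path in the HTML with its fingerprinted path."""
    if not manifest:
        return html
    # Try longer paths first so a short path cannot shadow a longer one,
    # and substitute in a single pass so replaced text is never re-matched.
    originals = sorted(manifest, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(path) for path in originals))
    return pattern.sub(lambda match: manifest[match.group(0)], html)
```

For example, with a manifest of `{"/js/main.js": "/js/main.abc123.js"}`, a `<script src="/js/main.js">` tag comes out pointing at `/js/main.abc123.js` (the short hash here is just a placeholder).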
Caching Root Resources
Because you don’t want the URLs in the address bar to constantly be changing each time you update a site (how annoying would it be to manage all those redirects?), root resources shouldn’t be fingerprinted. Instead, they should be served with a Cache-Control: max-age=0 header, and an Etag or Last-Modified header.
If done right, these two strategies combined will result in a browser with a primed cache only making a single conditional request for the root document, and receiving a single, tiny, 304 response. If any of the sub-resources are updated, only the root document and whichever sub-resources have been updated will be sent over the network.
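Putting the two strategies together, the header logic on the server can be sketched roughly as follows. The `caching_headers` function is made up for this example, and the regular expression assumes fingerprinted files carry a 32-character hex MD5 digest right before their extension, which may not match your build tool's naming scheme.

```python
import re

# Assumed naming scheme: ".<32 hex characters>.<extension>" at the end of the path.
FINGERPRINTED = re.compile(r"\.[0-9a-f]{32}\.[A-Za-z0-9]+$")


def caching_headers(path: str, etag: str) -> dict:
    """Pick caching headers based on whether the path looks fingerprinted."""
    if FINGERPRINTED.search(path):
        # Content can never change under this name, so cache it for a year.
        return {"Cache-Control": "immutable, max-age=31557600"}
    # Root resources: always revalidate, and let the validator earn a 304.
    return {"Cache-Control": "max-age=0", "ETag": etag}
```

A fingerprinted path gets the long-lived immutable policy, while something like `/index.html` gets `max-age=0` plus a validator, so the browser asks every time but usually only receives a tiny 304.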