Nate Volker

HTTP Caching Strategies for Static Sites

There’s an old Phil Karlton quote that goes something like “There are only two hard things in Computer Science: cache invalidation and naming things.” This article touches on the easier of the two (cache invalidation), or rather, strategies to avoid having to deal with it at all. A good cache strategy is one of the most effective ways to speed up a website.

HTTP Caching 101

Browsers and web servers handle caching through a few HTTP headers. When a browser requests a URL it has never seen before, the server can respond with headers that tell the browser how long it can keep its copy before requesting a new one, or when to validate that its copy is still the most recent.

There are two ways a browser can re-use a resource in its HTTP cache. If the cached copy is “fresh” (i.e. the server has told it the document shouldn’t have changed yet), it can grab it directly from the cache without making any network requests at all. If the cached copy is “stale” (it’s not sure whether the resource can still be re-used), it can make what is called a conditional request to the server.

Conditional Requests

In a nutshell, conditional requests are a way for a browser to say “is my copy of this still the most recent?” and for a server to respond with either “yep” or “nope, but here’s the new version”. Those “yep” responses are much smaller, since they don’t have to include the document itself.

To enable conditional requests, you need to add either ETag or Last-Modified headers to your responses (or both). ETag headers are identifiers for specific versions of content (often implemented as a hash of the content, such as MD5), whereas Last-Modified is the date the content last changed. I personally like to use ETags, as they are less likely to change based on something like a file being re-created during a site redeployment or migration.

If a browser makes a request for a document that it has previously seen, it will send along an If-None-Match and/or If-Modified-Since header (which correspond to the ETag and Last-Modified headers, respectively). If the If-None-Match header from the request matches the ETag the server would send with the response (or if the If-Modified-Since date from the request is the same as or newer than the Last-Modified date of the response), the server can respond with the special 304 Not Modified status code and an empty response body.
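To make that server-side check concrete, here is a minimal Python sketch. The function names (make_etag, is_not_modified) are hypothetical, the request headers are assumed to arrive as a plain dict with exactly these capitalizations, and last_modified is assumed to be a timezone-aware datetime. A real server or framework will usually do this for you.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from an MD5 hash of the response body."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def _opaque(tag: str) -> str:
    """Strip whitespace and the weak-validator prefix (W/) for comparison."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers: dict, etag: str, last_modified: datetime) -> bool:
    """Return True when the server may answer with 304 Not Modified."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present it takes precedence over
        # If-Modified-Since. Splitting on commas is a simplification.
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        return "*" in candidates or _opaque(etag) in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparseable date: send the full response
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        return last_modified.replace(microsecond=0) <= since

    return False
```

If is_not_modified returns True, the server would send a 304 with no body; otherwise it would send the full document along with its current ETag and Last-Modified headers.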

Cache-Control

The Cache-Control header is probably the most important header to get right. It has a lot of options, which the specification calls “directives”, but for static sites you should only need to use two: max-age=<seconds> and immutable (at least until you add an edge cache into the mix, but we’ll talk about that later). The max-age directive lets you tell browsers how many seconds they can re-use a resource before they need to check for a new copy, and the immutable directive tells browsers that the resource will not change while it is fresh, so they never need to revalidate it (even when the user reloads the page). The immutable directive isn’t part of the core HTTP caching spec (it is defined separately in RFC 8246), and it isn’t supported across the board, so it’s best to use it alongside a max-age with a really large number of seconds, e.g. Cache-Control: immutable, max-age=31557600 (one year).
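As a quick illustration of where a number like 31557600 comes from, the sketch below builds a Cache-Control value from a lifetime in seconds. The helper name cache_control is made up for this example, and "one year" is assumed to mean 365.25 days.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# 365.25 days, which works out to 31557600 seconds.
ONE_YEAR = int(365.25 * SECONDS_PER_DAY)


def cache_control(max_age_seconds: int, immutable: bool = False) -> str:
    """Build a Cache-Control header value from the two directives above."""
    directives = []
    if immutable:
        directives.append("immutable")
    directives.append(f"max-age={max_age_seconds}")
    return ", ".join(directives)
```

Calling cache_control(ONE_YEAR, immutable=True) produces the header value used in the example above, and cache_control(0) produces max-age=0.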

Caching Static Sites

There are typically two types of resources on static sites: root resources and sub-resources. Root resources are typically the HTML responses that correspond to the URLs in the browser’s address bar, and sub-resources are all the resources linked to by that main document (e.g. stylesheets, images, JavaScript, etc). These two types of resources need two different caching strategies.

Caching Sub-Resources

The best way to cache sub-resources is called “Asset Fingerprinting” (also known as Asset Hashing). Asset Fingerprinting is the process of adding a hash of a file’s content to its filename. So if you had a file named /js/main.js, and the MD5 hash of its content was adf06cf637aff7c06810711225d7eec6, you could rename the file to /js/main.adf06cf6.js. By doing this, you can ensure that requests for a fingerprinted file always return the same response: if the content changes, the hash changes, and so does the filename. It’s best to automate the process somehow (most build tools have some way to configure something like this). If automating the process is overkill, you can also get similar results just by manually adding a version number to the filename instead, and updating that version number each time you make a change.
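If your build tool doesn’t do this for you, the renaming step can be sketched in a few lines of Python. The function name fingerprint_name and the 8-character hash length are assumptions chosen to match the /js/main.adf06cf6.js example; this is an illustration, not any particular tool’s behavior.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprint_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a truncated MD5 hash of the content before the file extension."""
    digest = hashlib.md5(content).hexdigest()[:length]
    p = PurePosixPath(path)
    # "/js/main.js" becomes "/js/main.<digest>.js"
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

A build script might read each asset’s bytes, call fingerprint_name, and write the file out under the returned name, keeping a record of old-to-new names for later.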

Because the content of a fingerprinted file will never change, we can set a Cache-Control: immutable, max-age=31557600 header on them.

One “gotcha” here is that whenever a file’s content changes, you need to update not only its filename but also every place that links to that file.
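One way to handle that gotcha is to keep a mapping from original paths to fingerprinted ones and rewrite the HTML at build time. The sketch below is deliberately naive: the function name rewrite_references and the manifest dict are hypothetical, and it assumes every reference appears as an exactly matching, quoted attribute value. A real build tool would typically parse the HTML (and CSS) instead.

```python
def rewrite_references(html: str, manifest: dict[str, str]) -> str:
    """Replace quoted references to original asset paths with fingerprinted ones."""
    for original, fingerprinted in manifest.items():
        # Match the surrounding quotes so "/js/main.js" does not also
        # rewrite a longer path such as "/js/main.js.map".
        for quote in ('"', "'"):
            html = html.replace(
                f"{quote}{original}{quote}",
                f"{quote}{fingerprinted}{quote}",
            )
    return html
```

Given a manifest such as {"/js/main.js": "/js/main.adf06cf6.js"}, a script tag pointing at /js/main.js would come out pointing at the fingerprinted file.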

Caching Root Resources

Because you don’t want the URLs in the address bar to change each time you update a site (how annoying would it be to manage all those redirects?), root resources shouldn’t be fingerprinted. Instead, they should be served with a Cache-Control: max-age=0 header, and an appropriate ETag header.
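Putting the root-resource rule next to the sub-resource rule from the previous section, a server could pick its caching headers based on whether a path looks fingerprinted. This is a rough sketch: the function name cache_headers is invented, and the regular expression assumes fingerprints are exactly 8 lowercase hex characters placed just before the file extension.

```python
import re

# Assumed fingerprint shape: ".<8 hex chars>.<extension>" at the end of the path.
FINGERPRINT_RE = re.compile(r"\.[0-9a-f]{8}\.[A-Za-z0-9]+$")
ONE_YEAR = 31557600


def cache_headers(path: str, etag: str) -> dict[str, str]:
    """Choose response headers for a fingerprinted asset or a root resource."""
    if FINGERPRINT_RE.search(path):
        # Fingerprinted sub-resource: the content behind this URL never changes.
        return {"Cache-Control": f"immutable, max-age={ONE_YEAR}"}
    # Root resource: always revalidate, and let the ETag enable 304 responses.
    return {"Cache-Control": "max-age=0", "ETag": etag}
```

In practice this logic usually lives in web server or hosting configuration rather than application code, but the decision is the same.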

If done right, these two strategies combined will result in a browser with a primed cache only making a single conditional request for the root document, and receiving a single, tiny, 304 response. If any of the sub-resources are updated, only the root document and whichever sub-resources have been updated will be sent over the network.