Nate Volker

HTTP/2 Server Push with Cloudflare Workers

HTTP/2 is a big step forward for how the web works. The vast majority of its features require no additional effort from web developers, and are entirely invisible to end-users (other than faster loading times). Multiplexing and header compression give significant benefits automatically. One of the features that got a lot of hype when HTTP/2 was new was server-push, yet very few sites really take advantage of it. There are likely a number of reasons for this, which Jake Archibald does a great job of outlining, but done correctly it can be a major performance boost.

What is HTTP/2 Server Push?

In a nutshell, HTTP/2 server-push is a way to avoid extra "round trips" by sending files to browsers before they request them. For example, if every page on your site includes the same stylesheet, you can push it along with the initial response, instead of waiting for the browser to find the <link rel="stylesheet"> tag in the HTML.

Many web servers and edge-cache services have implemented server-push by giving extra meaning to the Link: rel=preload HTTP header. If the server or edge-cache sees that header set on a response, it pushes the corresponding file. For example, a response with the following headers set:

Link: </css/style.css>; rel=preload; as=style
Link: </js/script.js>; rel=preload; as=script
Link: </img/image.png>; rel=preload; as=image

would push /css/style.css, /js/script.js, and /img/image.png along with the initial response.

If a browser doesn't support HTTP/2 server-push, this gracefully falls back to how Link: rel=preload traditionally works: the browser treats the header as a preload hint and fetches the resource early on its own.

Overpushing

One of the main issues with HTTP/2 server-push is that it puts the burden of deciding what to push, and when, onto web developers. If you push a file to a browser that already has it in its HTTP cache, you're just wasting bandwidth. Cache digests aim to solve this problem by sending a bit of extra data from browsers to servers, informing servers of what's already in the cache.

Unfortunately, we're still a ways out from that specification being implemented, so the burden of keeping track of cached files falls on web developers.

Cloudflare Workers

Cloudflare, the edge-cache that www.natevolker.com uses, implements HTTP/2 server-push using the Link header as described earlier. They also offer a way to run "serverless" edge functions that they call Cloudflare Workers. Cloudflare Workers re-use the same JavaScript APIs as service workers, so they will likely feel very familiar to many web developers.

Using Cloudflare Workers to Automate HTTP/2 Server-Push

One thing Cloudflare Workers are great for is automating HTTP/2 server-push. They can modify the request before it hits Cloudflare's cache (or the origin server), and modify the response that gets sent back to browsers. Conveniently, the step that turns Link headers into pushed resources happens after the worker runs, so pushing a resource from the worker is as simple as appending a header:

addEventListener('fetch', (event) => {
  const { request } = event;

  event.respondWith((async () => {
    const originalResponse = await fetch(request);
    // Responses returned by fetch() have immutable headers, so make a
    // mutable copy before appending the Link header.
    const response = new Response(originalResponse.body, originalResponse);
    response.headers.append(
      'Link',
      '</css/style.css>; rel=preload; as=style'
    );
    return response;
  })());
});

The hard part is figuring out which files to push and when, and how to prevent overpushing in the absence of cache digests.

The solution I came up with, and that I use on www.natevolker.com, uses HTTP cookies. The steps the Cloudflare Worker takes are, roughly:

  1. Look for Link headers set by my origin server
  2. Check those Link headers against a special cache-digest cookie that I set, to see if each file has already been pushed
  3. If a file has already been pushed, suppress its Link header so it isn't sent in the response
  4. If it hasn't been pushed, let the Link header through and update the cache-digest cookie (a simplified sketch of this check follows the list)
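
As a rough sketch of steps 2 through 4 (not the actual worker code, which appears further down), assuming the cache-digest cookie has already been parsed into a plain object, and using hypothetical helpers hashFilename and contentHashOf:

const filterLinkHeaders = (links, digest, hashFilename, contentHashOf) =>
  links.filter((link) => {
    const key = hashFilename(link);    // identifies the file, ignoring its version
    const value = contentHashOf(link); // identifies this version of the file
    if (digest[key] === value) {
      return false;                    // already pushed: suppress this Link header
    }
    digest[key] = value;               // not pushed yet: record it in the digest...
    return true;                       // ...and let the Link header through
  });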

My first attempt at this was to mirror the cache-digest spec as closely as possible, and to use that as the value for my cache-digest cookie. After spending too much time following this path and still somehow overpushing assets, I decided to do something a little simpler. All the filenames of sub-resources that I want to push on www.natevolker.com include a hash of the file's contents, so from those filenames alone I can create a list of key-value pairs, where the key represents the unhashed filename and the value represents the hash of that file's contents. To prevent the cookie from getting too big too quickly, I also hash the filename. So if I've pushed two files to a browser, it might have a cache-digest cookie set to something like:

826e8142-e6baabe8:af779f5f-490cf5f5

which represents something like:

hash(/path/to/file-1.txt)-hash(file-1's contents):hash(/path/to/file-2.txt)-hash(file-2's contents)

If that browser then visits a different page that pushes a different resource, the worker just adds another key-value pair onto the end of the cookie. If the contents of a pushed file are updated, the content hash changes, and the existing pair is simply updated in place.
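
To make that concrete, here's a small sketch (with illustrative hashes) of how that cookie value is assembled from the key-value pairs:

const digest = {
  '826e8142': 'e6baabe8', // hash of file-1's path -> hash of file-1's contents
  'af779f5f': '490cf5f5', // hash of file-2's path -> hash of file-2's contents
};

const cookieValue = Object.entries(digest)
  .map(([filenameHash, contentHash]) => `${filenameHash}-${contentHash}`)
  .join(':');
// -> '826e8142-e6baabe8:af779f5f-490cf5f5'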

Parsing the Cookie Headers

The trickiest part of this strategy is that it involves parsing and rebuilding both the Cookie header on each request and the Set-Cookie header on the response. The Cookie header is relatively straightforward:

// Turn "a=1; b=2" into { a: '1', b: '2' }.
const parseCookies = cookieString => cookieString
  .split(';')
  .map(pair => pair.split('=', 2).map(s => s.trim()))
  .reduce((cookies, [key, value]) => Object.assign(cookies, { [key]: value }), {});

// Turn the object back into a Cookie header, dropping any cookie whose value
// has been set to undefined (which is how the cache-digest cookie is removed
// before the request is forwarded).
const buildCookies = cookies => Object.entries(cookies)
  .filter(pair => typeof pair[1] !== 'undefined')
  .map(([key, val]) => `${key}=${val}`)
  .join('; ');
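
As a quick illustration of the round trip (the cookie values are made up; setting a value to undefined is how the worker later drops the cache-digest cookie before forwarding the request):

const cookies = parseCookies('cache-digest=826e8142-e6baabe8; theme=dark');
// -> { 'cache-digest': '826e8142-e6baabe8', theme: 'dark' }

buildCookies(Object.assign({}, cookies, { 'cache-digest': undefined }));
// -> 'theme=dark'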

The Set-Cookie header is a bit more complicated:

// Cookie attribute names that take a value, and boolean flags.
const kvps = ['Expires', 'Max-Age', 'Domain', 'Path', 'SameSite'];
const flags = ['Secure', 'HttpOnly'];
const inArr = (s, arr) => arr.some(key => s.toLowerCase().indexOf(key.toLowerCase()) === 0);

// Headers.get('Set-Cookie') joins multiple cookies with commas, so this first
// splits on ';', detects the spots where one cookie ends and the next begins,
// then folds the flat list into an object keyed by cookie name.
const parseSetCookieString = str => (str || '')
  .split(';')
  .map(s => s.trim())
  .reduce((acc, part) => {
    const isKvp = inArr(part, kvps) && part.match(/=/g) && part.match(/=/g).length > 1;
    const isFlag = !isKvp && inArr(part, flags) && part.match(/=/g);
    if (isKvp || isFlag) {
      const [first, ...rest] = part.split(',');
      acc.push(first.trim());
      acc.push(rest.join(',').trim());
    } else if (part) {
      acc.push(part);
    }
    return acc;
  }, [])
  .reduce(([current, acc], part) => {
    if (!inArr(part, [...kvps, ...flags])) {
      const [key, value] = part.split('=');
      return [key, Object.assign(acc, { [key]: { value } })];
    }
    if (inArr(part, kvps)) {
      const [key, value] = part.split('=');
      acc[current][key] = value;
    } else if (inArr(part, flags)) {
      acc[current][part] = true;
    }
    return [current, acc];
  }, ['', {}])
  .pop();


// Rebuild the Set-Cookie header string, joining multiple cookies with ", ".
const buildSetCookieString = cookies => Object.entries(cookies)
  .map(([name, cookie]) => {
    const str = [`${name}=${cookie.value}`];
    [...kvps, ...flags].forEach((key) => {
      const value = cookie[key] || cookie[key.toLowerCase()];
      if (value === true) {
        str.push(key);
      } else if (value) {
        str.push(`${key}=${value}`);
      }
    });
    return str.join('; ');
  })
  .join(', ');

The above methods aren't perfect, but because I control all the cookies set on my site (other than the ones Cloudflare sets), they are more than sufficient.
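
For a sense of what they produce, here's a round trip through a couple of made-up cookies:

const parsed = parseSetCookieString(
  'session=abc123; Path=/; Secure; HttpOnly, theme=dark; Max-Age=86400'
);
// -> {
//      session: { value: 'abc123', Path: '/', Secure: true, HttpOnly: true },
//      theme: { value: 'dark', 'Max-Age': '86400' },
//    }

buildSetCookieString(parsed);
// -> 'session=abc123; Path=/; Secure; HttpOnly, theme=dark; Max-Age=86400'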

Handling Requests

The real meat of the Cloudflare Worker happens here:

// Cheap, non-cryptographic hash used to keep the cookie's keys short.
const quickHash = str => str
  .split('')
  .reduce((hash, char) => ((hash << 8) - hash) + char.charCodeAt(0), 0)
  .toString(16)
  .replace('-', '');

// Pull the URL out of a Link header value like "</css/style.css>; rel=preload".
// The trim() matters because split(',') leaves a leading space on every link
// after the first.
const getFilenameFromLink = (link) => {
  const matches = link.trim().match(/^<([^>]*)>/);
  return matches ? matches[1] : '';
};

// "/css/style.af779f5f.css" -> "af779f5f"
const getContentHashFromFilename = filename => filename
  .replace(/.+\.([a-zA-Z0-9]{8})\..{2,12}.*/, '$1');

// "/css/style.af779f5f.css" -> "/css/style.css"
const stripHashFromFilename = filename => filename
  .replace(/(.+)\.[a-zA-Z0-9]{8}\.([^.]{2,12}).*/, '$1.$2');

const parseCacheDigestCookie = cacheDigest => (cacheDigest || '')
  .split(':')
  .map(pair => pair.split('-'))
  .filter(([key, value]) => !!key && !!value)
  .reduce((all, [key, val]) => Object.assign(all, { [key]: val }), {});

const buildCacheDigestCookie = cacheDigest => Object.entries(cacheDigest)
  .map(([key, val]) => `${key}-${val}`)
  .join(':');
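
// For example (path and hashes purely illustrative), a Link header value of
//   "</css/style.af779f5f.css>; rel=preload; as=style"
// yields:
//   getFilenameFromLink(...)        -> '/css/style.af779f5f.css'
//   getContentHashFromFilename(...) -> 'af779f5f'
//   stripHashFromFilename(...)      -> '/css/style.css'
// and quickHash('/css/style.css') becomes the key used in the cache digest.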

// Name of the cache-digest cookie; the exact name is arbitrary.
const cacheDigestCookieName = 'cache-digest';

const handleRequest = async (request) => {
  const cookies = parseCookies(request.headers.get('Cookie') || '');
  const oldCacheDigestCookie = cookies[cacheDigestCookieName];
  const cacheDigest = parseCacheDigestCookie(oldCacheDigestCookie);
  // Forward the request without the cache-digest cookie so it never reaches
  // Cloudflare's cache or the origin server.
  const newCookies = buildCookies(
    Object.assign({}, cookies, { [cacheDigestCookieName]: undefined }),
  );
  const newRequest = new Request(request.url, {
    method: request.method,
    headers: request.headers,
    redirect: 'manual',
  });
  newRequest.headers.set('cookie', newCookies);

  const response = await fetch(newRequest);
  const contentType = response.headers.get('content-type') || '';
  const links = response.headers.get('link') || '';

  // Only bother with push logic for successful HTML responses that actually
  // carry Link headers.
  if (!response.ok || !contentType.includes('text/html') || links === '') {
    return response;
  }

  // Responses returned by fetch() have immutable headers, so make a mutable
  // copy, then strip the origin's Link headers and re-add only the ones that
  // still need to be pushed.
  const newResponse = new Response(response.body, response);
  newResponse.headers.delete('Link');

  links.split(',').forEach((link) => {
    const filename = getFilenameFromLink(link);
    const contentHash = getContentHashFromFilename(filename);
    const filenameHash = quickHash(stripHashFromFilename(filename));
    if (cacheDigest[filenameHash] !== contentHash) {
      cacheDigest[filenameHash] = contentHash;
      newResponse.headers.append('Link', link);
    }
  });

  const newCacheDigestCookie = buildCacheDigestCookie(cacheDigest);
  const responseCookies = parseSetCookieString(newResponse.headers.get('Set-Cookie'));

  // Only send an updated cookie when the digest actually changed.
  if (newCacheDigestCookie !== oldCacheDigestCookie) {
    responseCookies[cacheDigestCookieName] = {
      value: newCacheDigestCookie,
      HttpOnly: true,
      'Max-Age': '31536000',
      SameSite: 'Lax',
      Secure: true,
    };
  } else {
    delete responseCookies[cacheDigestCookieName];
  }

  // While rebuilding the header, also tag Cloudflare's __cfduid cookie as SameSite.
  if (responseCookies.__cfduid) {
    responseCookies.__cfduid.SameSite = 'Lax';
  }

  newResponse.headers.set('Set-Cookie', buildSetCookieString(responseCookies));

  return newResponse;
};

addEventListener('fetch', (event) => {
  const host = event.request.headers.get('host');
  const url = (event.request.url.split(host).pop() || '').split('?').shift();
  // Only run the worker for GET requests to extension-less paths (i.e. HTML
  // pages); everything else falls through to Cloudflare's default handling.
  if (event.request.method === 'GET' && !url.match(/\..{2,12}$/)) {
    event.respondWith(handleRequest(event.request));
  }
});

With that worker in place (and assuming no meddling with cookies or the cache on the user's part), each version of a pushed file gets pushed only once, and if its contents change it automatically gets pushed again.

The biggest downside of this approach is that it uses an HTTP cookie, and that cookie grows linearly with the number of files that are pushed. Since cookies are included with every HTTP request, not just requests for the root document, that could mean a lot of wasted bandwidth. Luckily, header compression is another of HTTP/2's features, so in practice everything performs really well.