HTML5 cache manifest, or an experiment using unfinished and undocumented technology

I’m on a project right now that needs to work both online and offline (no, not for mobile, but still confined to WebKit), so HTML5’s manifest attribute is the obvious choice. Unfortunately, the implementation even in Safari is spotty at best, and documentation of precisely what is necessary to make it work (or to troubleshoot what went wrong) is nearly nonexistent. What has been written is largely incomplete or inaccurate, not because of laziness or incompetence but because the specification itself is still in flux!

The bits I’m having trouble with at the moment:

  • Some HTTP header other than the MIME type seems to be preventing a simple test page from working on one server. I suspect it’s “Accept-Ranges: bytes”, but I’m not sure.
  • Safari (4.0.5) handles loading new versions of the cached files in a way that I suspect is a design flaw rather than a bug. Safari isn’t my target platform (QtWebKit is), but I’m attempting to use it as a reference for how things are supposed to work so I can verify how good or bad Qt’s implementation is.
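One way to narrow down the first problem is to capture the manifest's response headers from a server where the page works and from the one where it fails, then list only the differences. The sketch below assumes the headers have already been collected (for example by hand from `curl -I`); the two header sets in it are hypothetical and are not taken from the real servers.

```javascript
// Header names are case-insensitive, so normalise them before comparing.
function normalise(headers) {
  const out = {};
  for (const [name, value] of Object.entries(headers)) {
    out[name.toLowerCase()] = String(value).trim();
  }
  return out;
}

// Returns the headers that are missing on one side or differ in value.
function diffHeaders(working, failing) {
  const a = normalise(working);
  const b = normalise(failing);
  const names = new Set([...Object.keys(a), ...Object.keys(b)]);
  const differences = [];
  for (const name of [...names].sort()) {
    if (a[name] !== b[name]) {
      differences.push({ name, working: a[name], failing: b[name] });
    }
  }
  return differences;
}

// Hypothetical header sets, for illustration only.
const workingServer = {
  "Content-Type": "text/cache-manifest",
  "Content-Length": "42",
};
const failingServer = {
  "content-type": "text/cache-manifest",
  "Content-Length": "42",
  "Accept-Ranges": "bytes",
};

console.log(diffHeaders(workingServer, failingServer));
```

Any header that shows up in the result is a candidate to switch off on the failing server, one at a time, until the test page starts working.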

On the latter issue, a little background explanation is required. A cache manifest file is (mainly) just a short list of files that may or may not be used by your HTML5 page. The page itself is listed in there as well:

#version: 1
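For context, a complete manifest of this kind begins with the line CACHE MANIFEST, and lines starting with # are comments, which is why bumping the version comment is enough to make the file count as changed. A minimal sketch that generates such a file, assuming hypothetical style.css and app.js entries alongside offline.html:

```javascript
// Build the text of a cache manifest. The first line must be exactly
// "CACHE MANIFEST"; the "#version" line is only a comment, but editing it
// changes the file's bytes, which is what prompts the browser to update.
function buildManifest(version, files) {
  const lines = ["CACHE MANIFEST", "#version: " + version, ...files];
  return lines.join("\n") + "\n";
}

// offline.html comes from the text; the other two entries are hypothetical.
const manifest = buildManifest(1, ["offline.html", "style.css", "app.js"]);
console.log(manifest);
```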

When you navigate to offline.html (whose html element has the manifest attribute pointing to the file above) for the first time, the browser behaves more or less as normal. But on the second visit, instead of checking with the server to see if offline.html has changed, it only checks whether the manifest file has changed. If it has, the browser should then check for updates to all the files listed within it.

That’s the expected behaviour. What Safari seems to do, however, takes a third load to describe.

Let’s say that after the first load, you make a change to offline.html and update the version number in your manifest to prompt the update. If you reload the page, offline.html will not appear to have changed. Reload once more, however, and you’ll see the change.

This is where it gets weird: with the first reload, the Apache access log shows offline.html being requested, and delivered in full with a 200 (and not a 304). The second reload shows only the manifest being requested (with a 200). This means that Safari is getting the updated html file but not using it.
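The same sequence can also be watched from inside the page rather than from the Apache log, by listening to the application cache events. This is only a sketch, assuming a browser that exposes window.applicationCache; the function name watchAppCache and the log format are made up for illustration. On the first reload after a manifest change one would expect to see checking, downloading and eventually updateready, and on the second reload checking followed by noupdate.

```javascript
// Event names defined for the HTML5 application cache.
const CACHE_EVENTS = [
  "checking", "noupdate", "downloading", "progress",
  "cached", "updateready", "obsolete", "error",
];

// Attach a listener for every cache event and pass each event name to `log`.
// `appCache` is injected so the function can be tried outside a browser.
function watchAppCache(appCache, log) {
  for (const name of CACHE_EVENTS) {
    appCache.addEventListener(name, () => log(name), false);
  }
}

// In a browser that exposes the API, log each event to the console.
if (typeof window !== "undefined" && window.applicationCache) {
  watchAppCache(window.applicationCache, (name) => console.log("appcache:", name));
}
```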

If you think about it, this might be the intended behaviour. Here’s what happens on the first reload:

  1. Safari checks its cached manifests to see if any mention offline.html.
  2. One does, so Safari loads the cached version of the page.
  3. The html element references a manifest, so Safari checks whether the manifest has changed.
  4. It has, so Safari checks every file listed in it for changes and updates the cache.
  5. The page continues rendering. Since the old version of offline.html was already loaded (which is how Safari knew to check the manifest at all), we don’t see the new version.
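The steps above can be sketched as a toy model. It is a guess at the logic that would produce the observed behaviour, not Safari's actual algorithm, and the names createBrowser and load are invented for the example.

```javascript
// A toy model of the reload sequence described in the numbered steps.
function createBrowser(server) {
  let cache = null; // { manifest, page } once the page has been cached

  return {
    load() {
      if (cache === null) {
        // First visit: fetch everything from the server and cache it.
        cache = { manifest: server.manifest, page: server.page };
        return cache.page;
      }
      // Steps 1-2: the page is in the cache, so render the cached copy.
      const displayed = cache.page;
      // Steps 3-4: re-check the manifest and refresh the cache if it changed.
      if (server.manifest !== cache.manifest) {
        cache = { manifest: server.manifest, page: server.page };
      }
      // Step 5: the copy loaded before the update is what the user sees.
      return displayed;
    },
  };
}

const server = { manifest: "#version: 1", page: "old offline.html" };
const browser = createBrowser(server);
const first = browser.load();

// Edit the page and bump the manifest version.
server.page = "new offline.html";
server.manifest = "#version: 2";

const firstReload = browser.load();
const secondReload = browser.load();
console.log(first, "|", firstReload, "|", secondReload);
// → old offline.html | old offline.html | new offline.html
```

In this model the new page is fetched during the first reload but only displayed on the second, which matches what the access log shows.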

Obviously, this isn’t very intuitive, and it’s easy to think you’re seeing a bug. My thinking on the matter is that the manifest might not belong on the page it’s caching, but on another page which redirects to or otherwise brings up that page. Alternatively (and for my project this is probably the best solution), the core page could almost never change, with all other files brought in by the initial JS.
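As a rough illustration of that last option, the core page could contain little more than a loader that pulls in everything else. The sketch below is an assumption about what such a loader might look like; the function name loadScripts and the file names are hypothetical, and whether this actually avoids the stale-page effect has not been verified here.

```javascript
// Append one <script> element per URL, in order, to the document head.
// `doc` is passed in so the function can be exercised outside a browser.
function loadScripts(doc, urls) {
  return urls.map((url) => {
    const script = doc.createElement("script");
    script.src = url;
    script.async = false; // ask for execution in list order where supported
    doc.head.appendChild(script);
    return script;
  });
}

// Hypothetical file names; in the browser, call this from the core page.
if (typeof document !== "undefined") {
  loadScripts(document, ["app.js", "views.js"]);
}
```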