A Consistent Approach To Client-Side Cache Invalidation
Download the source code for this blog entry here: ClientSideCacheInvalidation.zip
TL;DR?
Please scroll down to the bottom of this article to review the summary.
I ran into a problem not long ago where the JSON results from an AJAX call to an ASP.NET MVC JsonResult action were being cached by the browser, quite intentionally by design, but were no longer up-to-date. Without devising a new approach to route manipulation or reworking any of the other fundamental infrastructure for the endpoints (there were too many of them), our hands were tied. The caching was being done with the ASP.NET OutputCacheAttribute on the action being invoked in the AJAX call, something like this (not the real code, but it briefly demonstrates the caching):
[OutputCache(Duration = 300)]
public JsonResult GetData()
{
    return Json(new
    {
        LastModified = DateTime.Now.ToString()
    }, JsonRequestBehavior.AllowGet);
}
@model dynamic
@{
    ViewBag.Title = "Home";
}
<h2>Home</h2>
<div id="results"></div>
<div><button id="reload">Reload</button></div>
@section scripts {
    <script>
        var $APPROOT = "@Url.Content("~/")";

        $.getJSON($APPROOT + "Home/GetData", function (o) {
            $('#results').text("Last modified: " + o.LastModified);
        });

        $('#reload').on('click', function() {
            window.location.reload();
        });
    </script>
}
Since we were using a generalized approach to output caching (as we should), I knew that any solution to this problem should also be generalized. My first thought rested on the mistaken assumption that the default [OutputCache] behavior was to rely on client-side caching, since client-side caching was what I was observing in Fiddler. (Mind you, in the above sample that is not the case, the caching is actually server-side, probably because of the small amount of data being transferred. I'll explain after I describe what I did under my false assumption.)
Microsoft's default convention for implementing cache invalidation is to rely on "VaryBy.." semantics, such as varying by the route parameters. That is great, except that the route and parameters were not changing in our implementation.
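For illustration, here is a minimal sketch of that convention (the action and its id parameter are hypothetical, not from our code); each distinct id value gets its own cache entry, so changing the parameter implicitly bypasses stale results:

// Hypothetical illustration of the "VaryBy.." convention: a separate
// cache entry is kept for each distinct value of "id".
[OutputCache(Duration = 300, VaryByParam = "id")]
public JsonResult GetDataById(int id)
{
    return Json(new
    {
        Id = id,
        LastModified = DateTime.Now.ToString()
    }, JsonRequestBehavior.AllowGet);
}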
So, my initial proposal was to force the caching to be done on the server instead of on the client, and to invalidate when appropriate.
public JsonResult DoSomething()
{
    //
    // Do something here that has a side-effect
    // of making the cached data stale
    //

    Response.RemoveOutputCacheItem(Url.Action("GetData"));

    return Json("OK");
}

[OutputCache(Duration = 300, Location = OutputCacheLocation.Server)]
public JsonResult GetData()
{
    return Json(new
    {
        LastModified = DateTime.Now.ToString()
    }, JsonRequestBehavior.AllowGet);
}
<button id="invalidate">Invalidate</button></div>
$('#invalidate').on('click', function() { $.post($APPROOT + "Home/DoSomething", null, function(o) { window.location.reload(); }, 'json'); });
While the Reload button has no effect on the Last modified value, the Invalidate button causes the date to increment.
When testing, this actually worked quite well. But concerns were raised about the memory burden on the server. Personally, I think the memory cost of practically any server-side caching is negligible, certainly if the data is small enough to be transmitted over the wire to a client, so long as it is measured in kilobytes or tens of kilobytes and not megabytes. I think the real concern is the transmission itself; the point of caching is to make the user experience as smooth and seamless as possible with minimal waiting, and while waiting for a (cached) payload from the server may be much faster than recalculating or re-acquiring the data, it is still measurably slower than relying on the browser cache.
The default for OutputCacheAttribute is actually OutputCacheLocation.Any, which indicates that the cached item can be cached on the client, on a proxy server, or on the web server. From my tests, tiny payloads seemed to be cached on the server with no caching on the client; large payloads from GET requests with querystring parameters seemed to be cached on the client, but with a conditional request carrying an "If-Modified-Since" header that resulted in a 304 Not Modified from the server (indicating the data was also cached on the server, which verified that the client's cache remained valid); and large payloads from GET requests with all parameters in the path seemed to be cached on the client without any validation check at all (no If-Modified-Since request). To be quite honest, I am only guessing that these were the distinguishing factors behind these behaviors. I saw variations of these behaviors all over the place as I tinkered with scenarios; this was simply the initial pattern I felt I was observing.
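To make that middle case concrete, such a conditional round trip looks roughly like this at the HTTP level (the querystring parameter and dates are purely illustrative, not captured from the real application):

GET /Home/GetData?region=west HTTP/1.1
If-Modified-Since: Sat, 10 Aug 2013 21:31:27 GMT

HTTP/1.1 304 Not Modified
Cache-Control: public, max-age=300

The 304 tells the browser that its cached copy is still good, so the body is not re-transmitted.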
At any rate, for our purposes we were stuck with "Any" as the location, which in theory would evict server-side cache entries if the server ran short on RAM (in theory; I don't know for sure, and the truth can probably be researched, but I don't have time to get into it). The point of all this is that we have client-side caching that we cannot get away from.
So, how do you invalidate the client-side cache? Technically, you really can't. The browser controls the cache bucket, and no browser provides hooks into the cache to invalidate entries. But we can get smart about this and work around the problem by bypassing the cached data. Cached HTTP GET results are keyed by the full raw URL, they are cached with an expiration (in the above sample's case, 300 seconds, or 5 minutes), and they are only cached if the HTTP header directives in the response allow it in the first place. So, to bypass the cache you either don't cache at all, or you know up front how long the cache should live before it expires (neither of which is acceptable in a dynamic application), or you use POST instead of GET, or you vary up the URL.
Microsoft originally got around the caching problem in ASP.NET 1.x by forcing the "normal" development cycle into the lifecycle of <form> tags that always used the POST method over HTTP. Responses to POST requests are never cached. But POSTing is not clean; it does not follow the semantics of the verb when nothing is being sent up and data is only being retrieved.
You can also use ETag in the HTTP headers, which isn’t particularly helpful in a dynamic application as it is no different from a URL + expiration policy.
To summarize, to control cache:
- Disable caching from the server in the response headers (Pragma: no-cache / Cache-Control: no-cache; see the sketch after this list)
- Predict the lifetime of the content and use an expiration policy
- Use POST not GET
- ETag
- Vary the URL (case-sensitive)
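As a quick sketch of the first option above (not what we did; the action name here is made up), ASP.NET MVC can suppress caching for a specific action along these lines:

// Hypothetical action with output caching disabled entirely; this
// typically results in no-cache / no-store directives in the response
// headers, so neither the browser nor the server caches the result.
[OutputCache(NoStore = true, Duration = 0, VaryByParam = "None")]
public JsonResult GetVolatileData()
{
    return Json(new
    {
        LastModified = DateTime.Now.ToString()
    }, JsonRequestBehavior.AllowGet);
}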
Given our options, we need to vary up the URL. There are a number of approaches to this, but almost all of them involve appending or modifying the querystring with parameters that the server is expected to ignore.
$.getJSON($APPROOT + "Home/GetData?_="+Date.now(), function (o) { $('#results').text("Last modified: " + o.LastModified); });
In this sample, the URL is appended with “?_=”+Date.now(), resulting in this URL in the GET:
/Home/GetData?_=1376170287015
This technique is often referred to as cache-busting. (And if you're reading this blog article, you're probably rolling your eyes. "Duh.") jQuery inherently supports cache-busting, but it does not do it on its own from $.getJSON(); it only does it in $.ajax() when the options parameter includes { cache: false }, unless you invoke $.ajaxSetup({ cache: false }); first to disable caching for all requests. Otherwise, for $.getJSON() you would have to do it manually by appending to the URL. (Alright, you can stop rolling your eyes at me now, I'm just trying to be thorough here.)
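For completeness, here is roughly what that built-in option looks like (a minimal sketch; when cache is false, jQuery appends its own "_=<timestamp>" parameter to the URL for us):

// Per-request cache-busting via $.ajax
$.ajax($APPROOT + "Home/GetData", {
    dataType: "json",
    cache: false,
    success: function (o) {
        $('#results').text("Last modified: " + o.LastModified);
    }
});

// Or globally, for all subsequent jQuery AJAX requests:
$.ajaxSetup({ cache: false });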
This is not our complete solution, though. We still have a couple of problems to solve.
First of all, in a complex client codebase, hacking at the URL from application logic might not be the most appropriate approach. Consider a case where you're using Backbone.js with routes that synchronize objects to and from the server. It would be inappropriate to modify the routes themselves just for cache invalidation. A more generalized cache invalidation technique needs to be implemented in the XHR-invoking AJAX function itself. The approach will depend upon the JavaScript libraries you are using, but, for example, if jQuery.getJSON() is being used in application code, then jQuery.getJSON itself could perhaps be replaced with a version that applies an invalidation routine.
var gj = $.getJSON;
$.getJSON = function (url, data, callback) {
    url = invalidateCacheIfAppropriate(url); // todo: implement something like this
    return gj.call(this, url, data, callback);
};
This is unconventional and probably a bad example since you're hacking at a third-party library. A better approach might be to wrap the invocation of $.getJSON() with an application function.
var getJSONWrapper = function (url, data, callback) {
    url = invalidateCacheIfAppropriate(url); // todo: implement something like this
    return $.getJSON(url, data, callback);
};
And from this point on, instead of invoking $.getJSON() in application code, you would invoke getJSONWrapper, in this example.
The second problem we still need to solve is that invalidation of cached data that came from the server needs to be triggered by the server, because it is the server, not the client, that knows when client-cached data is no longer up-to-date. Depending on the application, the client logic might know just by keeping track of which server endpoints it is touching, but it might not! Besides, a server endpoint might have conditional invalidation triggers; the data might be stale only under specific conditions that the server alone knows about, and perhaps only after some calculation. In other words, invalidation needs to be pushed by the server.
One brute force, burdensome, and perhaps a little crazy approach to this might be to use actual “push technology”, formerly “Comet” or “long-polling”, now WebSockets, implemented perhaps with ASP.NET SignalR, where a connection is maintained between the client and the server and the server then has this open socket that can push invalidation flags to the client.
We had no need for that level of integration and you probably don’t either, I just wanted to mention it because it might come back as food for thought for a related solution. One scenario I suppose where this might be useful is if another user of the web application has caused the invalidation, in which case the current user will not be in the request/response cycle to acquire the invalidation flag. Otherwise, it is perhaps a reasonable assumption that invalidation is only needed, and only triggered, in the context of a user’s own session. If not, perhaps it is a “good enough” assumption even if it is sometimes not true. The expiration policy can be set low enough that a reasonable compromise can be made between the current user’s changes and changes invoked by other systems or other users.
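For the curious, a rough server-side sketch of what that might look like with SignalR (the hub name and the client-side method name are hypothetical, and this is not part of the solution that follows):

// requires the Microsoft.AspNet.SignalR package
// Hypothetical SignalR hub that clients connect to on page load.
public class CacheInvalidationHub : Hub
{
}

// Somewhere in server application logic, after an operation makes
// cached data stale, push the invalidated URL out to connected clients.
public void PushInvalidation(string url)
{
    var hub = GlobalHost.ConnectionManager.GetHubContext<CacheInvalidationHub>();
    hub.Clients.All.invalidateCacheItem(url); // clients register an "invalidateCacheItem" handler
}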
While we may not know which server endpoint might introduce the invalidation of client-cached data, we can assume that the invalidation will be triggered by some server endpoint(s), and build invalidation trigger logic into the handling of every server HTTP response.
To begin implementing some sort of invalidation trigger on the server I could flag invalidations to the client using HTTP header(s).
public JsonResult DoSomething()
{
    //
    // Do something here that has a side-effect
    // of making the cached data stale
    //

    InvalidateCacheItem(Url.Action("GetData"));

    return Json("OK");
}

public void InvalidateCacheItem(string url)
{
    Response.RemoveOutputCacheItem(url); // invalidate on server

    Response.AddHeader("X-Invalidate-Cache-Item", url); // invalidate on client
}

[OutputCache(Duration = 300)]
public JsonResult GetData()
{
    return Json(new
    {
        LastModified = DateTime.Now.ToString()
    }, JsonRequestBehavior.AllowGet);
}
At this point, the server is emitting a trigger to the HTTP client that says that “as a result of a recent operation, that other URL, the one for GetData, is no longer valid for your current cache, if you have one”. The header alone can be handled by different client implementations (or proxies) in different ways. I didn’t come across any “standard” HTTP response that does this “officially”, so I’ll come up with a convention here.
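In raw form, the relevant part of such a response would look something like this (the second URL is purely illustrative; as noted in the summary below, multiple items can be delimited with semicolons):

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
X-Invalidate-Cache-Item: /Home/GetData;/Home/GetOtherData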
Now we need to handle this on the client.
Before I do anything else, I need to refactor the existing AJAX functionality on the client so that instead of using $.getJSON, I use $.ajax or some other flexible XHR handler, and wrap it all in custom functions such as httpGET()/httpPOST() and handleResponse().
var httpGET = function (url, data, callback) {
    return httpAction(url, data, callback, "GET");
};

var httpPOST = function (url, data, callback) {
    return httpAction(url, data, callback, "POST");
};

var httpAction = function (url, data, callback, method) {
    url = cachebust(url);
    if (typeof (data) === "function") {
        callback = data;
        data = null;
    }
    $.ajax(url, {
        data: data,
        type: method, // GET or POST
        success: function (responsedata, status, xhr) {
            handleResponse(responsedata, status, xhr, callback);
        }
    });
};

var handleResponse = function (data, status, xhr, callback) {
    handleInvalidationFlags(xhr);
    callback.call(this, data, status, xhr);
};

function handleInvalidationFlags(xhr) {
    // not yet implemented
};

function cachebust(url) {
    // not yet implemented
    return url;
};

// application logic

httpGET($APPROOT + "Home/GetData", function (o) {
    $('#results').text("Last modified: " + o.LastModified);
});

$('#reload').on('click', function () {
    window.location.reload();
});

$('#invalidate').on('click', function () {
    httpPOST($APPROOT + "Home/Invalidate", function (o) {
        window.location.reload();
    });
});
At this point we’re not doing anything yet, we’ve just broken up the HTTP/XHR functionality into wrapper functions that we can now modify to manipulate the request and to deal with the invalidation flag in the response. Now all our work will be in handleInvalidationFlags() for capturing that new header we just emitted from the server, and cachebust() for hijacking the URLs of future requests.
To deal with the invalidation flag in the response, we need to detect that the header is there and add the flagged item to a data set that can be stored locally in the browser with web storage. The best place to put this data set is in sessionStorage, which is supported by all current browsers. Putting it in a session cookie (a cookie with no expiration value) works but is less ideal because it adds to the payload of every HTTP request. Putting it in localStorage is also less ideal because we do want the invalidation flag(s) to go away when the browser session ends, since that is when the original browser cache will expire anyway. There is one caveat to sessionStorage: if a user opens a new tab or window, that new tab or window starts with an empty sessionStorage, yet the browser may reuse the browser cache. The only workaround I know of at the moment is to use localStorage (permanently retaining the invalidation flags) or a session cookie. In our case, we used a session cookie.
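If you go the session cookie route instead, a minimal sketch of the storage helpers might look like this (the cookie name is made up for illustration; the implementation below sticks with sessionStorage):

// Hypothetical session-cookie storage for the invalidation flags data set.
// No "expires" attribute is set, so the cookie lives only for the browser session.
function saveInvalidationFlags(flags) {
    document.cookie = "invalidated-cache-items=" +
        encodeURIComponent(JSON.stringify(flags)) + "; path=/";
}

function loadInvalidationFlags() {
    var match = document.cookie.match(/(?:^|;\s*)invalidated-cache-items=([^;]*)/);
    return match ? JSON.parse(decodeURIComponent(match[1])) : {};
}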
Note also that IIS is case-insensitive on URI paths, but HTTP itself is not, and therefore browser caches will not be. We will need to ignore case when matching URLs with cache invalidation flags.
Here is a more or less complete client-side implementation that seems to work in my initial test for this blog entry.
function handleInvalidationFlags(xhr) {

    // capture HTTP header
    var invalidatedItemsHeader = xhr.getResponseHeader("X-Invalidate-Cache-Item");
    if (!invalidatedItemsHeader) return;
    invalidatedItemsHeader = invalidatedItemsHeader.split(';');

    // get invalidation flags from session storage
    var invalidatedItems = sessionStorage.getItem("invalidated-cache-items");
    invalidatedItems = invalidatedItems ? JSON.parse(invalidatedItems) : {};

    // update invalidation flags data set
    for (var i in invalidatedItemsHeader) {
        invalidatedItems[prepurl(invalidatedItemsHeader[i])] = Date.now();
    }

    // store revised invalidation flags data set back into session storage
    sessionStorage.setItem("invalidated-cache-items", JSON.stringify(invalidatedItems));
}

// since we're using IIS/ASP.NET which ignores case on the path,
// we need a function to force lower-case on the path
function prepurl(u) {
    return u.split('?')[0].toLowerCase() +
        (u.indexOf("?") > -1 ? "?" + u.split('?')[1] : "");
}

function cachebust(url) {

    // get invalidation flags from session storage
    var invalidatedItems = sessionStorage.getItem("invalidated-cache-items");
    invalidatedItems = invalidatedItems ? JSON.parse(invalidatedItems) : {};

    // if the item matches, return the concatenated URL
    var invalidated = invalidatedItems[prepurl(url)];
    if (invalidated) {
        return url + (url.indexOf("?") > -1 ? "&" : "?") + "_nocache=" + invalidated;
    }

    // no match; return unmodified
    return url;
}
Note that the date/time value of when the invalidation occurred is retained as the cache-busting value appended to the URL. This allows the data to remain cached, just refreshed as of that point in time. If invalidation occurs again, that value is revised to the new date/time.
Running this now, after invalidation is triggered by the server, the subsequent request of data is appended with a cache-buster querystring field.
In Summary, ..
.. a consistent approach to client-side cache invalidation triggered by the server might follow these steps:
1. Use X-Invalidate-Cache-Item as an HTTP response header to flag potentially cached URLs as expired. You might consider using a semicolon-delimited value to list multiple items. (Do not URI-encode the semicolon when using it as a URI list delimiter.) The semicolon is a reserved/invalid character in a URI and a valid delimiter in HTTP headers, so this is valid.
2. Someday, browsers might support this HTTP response header by automatically invalidating browser cache items declared in the header, which would be awesome. In the meantime ...
3. Capture these flags on the client into a data set, and store the data set in session storage in the format:

{
    "http://url.com/route/action": (date_value_of_invalidation_flag),
    "http://url.com/route/action/2": (date_value_of_invalidation_flag)
}
4. Hijack all XHR requests so that the URL is appropriately appended with a cache-busting querystring parameter if the URL was found in the invalidation flags data set, i.e. http://url.com/route/action becomes something like http://url.com/route/action?_nocache=(date_value_of_invalidation_flag), being sure to hijack only the XHR request and not any logic that generated the URL in the first place.
5. Remember that IIS and ASP.NET by default ignore case ("/Route/Action" == "/route/action") on the path, but the HTTP specification does not, and therefore the browser cache bucket will not ignore case. Force all URL checks for invalidation flags to be case-insensitive to the left of the querystring (or for the entire URL if there is no querystring).
6. Make sure the AJAX requests' querystring parameters are in a consistent order. Changing the sequential order of parameters may be handled the same on the server but will be cached differently on the client.
7. These steps describe "pull"-based invalidation, where flags are pulled from the server via XHR responses. For "push"-based invalidation triggered by the server, consider using something like a SignalR channel or hub to maintain an open channel of communication using WebSockets or long polling. Server application logic can then invoke this channel or hub to send an invalidation flag to the client or to all clients.
8. On the client side, an invalidation flag "pushed" as described in #7 above, for which #1 and #2 above would no longer apply, can still utilize #3 through #6.
You can download the project I used for this blog entry here: ClientSideCacheInvalidation.zip
Published at DZone with permission of Jon Davis, DZone MVB. See the original article here.