
Don't Throw Away Your Old Java Web Framework: The SPI History of Twitter

Hear how Twitter's conservative architectural revolution in 2012 changed the site from a client-centric SPI to a server-centric, more SEO-compatible site.

Twitter.com is one of the most popular websites in the world, but few people know that it is also one of the few Single Page Interface, stateless, SEO-compatible websites in the world.

Twitter is an SPI in the sense that it avoids full page loads: each click implies a partial change of the page, with the necessary data obtained through AJAX (aka Unobtrusive JavaScript).

Twitter is SEO compatible because its public pages are designed to be accessed by search-engine robots such as Googlebot, while the same pages can be viewed by logged-in users. For example, http://twitter.com/jmarranz is basically the same page whether you are logged in or not. The key is JavaScript: when JavaScript is ignored (as by crawlers), the links on the page are conventional links to other pages; when JavaScript is executed, the page is an SPI (and, if you are logged in, fully functional).

Crawlers such as Googlebot do not interpret JavaScript, and therefore they see the world as "paged": no AJAX runs in the page loaded by the robot, and the robot will not process AJAX-loaded states. This is not a problem for Twitter, which provides conventional alternative pages.
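The dual-mode link technique described above can be sketched as follows. This is illustrative, not Twitter's actual code: the markup contains conventional anchors that work without JavaScript, and when JavaScript runs it intercepts clicks and swaps in partial content instead (the `page=partial` parameter name is an assumption for the example).

```javascript
// Pure helper: given a conventional href, build the URL of a hypothetical
// endpoint that returns only the partial content for that page.
function partialUrl(href) {
  return href + (href.includes('?') ? '&' : '?') + 'page=partial';
}

// DOM wiring, only when running in a browser. Crawlers that ignore
// JavaScript never see this: for them the <a href="/jmarranz"> links
// behave as ordinary page-to-page navigation.
if (typeof document !== 'undefined') {
  document.addEventListener('click', function (ev) {
    const link = ev.target.closest('a[href^="/"]');
    if (!link) return;          // not an internal link: let it load normally
    ev.preventDefault();        // SPI mode: suppress the full page load
    fetch(partialUrl(link.getAttribute('href')))
      .then(res => res.text())
      .then(html => {
        document.querySelector('#content').innerHTML = html;
      });
  });
}
```

The same URL thus serves both audiences: a robot follows it as a normal link, a JavaScript-enabled browser turns it into a partial update.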

It is "stateless" in the sense that Twitter ensures that servers hold no information about the state of the page the user has loaded, i.e. no web session data. This allows requests to arrive at any node in a server cluster without shared sessions or server affinity. Looking at the AJAX requests, you can see that the browser sends an id representing the temporary state of the user's page, saying in effect: "everything up to this point is already loaded; give me only what is new."
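This stateless pattern can be sketched in a few lines. The `since_id` name is borrowed from Twitter's public REST API for illustration; the field names and endpoint are assumptions, not Twitter's internal protocol:

```javascript
// Server side: the client tells us the newest id it already has, so any
// cluster node can answer without consulting a shared web session.
function newerThan(items, sinceId) {
  return items.filter(item => item.id > sinceId);
}

// Client side: the "session" state lives in the browser, not the server.
// Each AJAX request carries the last-seen id along with it.
function nextRequest(lastSeenId) {
  return { url: '/timeline.json', params: { since_id: lastSeenId } };
}

const timeline = [{ id: 3, text: 'c' }, { id: 2, text: 'b' }, { id: 1, text: 'a' }];
console.log(newerThan(timeline, 1)); // only the items with id 2 and 3
```

Because all the state travels with the request, no load balancer affinity ("sticky sessions") is needed.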

As we will see later, this SPI approach is server-centric or hybrid (even though it involves a lot of client-side code), but Twitter did not arrive at the current implementation on the first attempt: there was an earlier client-centric SPI implementation.

First version: client-centric

The first version of the Twitter Single Page Interface used a trendy approach: pages rendered in the client with JavaScript, based on data retrieved from the server through REST APIs. This approach is seen by many as a best practice.

We all know Twitter's REST API, which returns user activity data in JSON format. This API was very popular with alternative Twitter clients until the company introduced limitations that harmed the popularity of these readers. By then the Twitter website itself was a consumer of its own REST API, so that the browser was a real Twitter client for logged-in users...

Twitter pages initially loaded mainly empty, with no data, and the browser rendered the page via JavaScript, requesting JSON data in successive AJAX requests.
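A minimal sketch of this client-centric rendering style, under assumed field names and endpoint (not Twitter's actual code): the server ships JSON, and the browser builds all of the HTML itself.

```javascript
// Escape user-supplied text before it is interpolated into markup.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Client-side template: turn a JSON array of tweets into an HTML string.
function renderTweets(tweets) {
  return tweets
    .map(t => '<li class="tweet"><b>' + escapeHtml(t.user) + '</b>: ' +
              escapeHtml(t.text) + '</li>')
    .join('');
}

// In a browser, the initially empty page would be filled like this
// (endpoint name taken from the public REST API, for illustration).
if (typeof document !== 'undefined') {
  fetch('/statuses/home_timeline.json')
    .then(res => res.json())
    .then(tweets => {
      document.querySelector('#timeline').innerHTML = renderTweets(tweets);
    });
}
```

Note that the template logic, the escaping, and the DOM assembly all ship to, and execute in, the browser; this is precisely what "client-centric" means below.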

The term "client-centric" refers to where the HTML is rendered from data. Where and when HTML is rendered from server data is the central architectural decision of a web application; in this case, it happens in the client.

In summary, Twitter was an SPI website for logged-in users. For bots (which ignore JavaScript) and for public pages, Twitter offered alternative SEO-compatible pages. At the time, hashbangs (#!) were used intensively; hashbangs allow SPI-compatible links while still allowing bookmarks.

Hashbangs are also SEO compatible because Google has been supporting them for many years.

Example:

When Google sees: http://twitter.com/#!jmarranz

Google will load: http://twitter.com/?_escaped_fragment_=jmarranz
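The mapping in this example follows Google's (now deprecated) AJAX crawling scheme: the fragment after `#!` is moved into an `_escaped_fragment_` query parameter, which, unlike a fragment, actually reaches the server. A sketch of the transformation:

```javascript
// Map a hashbang URL to the crawlable URL a bot would request, per the
// old AJAX crawling scheme: "#!X" becomes "?_escaped_fragment_=X".
function toEscapedFragment(url) {
  const i = url.indexOf('#!');
  if (i === -1) return url;                 // no hashbang: nothing to do
  const base = url.slice(0, i);
  const fragment = url.slice(i + 2);
  const sep = base.includes('?') ? '&' : '?';
  return base + sep + '_escaped_fragment_=' + encodeURIComponent(fragment);
}

console.log(toEscapedFragment('http://twitter.com/#!jmarranz'));
// http://twitter.com/?_escaped_fragment_=jmarranz
```

The server detects `_escaped_fragment_` in the query string and serves the bot a fully rendered conventional page for that state.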

To offer an SEO version of its public pages, Twitter generated alternative pages for bots, while the SPI behavior rendered markup in the client whenever JavaScript was enabled.

Second (and current) version: server-centric (or hybrid)

In early 2012 a change occurred in Twitter web engineering that could be described as a conservative revolution: an apparent return to pages, to the first, pre-SPI Twitter website. A retro-revolution led by Dan Webb, a principal engineer at Twitter.

https://blog.twitter.com/2012/improving-performance-twittercom

At the same time one of the key developers of the client-centric SPI at Twitter left the company for a startup. 

Everything seemed to point to a return to the classic paging system, lightly spiced with some JavaScript and AJAX, but far from the radical, "ahead of its time" client-centric SPI model of the previous version, which looked as if it was going to be thrown away almost completely.

Dan Webb seemed an avowed enemy of the Single Page Interface, judging by an article on his blog attacking hashbangs, a cornerstone technique for providing SPI behavior, bookmarking, and SEO compatibility in any browser:

http://danwebb.net/2011/5/28/it-is-about-the-hashbangs

At the end of the Twitter blog entry, however, there seems to be a glimmer of hope for SPI:

"What’s next? We’re currently rolling out this new architecture across the site. Once our pages are running on this new foundation, we will do more to further improve performance. For example, we will implement the History API to allow partial page reloads in browsers that support it, and begin to overhaul the server side of the application."

The key words are: "History API".

I myself was alarmed to read that hashbangs were under attack from a lead engineer of Twitter, one of the highest-profile SPI sites, and tried to "dissuade" Dan:

https://twitter.com/jmarranz/status/174947731637403648

https://twitter.com/jmarranz/status/208876410016763904

I was crazy enough to make this proposal:

"In your JSON/AJAX requests, avoid your own REST API on the server; render your page chunks in the server and inject the markup into the page with innerHTML as much as possible."

Dan Webb's response seems to consider using some partial updates via AJAX, but without hashbangs; the History API is proposed instead (which is not available in IE 6-8).

The last tweet says:

"we made our perf decisions based on data. It's not about liking or not liking a technique. It's about what we prove is fastest."

I thought he was talking about full page loading vs. pages rendered in JavaScript. To my surprise, the approach of the "new Twitter," started well before our conversation, was basically the same as "my proposal" (although I have always defended hashbangs):

https://twitter.com/jmarranz/status/208927214874542080

The current server-centric (or hybrid) SPI approach of Twitter.com

At the time the previous conversations took place, the "new" Twitter.com was just being born, gradually changing the pure client-centric approach based on the REST API to the new, more server-centric approach of rendering on the server again. Today Twitter.com is basically an SPI website for a logged-in user with a modern, JavaScript-enabled browser.

The main motivation behind the new hybrid architecture was performance:

"That architecture broke new ground by offering a number of advantages over a more traditional approach, but it lacked support for various optimizations available only on the server"

The main new features of this approach are:
  • Any publicly loaded page is initially the same for all users, logged in or not (and for bots). This ends the dual-site model for SEO support.
  • It follows a Single Page Interface approach but avoids hashbangs; the History API is used instead. The History API is not available in older browsers with AJAX support, such as IE 6-8; in these now-minority browsers, navigation simply falls back to conventional paging.
  • Partial page changes are rendered on the server. This can dramatically reduce and simplify the JavaScript code needed for page management. It may improve rendering speed as well as decrease the number of requests needed, because the same server-rendered page chunks can be filled with different sets of data in a single AJAX request; with a REST API, several requests would be necessary.
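The hybrid pattern described by these points can be sketched as follows. This is illustrative, not Twitter's actual code; the `fragment=true` parameter name is an assumption. The server renders the HTML chunk, the client injects it with innerHTML, and the History API keeps a clean, hashbang-free URL:

```javascript
// Pure helper: URL of a hypothetical endpoint that returns the
// server-rendered HTML chunk for a given path.
function fragmentUrl(path) {
  return path + (path.includes('?') ? '&' : '?') + 'fragment=true';
}

// Browser-only wiring; older browsers without pushState simply fall
// back to conventional full page loads on the same URLs.
if (typeof document !== 'undefined' && window.history && history.pushState) {
  async function navigate(path) {
    const res = await fetch(fragmentUrl(path));
    document.querySelector('#content').innerHTML = await res.text();
    history.pushState({ path }, '', path);   // real URL, no #! needed
  }

  // Back/forward buttons replay the partial load instead of a reload.
  window.addEventListener('popstate', ev => {
    if (ev.state && ev.state.path) navigate(ev.state.path);
  });
}
```

Note how little client code remains: the template logic lives on the server, and the browser's job shrinks to fetching markup and injecting it.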

What is the point of this article for a Java (or general backend) developer?

Well, a lot, considering that there is a trend towards 100% client-centric applications accessing the server via REST APIs returning JSON data.

Therefore, don't throw away your old web framework, especially your template processor: maybe you're going to need it again :)
